Azure Blob Storage Data Source API
Audience: Data Owners
Content Summary: The azureblob endpoint allows you to connect and manage Azure Blob Storage data sources in Immuta.
Note
Additional fields may be included in some responses you receive; however, these attributes are for internal purposes and are therefore undocumented.
Azure Blob Workflow
Create a Data Source
Data source duplicates
To prevent two data sources from referencing the same table, users cannot create duplicate data sources. If you attempt to create a duplicate data source using the API, you will receive a warning stating "duplicate tables are specified in the payload."
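A client can catch obvious duplicates before sending anything. The following is a minimal sketch, not Immuta's own duplicate detection: it assumes that sqlTableName is a reasonable stand-in for "the same table", and the helper name duplicate_table_names is hypothetical.

```python
from collections import Counter


def duplicate_table_names(payloads: list[dict]) -> list[str]:
    """Return sqlTableName values that occur more than once in a batch of payloads.

    Assumption: each payload carries its table name at dataSource.sqlTableName,
    as in the request payload example in this document.
    """
    names = [p["dataSource"]["sqlTableName"] for p in payloads]
    counts = Counter(names)
    # dict.fromkeys keeps first-seen order, so the report is stable.
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


batch = [
    {"dataSource": {"name": "dev", "sqlTableName": "dev"}},
    {"dataSource": {"name": "dev copy", "sqlTableName": "dev"}},
    {"dataSource": {"name": "prod", "sqlTableName": "prod"}},
]
print(duplicate_table_names(batch))  # prints ['dev']
```

A check like this only covers payloads you are about to send; the server remains the authority on what counts as a duplicate.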
Endpoint
Method | Path | Purpose |
---|---|---|
POST | /azureblob/handler | Save the provided connection for an Azure Blob Storage data source. |
Query Parameters
None.
Payload Parameters
Attribute | Description | Required |
---|---|---|
Private | boolean When false, the data source will be publicly available in the Immuta UI. | Yes |
BlobHandler | array[object] A list of full URLs providing the locations of all blob store handlers to use with this data source. | Yes |
BlobHandlerType | string Describes the type of underlying blob handler that will be used with this data source (e.g., Azure Blob Storage). | Yes |
RecordFormat | string The data format of blobs in the data source, such as json, xml, html, or jpeg. | Yes |
Type | string The type of data source: ingested (metadata will exist in Immuta) or queryable (metadata is dynamically queried). | Yes |
Name | string The name of the data source. It must be unique within the Immuta instance. | Yes |
SqlTableName | string A string that represents this data source's table in the Query Engine. | Yes |
Organization | string The organization that owns the data source. | Yes |
Category | string The category of the data source. | No |
Description | string The description of the data source. | No |
Owner | array[object] Users and groups that should be added as owners to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs: { "profiles": [3, 5], "groups": [4, 1999] }. | No |
Expert | array[object] Users and groups that should be added as expert users to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs: { "profiles": [87, 199], "groups": [324] }. | No |
Ingest | array[object] Users and groups that should be added as ingest users to this data source. Profiles must be a list of profile IDs and groups must be a list of group IDs: { "profiles": [34, 23], "groups": [32] }. | No |
HasExamples | boolean When true, the data source contains examples. | No |
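The required attributes in the table above can be checked on the client before the request is sent. This is a sketch under stated assumptions: the JSON keys are assumed to be the camelCase forms of the attribute names (as blobHandlerType and sqlTableName appear in the request payload example), and check_data_source is a hypothetical helper, not part of the API.

```python
# Required attributes from the table above, written in the camelCase form
# used by the request payload example (an assumption for "private" and
# "organization", which the example does not show).
REQUIRED = (
    "private", "blobHandler", "blobHandlerType", "recordFormat",
    "type", "name", "sqlTableName", "organization",
)
VALID_TYPES = ("ingested", "queryable")


def check_data_source(data_source: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required attribute: {key}"
        for key in REQUIRED
        if key not in data_source
    ]
    ds_type = data_source.get("type")
    if ds_type is not None and ds_type not in VALID_TYPES:
        problems.append(f"type must be ingested or queryable, got {ds_type!r}")
    return problems


candidate = {
    "private": True,
    "blobHandler": {"scheme": "https", "url": ""},
    "blobHandlerType": "Azure Blob Storage",
    "recordFormat": "json",
    "type": "ingested",
    "name": "dev",
    "sqlTableName": "dev",
}
print(check_data_source(candidate))
# prints ['missing required attribute: organization']
```

The check only tests for presence, so an empty string such as "recordFormat": "" passes; whether the server accepts empty values is not covered here.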
Response Parameters
Attribute | Description |
---|---|
Id | integer The handler ID. |
DataSourceId | integer The ID of the data source. |
Warnings | string This message describes issues with the created data source, such as the data source being unhealthy. |
ConnectionString | string The connection string used to connect the data source to Immuta. |
Request Example
The following request saves the provided connection information (in example-payload.json) as a data source.
curl \
--request POST \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
--data @example-payload.json \
https://your-immuta-url.com/azureblob/handler
Request Payload Example
{
"handler": {
"metadata": {
"tagAttributes": [],
"eventTimeAttribute": "",
"useDirectoryForTags": false,
"sasToken": "?sv=your=sas?token",
"sasTokenUrl": "https://your.blob.example.windows.net/sastoken-url",
"container": "demodata"
}
},
"dataSource": {
"blobHandler": {
"scheme": "https",
"url": ""
},
"expiration": "2021-10-24T03:59:59.999Z",
"blobHandlerType": "Azure Blob Storage",
"recordFormat": "",
"type": "ingested",
"name": "dev",
"sqlTableName": "dev"
}
}
Response Example
{
"id": 18,
"dataSourceId": 18
}
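For scripted use, the same request can be prepared with Python's standard library. The sketch below mirrors the curl example above; build_create_request and create_data_source are hypothetical helper names, and the base URL and token are placeholders you would supply.

```python
import json
import urllib.request


def build_create_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /azureblob/handler."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/azureblob/handler",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def create_data_source(request: urllib.request.Request) -> tuple[int, int]:
    """Send the request and return (handler ID, data source ID).

    Needs a reachable Immuta instance, so it is not called in this sketch.
    """
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["id"], body["dataSourceId"]


request = build_create_request(
    "https://your-immuta-url.com",
    "your-api-token",  # placeholder
    {"handler": {"metadata": {"container": "demodata"}},
     "dataSource": {"name": "dev", "sqlTableName": "dev"}},
)
print(request.get_method(), request.full_url)
```

Separating the request builder from the sender keeps the payload and URL easy to inspect before anything goes over the network.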
Get Information About a Data Source
Endpoint
Method | Path | Purpose |
---|---|---|
GET | /azureblob/handler/{handlerId} | Return the handler metadata associated with the provided handler ID. |
Query Parameters
Attribute | Description | Required |
---|---|---|
HandlerId | integer The specific handler ID. | Yes |
SkipCache | boolean If true, the handler cache will be skipped when retrieving the handler data. | No |
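As the endpoint path shows, the handler ID travels in the URL path, while the cache flag is an optional query-string value. A small sketch of building that URL follows; the query key spelling skipCache and the lowercase true/false serialization are assumptions, and handler_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def handler_url(base_url: str, handler_id: int, skip_cache: Optional[bool] = None) -> str:
    """Build the URL for GET /azureblob/handler/{handlerId}."""
    url = f"{base_url.rstrip('/')}/azureblob/handler/{int(handler_id)}"
    if skip_cache is not None:
        # Assumption: the flag is sent as skipCache=true or skipCache=false.
        url += "?" + urlencode({"skipCache": str(skip_cache).lower()})
    return url


print(handler_url("https://your-immuta-url.com", 67))
# prints https://your-immuta-url.com/azureblob/handler/67
print(handler_url("https://your-immuta-url.com", 67, skip_cache=True))
# prints https://your-immuta-url.com/azureblob/handler/67?skipCache=true
```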
Response Parameters
Attribute | Description |
---|---|
DataSourceId | integer The data source ID. |
Value | array Details regarding the handler, including container, accountName, sasTokenUrl, ingestUserId, tagAttributes, dataSourceName, refreshInterval, eventTimeAttribute, and useDirectoryForTags. |
Request Example
The following request returns the handler metadata associated with the provided handler ID.
curl \
--request GET \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
https://your-immuta-url.com/azureblob/handler/67
Response Example
{
"dataSourceId": 427,
"metadata": {
"container": "integration",
"accountName": "integration-tests",
"sasTokenUrl": "https://your.blob.example.windows.net/",
"ingestUserId": "azure blob storage_indexer_example",
"tagAttributes": [],
"dataSourceName": "Test",
"refreshInterval": 0,
"eventTimeAttribute": "",
"useDirectoryForTags": false
},
"type": "azureBlobStorageHandler",
"connectionString": "integration-tests/integration",
"remoteTableDescription": null,
"id": 427,
"createdAt": "2021-09-22T18:45:47.744Z",
"updatedAt": "2021-09-22T18:45:47.969Z"
}
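Because responses may carry additional undocumented fields, a client is safer reading only the keys it needs and ignoring the rest. The sketch below parses a trimmed copy of the response example; HandlerInfo and parse_handler are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class HandlerInfo:
    handler_id: int
    data_source_id: int
    container: str
    account_name: str


def parse_handler(body: str) -> HandlerInfo:
    """Pick a few fields from a handler response and ignore everything else."""
    doc = json.loads(body)
    meta = doc.get("metadata", {})
    return HandlerInfo(
        handler_id=doc["id"],
        data_source_id=doc["dataSourceId"],
        container=meta.get("container", ""),
        account_name=meta.get("accountName", ""),
    )


# Trimmed copy of the response example above.
SAMPLE = """
{
  "dataSourceId": 427,
  "metadata": {"container": "integration", "accountName": "integration-tests"},
  "type": "azureBlobStorageHandler",
  "remoteTableDescription": null,
  "id": 427
}
"""
info = parse_handler(SAMPLE)
print(info.container, info.account_name)  # prints integration integration-tests
```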
Manage Data Sources
Method | Path | Purpose |
---|---|---|
PUT | /azureblob/handler/{handlerId} | Update the provided information for an Azure Blob Storage data source. |
PUT | /azureblob/bulk | Update the handler metadata associated with the provided connection string. |
PUT | /azureblob/handler/{handlerId}/crawl | Re-crawl the data source and update the metadata. |
Update a Specific Data Source
Endpoint
Method | Path | Purpose |
---|---|---|
PUT | /azureblob/handler/{handlerId} | Update the provided information for an Azure Blob Storage data source. |
Query Parameters
Attribute | Description | Required |
---|---|---|
HandlerId | integer The specific handler ID. | Yes |
SkipCache | boolean When true, the handler cache will be skipped when retrieving metadata. | No |
Response Parameters
Attribute | Description |
---|---|
Id | integer The ID of the handler. |
DataSourceId | integer The data source ID. |
Metadata | array Details regarding the updated information. |
Request Example
The following request with the payload below updates the metadata for the data source with the handler ID 18.
curl \
--request PUT \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
--data @example-payload.json \
https://your-immuta-url.com/azureblob/handler/18
Request Payload Example
{
"dataSourceId": 18,
"metadata": {
"container": "testdata",
"accountName": "integration-tests",
"sasTokenUrl": "https://your.blob.example.windows.net/",
"ingestUserId": "azure blob storage_indexer_example",
"tagAttributes": [],
"dataSourceName": "dev",
"refreshInterval": 0,
"eventTimeAttribute": "",
"useDirectoryForTags": false
},
"type": "azureBlobStorageHandler",
"connectionString": "your/testdata",
"remoteTableDescription": null,
"id": 18,
"createdAt": "2021-09-23T18:47:52.976Z",
"updatedAt": "2021-09-23T18:47:53.194Z"
}
Response Example
{
"id": 18,
"dataSourceId": 18,
"metadata": {
"sasToken": "2:your?sastoken==",
"container": "testdata",
"accountName": "your-account-name",
"sasTokenUrl": "2:your?sastokenurlTS",
"ingestAPIKey": "996samplee89c1apia7ckey9",
"ingestUserId": "azure blob storage_indexer_example",
"tagAttributes": [],
"dataSourceName": "dev",
"refreshInterval": 0,
"eventTimeAttribute": "",
"useDirectoryForTags": false
}
}
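The request payload in this example has the same shape as the response from the GET endpoint, which suggests a read-modify-write pattern: fetch the handler, change the metadata you care about, and send the result back. The sketch below assumes that pattern is acceptable to the server; build_update_request is a hypothetical helper, and the token is a placeholder.

```python
import copy
import json
import urllib.request


def build_update_request(base_url: str, token: str, handler: dict,
                         **metadata_changes) -> urllib.request.Request:
    """Prepare (but do not send) a PUT to /azureblob/handler/{handlerId}.

    `handler` is a previously fetched handler document; it is copied so the
    caller's dictionary is left untouched.
    """
    updated = copy.deepcopy(handler)
    updated.setdefault("metadata", {}).update(metadata_changes)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/azureblob/handler/{updated['id']}",
        data=json.dumps(updated).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


current = {"id": 18, "dataSourceId": 18,
           "metadata": {"container": "demodata", "dataSourceName": "dev"}}
update = build_update_request("https://your-immuta-url.com", "your-api-token",
                              current, container="testdata")
print(update.get_method(), update.full_url)
print(json.loads(update.data)["metadata"]["container"])  # prints testdata
```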
Update Multiple Data Sources
Endpoint
Method | Path | Purpose |
---|---|---|
PUT | /azureblob/bulk | Update the data source metadata associated with the provided connection string. |
Query Parameters
None.
Payload Parameters
Attribute | Description | Required |
---|---|---|
Handler | metadata Includes metadata about the handler, such as ssl, port, database, hostname, username, and password. | Yes |
ConnectionString | string The connection string used to connect to the data sources. | Yes |
Response Parameters
Attribute | Description |
---|---|
BulkId | string The ID of the bulk data source update. |
ConnectionString | string The connection string shared by the data sources bulk updated. |
JobsCreated | integer The number of jobs that ran to update the data sources; this number corresponds to the number of data sources updated. |
Request Example
The following request updates the autoIngest value to true for data sources with the connection string specified in the payload below.
curl \
--request PUT \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
--data @example-payload.json \
https://your-immuta-url.com/azureblob/bulk
Request Payload Example
{
"ids": [
5, 6
],
"connectionString": "integration-tests/integration",
"handler": {
"metadata": {
"autoIngest": true
}
}
}
Response Example
{
"bulkId": "bulk_ds_update_dd2600809bf8418dbea2706d6f456636",
"connectionString": "integration-tests/integration",
"jobsCreated": 0
}
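Since jobsCreated corresponds to the number of data sources updated, a value of 0 (as in the response example) is worth surfacing to the caller. The sketch below builds a payload from the two documented attributes only and summarizes a response; both helper names are hypothetical, and the ids field from the payload example is left out because the parameter table does not describe it.

```python
import json


def build_bulk_payload(connection_string: str, metadata: dict) -> dict:
    """Assemble a PUT /azureblob/bulk payload from the documented attributes."""
    return {
        "connectionString": connection_string,
        "handler": {"metadata": metadata},
    }


def summarize_bulk_response(body: str) -> str:
    """Turn a bulk update response into a one-line status message."""
    doc = json.loads(body)
    jobs = doc["jobsCreated"]
    if jobs == 0:
        return f"{doc['bulkId']}: no data sources were updated"
    return f"{doc['bulkId']}: {jobs} data source(s) updated"


payload = build_bulk_payload("integration-tests/integration", {"autoIngest": True})
print(json.dumps(payload))

response_body = json.dumps({
    "bulkId": "bulk_ds_update_dd2600809bf8418dbea2706d6f456636",
    "connectionString": "integration-tests/integration",
    "jobsCreated": 0,
})
print(summarize_bulk_response(response_body))
```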
Re-crawl the Data Source
Endpoint
Method | Path | Purpose |
---|---|---|
PUT | /azureblob/handler/{handlerId}/crawl | Re-crawl the data source and update the metadata. |
Query Parameters
Attribute | Description | Required |
---|---|---|
HandlerId | integer The specific handler ID. | Yes |
Response Parameters
The response returns a string of characters that identify the job run.
Request Example
The following request re-crawls the data source.
curl \
--request PUT \
--header "Content-Type: application/json" \
--header "Authorization: Bearer dea464c07bd07300095caa8" \
https://your-immuta-url.com/azureblob/handler/427/crawl
Response Example
a4de5af0-1be1-11ec-8131-6fe77107bfa9
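The job identifier in this example has the shape of a UUID, so a client can run a loose sanity check on the returned string before storing it. This is a sketch only: that every job identifier is a UUID is an assumption drawn from the single example above, and both helper names are hypothetical.

```python
import urllib.request
import uuid


def build_crawl_request(base_url: str, token: str, handler_id: int) -> urllib.request.Request:
    """Prepare (but do not send) a PUT to /azureblob/handler/{handlerId}/crawl."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/azureblob/handler/{int(handler_id)}/crawl",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


def looks_like_job_id(text: str) -> bool:
    """Loose check: does the response body parse as a UUID?"""
    try:
        uuid.UUID(text.strip())
    except ValueError:
        return False
    return True


crawl = build_crawl_request("https://your-immuta-url.com", "your-api-token", 427)
print(crawl.get_method(), crawl.full_url)
print(looks_like_job_id("a4de5af0-1be1-11ec-8131-6fe77107bfa9"))  # prints True
```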