Create a Dictionary Identifier
Note
Previous documentation referred to identifiers as classifiers. The term has been changed to identifier to be more accurate and to avoid confusion with Immuta's data classification and frameworks feature.
Use case: Custom dictionary identifier
Scenario: Your data includes the names of the rooms where employees' desks are located across your organization. Although these locations may be considered sensitive in particular datasets, Immuta's built-in identifiers would not detect them.
A custom dictionary identifier lets you define your own detector so that Immuta's sensitive data discovery can match a list of room names against values in the dataset. The tutorial below uses this scenario to illustrate creating this identifier.
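To make the idea of dictionary matching concrete, here is a minimal Python sketch. It is an illustration only: this page does not describe how Immuta computes detection confidence, so the sketch assumes a simple match ratio (matching values divided by non-empty values). The function name `dictionary_confidence` and the sample column are hypothetical.

```python
def dictionary_confidence(column_values, dictionary, case_sensitive=False):
    """Return the fraction of non-empty values that appear in the dictionary.

    Assumption: confidence is modeled as a plain match ratio. This is an
    illustration, not Immuta's actual detection algorithm.
    """
    def normalize(text):
        # Fold case unless the comparison is case sensitive.
        return text if case_sensitive else text.casefold()

    words = {normalize(word) for word in dictionary}
    values = [value for value in column_values if value]
    if not values:
        return 0.0
    hits = sum(1 for value in values if normalize(value) in words)
    return hits / len(values)


rooms = ["Research Lab", "Blue Room", "Purple Room"]
# Hypothetical sample of values from a column in a data source.
column = ["blue room", "Research Lab", "Cafeteria", "Purple Room", "BLUE ROOM"]

confidence = dictionary_confidence(column, rooms)
# Under this model, tags would be applied only at or above the threshold.
should_tag = confidence >= 0.6
```

In this sketch, four of the five sample values match when case is ignored, so the ratio clears a 0.6 threshold. With `case_sensitive=True`, only two values match and it does not.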
Attributes of the custom dictionary identifier
Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom dictionary identifier are outlined in the table below.
Attribute | Type | Description |
---|---|---|
name | string | Unique, request-friendly identifier name. |
displayName | string | Unique, human-readable identifier name. |
description | string | The identifier description. |
type | string | The type of identifier: `dictionary`. |
config | object | Includes `config.minConfidence`, `config.tags`, `config.values`, and `config.caseSensitive` (defaults to `false`). *See descriptions below. |
minConfidence* | number | When the detection confidence is at least this percentage, tags are applied. |
tags* | array[string] | The names of the tags to apply to the data source. Note: All tags must start with `Discovered.` |
values* | array[string] | The list of words to include in the dictionary. |
caseSensitive* | boolean | Indicates whether or not values are case sensitive. Defaults to `false`. |
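The attribute table can be turned into a quick local check before a payload is submitted. The Python sketch below is a convenience under stated assumptions, not part of Immuta's API: the function name `validate_dictionary_identifier` is hypothetical, and it assumes that `name`, `displayName`, `type`, and the `config` fields other than `caseSensitive` are required. The server remains the authority on what is valid.

```python
def validate_dictionary_identifier(payload):
    """Return a list of problems found in a dictionary identifier payload."""
    problems = []

    # Assumption: these top-level strings are required.
    for key in ("name", "displayName", "type"):
        if not isinstance(payload.get(key), str) or not payload[key]:
            problems.append(f"{key} must be a non-empty string")
    if payload.get("type") != "dictionary":
        problems.append("type must be 'dictionary'")

    config = payload.get("config")
    if not isinstance(config, dict):
        problems.append("config must be an object")
        return problems

    min_confidence = config.get("minConfidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(min_confidence, bool) or not isinstance(min_confidence, (int, float)):
        problems.append("config.minConfidence must be a number")

    tags = config.get("tags")
    if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
        problems.append("config.tags must be an array of strings")
    elif not all(tag.startswith("Discovered.") for tag in tags):
        problems.append("every tag in config.tags must start with 'Discovered.'")

    values = config.get("values")
    if not isinstance(values, list) or not values or not all(
        isinstance(value, str) for value in values
    ):
        problems.append("config.values must be a non-empty array of strings")

    # caseSensitive is optional and defaults to false.
    if "caseSensitive" in config and not isinstance(config["caseSensitive"], bool):
        problems.append("config.caseSensitive must be a boolean")

    return problems
```

An empty list means the payload passed these local checks; each string in a non-empty list names one field to fix.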
Create a custom dictionary identifier
- Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.
- Save the custom dictionary identifier payload in a .json file. The dictionary below contains the words Research Lab, Blue Room, and Purple Room.

    {
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
      "displayName": "Employee Desk Location Identifier",
      "description": "This identifier detects when an employee's desk location appears in a dataset.",
      "type": "dictionary",
      "config": {
        "values": ["Research Lab", "Blue Room", "Purple Room"],
        "caseSensitive": false,
        "minConfidence": 0.6,
        "tags": ["Discovered.desk-location"]
      }
    }
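If the room list is long or maintained elsewhere, the payload file can be generated rather than written by hand. The following Python sketch shows one way to do that. The helper names `build_payload` and `write_payload` are hypothetical, and the field values are copied from the example payload above.

```python
import json
from pathlib import Path


def build_payload(name, display_name, description, values, tags,
                  min_confidence=0.6, case_sensitive=False):
    """Assemble a dictionary identifier payload as a plain dict."""
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "type": "dictionary",
        "config": {
            "values": list(values),
            "caseSensitive": case_sensitive,
            "minConfidence": min_confidence,
            "tags": list(tags),
        },
    }


def write_payload(payload, path):
    """Write the payload to a UTF-8 JSON file."""
    Path(path).write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")


payload = build_payload(
    name="EMPLOYEE_DESK_LOCATION_IDENTIFIER",
    display_name="Employee Desk Location Identifier",
    description="This identifier detects when an employee's desk location appears in a dataset.",
    values=["Research Lab", "Blue Room", "Purple Room"],
    tags=["Discovered.desk-location"],
)
```

Calling `write_payload(payload, "example-payload.json")` would produce a file with the same content as the example above, ready to pass to the CLI or curl.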
- Create the identifier using one of these methods:

  Immuta CLI

    immuta api sdd/classifier -X POST --input ./example-payload.json

  HTTP API

    curl \
      --request POST \
      --header "Content-Type: application/json" \
      --header "Authorization: 12345678900000" \
      --data @example-payload.json \
      https://your-immuta-url.immuta.com/sdd/classifier
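For scripted workflows, the same HTTP request can be issued from Python's standard library. This is a minimal sketch that mirrors the curl call above (same path, headers, and placeholder host and key); the function names `build_create_request` and `create_identifier` are hypothetical, and error handling is omitted.

```python
import json
import urllib.request


def build_create_request(base_url, api_key, payload):
    """Build (but do not send) the POST request that creates the identifier."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sdd/classifier",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,
        },
    )


def create_identifier(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder host and API key, taken from the curl example.
request = build_create_request(
    "https://your-immuta-url.immuta.com",
    "12345678900000",
    {"name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER", "type": "dictionary"},
)
```

Separating request construction from sending keeps the first function easy to inspect or test without network access. Pass the full payload from your .json file and call `create_identifier(request)` only against a real instance.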
- If the request is successful, you will receive a response that contains details about the identifier.

    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
      "displayName": "Employee Desk Location Identifier",
      "description": "This identifier detects when an employee's desk location appears in a dataset.",
      "type": "dictionary",
      "config": {
        "tags": ["Discovered.desk-location"],
        "values": ["Research Lab", "Blue Room", "Purple Room"],
        "caseSensitive": false,
        "minConfidence": 0.6
      },
      "id": 68,
      "createdAt": "2021-10-20T17:57:51.696Z",
      "updatedAt": "2021-10-20T17:57:51.696Z"
    }
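A script that creates identifiers may want to confirm that the response echoes what was submitted. The Python sketch below compares the submitted fields with those in a response shaped like the example above. The function name `response_matches_payload` is hypothetical, and the check assumes the server returns the submitted fields unchanged, as the example response does.

```python
import json


def response_matches_payload(payload, response):
    """Return True when every submitted field is echoed back unchanged."""
    for key in ("name", "displayName", "description", "type"):
        if payload.get(key) != response.get(key):
            return False
    # Dict comparison ignores key order, so "tags" before "values" is fine.
    return payload.get("config") == response.get("config")


# Trimmed copy of the example response from this page.
response = json.loads("""
{
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "tags": ["Discovered.desk-location"],
    "values": ["Research Lab", "Blue Room", "Purple Room"],
    "caseSensitive": false,
    "minConfidence": 0.6
  },
  "id": 68
}
""")

# Server-generated fields such as id and createdAt are not part of the payload.
submitted = {key: value for key, value in response.items() if key != "id"}
matches = response_matches_payload(submitted, response)
```

The server-assigned `id` (68 in the example) is the piece of the response most worth recording if the identifier will be referenced later.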
What's next
Continue to one of the following tutorials:
- Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
- Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.