Prepare AWS Glue <source ID> configuration file
The lineage harvester uses the lineage harvester configuration file to collect the AWS Glue script annotations and sends them to the Collibra Data Lineage server. However, if the useCollibraSystemName property in the lineage harvester configuration file is set to true, or if the lineage harvester cannot determine the database name, default schema name or dialect, you also have to provide a <source ID> configuration file that defines the connection information, for example, the system names of databases in AWS Glue.
Collibra Data Lineage uses the system names to match the structure of databases in AWS Glue to the structure of assets in Data Catalog.
Tip: The <source ID> in the name of the configuration file refers to the value of the Id property in the lineage harvester configuration file.
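The two rules above (when the file is needed, and how it is named) can be sketched in Python. This is only an illustration, not part of the lineage harvester: the function names needs_source_config and source_config_filename are hypothetical, and the .conf extension is taken from the file name example on this page.

```python
def needs_source_config(use_collibra_system_name: bool,
                        dbname_known: bool = True,
                        schema_known: bool = True,
                        dialect_known: bool = True) -> bool:
    """Return True when a <source ID> configuration file is required."""
    # Required when system names are in use, or when the harvester cannot
    # determine the database name, default schema name or dialect.
    all_known = dbname_known and schema_known and dialect_known
    return use_collibra_system_name or not all_known


def source_config_filename(source_id: str) -> str:
    """Name the file after the Id property (hypothetical helper)."""
    return f"{source_id}.conf"


print(source_config_filename("AWS-Glue-source-1"))  # AWS-Glue-source-1.conf
```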
Prerequisites
- The useCollibraSystemName property in the lineage harvester configuration file is set to true.
Steps
- Create a new JSON file in the lineage harvester config folder.
- Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
  Example: The value of the Id property in the lineage harvester configuration file is AWS-Glue-source-1. As a result, the name of your JSON file should be AWS-Glue-source-1.conf.
- For each database in the AWS Glue script annotations, add the following content to the JSON file:
Property | Description | Required?
connection_name=<name> or found_dbname=<database-name>;found_hostname=<host-name> | The name of a connection in AWS Glue. This property contains the translation key, which is either a connection name, for example, connection_name=my-connection, or the combination of the database name and hostname, for example, found_dbname=my-database-name;found_hostname=thisserver.onmicrosoft.com. | Yes
dbname | The name of the database of a supported data source in AWS Glue. | No
schema | The name of the default schema of a supported data source in AWS Glue. If the lineage harvester fails to find a specific schema, it uses the default schema. | No
dialect | The dialect of the supported data source in AWS Glue. | No
collibraSystemName | The system or server name of a database. | Yes, if useCollibraSystemName is set to true.
Example:
{
  "connection_name=mssql-database-connection": {
    "dbname": "mssql-database-name",
    "schema": "mssql-schema-name",
    "dialect": "mssql",
    "collibraSystemName": "mssql-system-name"
  },
  "found_dbname=oracle-db;found_hostname=thisserver.onmicrosoft.com": {
    "dbname": "oracle-database-name",
    "schema": "oracle-schema-name",
    "dialect": "oracle",
    "collibraSystemName": "oracle-system-name"
  }
}
- Save the <source ID> configuration file.
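The steps above can also be scripted. The following Python sketch checks each entry against the properties described on this page and then writes the file as JSON. The helper names (validate_entry, write_source_config) and the validation rules are assumptions made for illustration, including the assumption that names in a translation key contain no ";" or "=". The lineage harvester may validate the file differently.

```python
import json
import re
from pathlib import Path

# Translation key: a connection name, or a database name plus hostname.
# Assumption: the names themselves contain no ';' or '='.
KEY_PATTERN = re.compile(
    r"^(connection_name=[^;=]+|found_dbname=[^;=]+;found_hostname=[^;=]+)$"
)
ALLOWED = {"dbname", "schema", "dialect", "collibraSystemName"}


def validate_entry(key, props, use_collibra_system_name):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    if not KEY_PATTERN.match(key):
        problems.append(f"bad translation key: {key!r}")
    unknown = set(props) - ALLOWED
    if unknown:
        problems.append(f"unknown properties: {sorted(unknown)}")
    # collibraSystemName is required only when useCollibraSystemName is true.
    if use_collibra_system_name and not props.get("collibraSystemName"):
        problems.append("collibraSystemName is required")
    return problems


def write_source_config(config_dir, source_id, entries,
                        use_collibra_system_name=True):
    """Validate all entries, then write <source_id>.conf as JSON."""
    for key, props in entries.items():
        problems = validate_entry(key, props, use_collibra_system_name)
        if problems:
            raise ValueError(f"{key}: {'; '.join(problems)}")
    path = Path(config_dir) / f"{source_id}.conf"
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return path
```

For example, calling write_source_config("config", "AWS-Glue-source-1", entries) with a dictionary shaped like the example above would write config/AWS-Glue-source-1.conf, provided that the config folder exists. Serializing with json.dumps rather than writing the text by hand avoids syntax slips such as a missing comma between properties.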