Prepare AWS Glue <source ID> configuration file

The lineage harvester uses its configuration file to collect the AWS Glue script annotations and send them to the Collibra Data Lineage server. However, if useCollibraSystemName in the lineage harvester configuration file is set to true, or if the lineage harvester cannot determine the database name, default schema name or dialect, you also have to provide a <source ID> configuration file that defines the connection information, for example, the system names of the databases in AWS Glue.

Collibra Data Lineage uses the system names to match the structure of databases in AWS Glue to the structure of assets in Data Catalog.

Tip The <source ID> part of the configuration file name refers to the value of the Id property in the lineage harvester configuration file.

Prerequisites

Steps

  1. Create a new JSON file in the lineage harvester config folder.
  2. Give the JSON file the same name as the value of the Id property in the lineage harvester configuration file.
    Example The value of the Id property in the lineage harvester configuration file is AWS-Glue-source-1. As a result, the name of your JSON file should be AWS-Glue-source-1.conf.
  3. For each database in the AWS Glue script annotations, add the following content to the JSON file:

    | Property | Description | Required? |
    |---|---|---|
    | connection_name=<name> or found_dbname=<database-name>;found_hostname=<host-name> | The name of a connection in AWS Glue. This property contains the translation key, which is either a connection name, for example, connection_name=my-connection, or the combination of the database name and hostname, for example, found_dbname=my-database-name;found_hostname=thisserver.onmicrosoft.com. | Yes |
    | dbname | The name of the database of a supported data source in AWS Glue. | No |
    | schema | The name of the default schema of a supported data source in AWS Glue. If the lineage harvester fails to find a specific schema, it uses the default schema. | No |
    | dialect | The dialect of the supported data source in AWS Glue. | No |
    | collibraSystemName | The system or server name of a database. | Yes, if useCollibraSystemName is set to true. |

  4. Save the <source ID> configuration file.
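
The steps above can be sketched as a small Python script that builds the file content, checks it against the property table, and saves it under the name required in step 2. This is only an illustration under stated assumptions: the layout that nests dbname, schema, dialect and collibraSystemName under each translation key is assumed, not confirmed by this page; the helper names (validate, write_source_config) are hypothetical; and every value in angle brackets is a placeholder that you replace with your own.

```python
import json
from pathlib import Path

# Value of the Id property, taken from the example in step 2.
SOURCE_ID = "AWS-Glue-source-1"

# One entry per database found in the AWS Glue script annotations.
# ASSUMPTION: each translation key maps to an object holding the other properties.
entries = {
    "connection_name=my-connection": {
        "dbname": "<database-name>",
        "schema": "<default-schema-name>",
        "dialect": "<dialect>",
        "collibraSystemName": "<system-name>",
    },
    "found_dbname=my-database-name;found_hostname=thisserver.onmicrosoft.com": {
        "collibraSystemName": "<system-name>",
    },
}

# Properties listed in the table of step 3 (besides the translation key).
ALLOWED = {"dbname", "schema", "dialect", "collibraSystemName"}


def validate(entries, use_collibra_system_name):
    """Return a list of problems found in the entries (empty if none)."""
    errors = []
    for key, props in entries.items():
        is_connection = key.startswith("connection_name=")
        is_found = key.startswith("found_dbname=") and ";found_hostname=" in key
        if not (is_connection or is_found):
            errors.append(f"unrecognized translation key: {key}")
        unknown = sorted(set(props) - ALLOWED)
        if unknown:
            errors.append(f"unknown properties for {key}: {unknown}")
        # collibraSystemName is required only when useCollibraSystemName is true.
        if use_collibra_system_name and not props.get("collibraSystemName"):
            errors.append(f"collibraSystemName missing for {key}")
    return errors


def write_source_config(config_dir, source_id, entries, use_collibra_system_name):
    """Validate the entries and save them as <source_id>.conf in config_dir."""
    errors = validate(entries, use_collibra_system_name)
    if errors:
        raise ValueError("; ".join(errors))
    path = Path(config_dir) / f"{source_id}.conf"
    path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling write_source_config with the path of the lineage harvester config folder, SOURCE_ID, entries and True would save the content as AWS-Glue-source-1.conf in that folder. Running the check before saving catches a missing collibraSystemName early, which matters only when useCollibraSystemName is set to true.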