Apache Airflow source configuration
The Source configuration field in the technical lineage for Apache Airflow capability lets you reduce the number of data objects that are processed and improve the performance of Collibra Data Lineage.
The value of the Source configuration field must be a valid block of JSON code, for example:
```json
{
  "datasources": [
    {
      "namespace": "s3a://my-dev",
      "group": "file_group_my-dev",
      "type": "file"
    },
    {
      "namespace": "s3a://my-dev2",
      "group": "file_group_my-dev2",
      "type": "file"
    },
    {
      "namespace": "snowflake://myorg.snowflake.com",
      "type": "database",
      "collibraSystemName": "snowflake",
      "database": "snowflake_db",
      "schema": "snowflake_db",
      "dialect": "snowflake"
    }
  ]
}
```
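Because the field accepts only a valid block of JSON, it can help to check a value before you paste it. The following Python sketch makes only structural checks: the value parses as JSON and contains a datasources array of objects. The function name check_source_configuration is hypothetical, and treating namespace as mandatory in every entry is an assumption for illustration, not a documented Collibra rule.

```python
import json


def check_source_configuration(text: str) -> list:
    """Return a list of structural problems in a Source configuration value.

    Hypothetical helper: it checks only that the value is valid JSON with a
    "datasources" array of objects. Requiring "namespace" in every entry is
    an assumption, not a documented Collibra rule.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]

    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    datasources = config.get("datasources")
    if not isinstance(datasources, list):
        return ['"datasources" must be an array']

    problems = []
    for i, entry in enumerate(datasources):
        if not isinstance(entry, dict):
            problems.append(f"datasources[{i}] must be an object")
        elif not isinstance(entry.get("namespace"), str):
            # Assumption: each mapping needs a namespace string to match against.
            problems.append(f'datasources[{i}] has no string "namespace"')
    return problems


sample = '{"datasources": [{"namespace": "s3a://my-dev", "group": "file_group_my-dev", "type": "file"}]}'
print(check_source_configuration(sample))  # []
```

An empty list means the value passed these basic checks; it does not confirm that Collibra Data Lineage accepts every property value.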
The following table describes the properties that you can use in the JSON code block.
| Property | Description | Required? |
|---|---|---|
| datasources | An array of mappings from Airflow namespaces to Collibra system names and databases. This section includes the properties to translate the system name, database, schema, and dialect. | |
| namespace | The namespace that Airflow uses. The value of this property must match the namespace in the Airflow OpenLineage files. | |
| group | A group name used to organize the files in the namespace. Specify this property only when the … | |
| type | The type of data source that this namespace contains. Specify one of the following values: … If you do not specify this property, Collibra Data Lineage derives the value from the JSON schema. If you specify this property, your value takes precedence. When this property is set to … | |
| collibraSystemName | The system or server name of the data source. Use this property with the … Set this property to the same name as the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog. The following rules apply when you specify the system name: … | |
| database | The name of the default database that the namespace connection refers to. The following rules apply when you specify the database name: … | |
| schema | The name of the default schema to use with the namespace connection. The following rules apply when you specify the schema name: … | |
| dialect | When no columnLineage is present, Collibra Data Lineage tries to parse any SQL that is present. Set the dialect so that the SQL is parsed correctly. See the list of allowed values. You can enter one of the following values: … | |
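The mapping that the table describes can be pictured as a lookup keyed by namespace. The following Python sketch is an illustration under stated assumptions, not how Collibra Data Lineage is implemented. It matches namespaces by exact string comparison, reflecting the rule that namespace must match the namespace in the Airflow OpenLineage files. The helper names build_index, resolve, and qualify are hypothetical, and qualify assumes that the default database and schema fill in whatever parts a dotted table name leaves out.

```python
import json
from typing import Optional

SOURCE_CONFIGURATION = """
{
  "datasources": [
    {"namespace": "s3a://my-dev", "group": "file_group_my-dev", "type": "file"},
    {"namespace": "snowflake://myorg.snowflake.com", "type": "database",
     "collibraSystemName": "snowflake", "database": "snowflake_db",
     "schema": "snowflake_db", "dialect": "snowflake"}
  ]
}
"""


def build_index(config_text: str) -> dict:
    """Index the datasources entries by their exact namespace string."""
    config = json.loads(config_text)
    return {entry["namespace"]: entry for entry in config["datasources"]}


def resolve(index: dict, namespace: str) -> Optional[dict]:
    """Return the mapping for a namespace, or None if it is not mapped.

    Exact matching only: in this sketch, "s3a://my-dev/" and "s3a://my-dev"
    are different namespaces.
    """
    return index.get(namespace)


def qualify(entry: dict, table: str) -> str:
    """Hypothetical: fill missing database/schema parts from the defaults."""
    parts = table.split(".")
    if len(parts) == 1:  # table only: prepend default database and schema
        parts = [entry["database"], entry["schema"], parts[0]]
    elif len(parts) == 2:  # schema.table: prepend default database
        parts = [entry["database"], *parts]
    return ".".join(parts)


index = build_index(SOURCE_CONFIGURATION)
snow = resolve(index, "snowflake://myorg.snowflake.com")
print(snow["collibraSystemName"], snow["database"], snow["schema"])
# snowflake snowflake_db snowflake_db
print(qualify(snow, "orders"))  # snowflake_db.snowflake_db.orders
print(resolve(index, "s3a://my-dev/"))  # None
```

The last call shows why the namespace in the Source configuration must be copied exactly from the OpenLineage files: under exact matching, even a trailing slash leaves the namespace unmapped.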