Airflow: Set up OpenLineage integration for Cloud Storage connections

Use this procedure to configure Airflow to emit OpenLineage messages and save the resulting files to a location that Collibra can access.

  1. To install and configure the OpenLineage integration in Airflow, follow the "Using OpenLineage integration" guide in the Airflow documentation.

    You can use the following configuration as an example:

    [openlineage]
    transport = {"type": "http", "url": "http://HOST_OR_URL_WHERE_FLUENTD_IS:8888/openlineage"}
    namespace = airflow
  2. Copy the files in OpenLineage format to the relevant directory in your cloud-based storage system. The files must be in one of the following locations:
    • An AWS S3 bucket.
    • An Azure Data Lake Storage container.
    • A Google Cloud Storage bucket.
    Note: Whenever you synchronize lineage, you must upload all source files that you want to include in the technical lineage graph.
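
The transport value in step 1 is a JSON document, so a missing quote or brace is an easy mistake to make. The following Python sketch shows one way to check the value before you restart Airflow, and how the same settings could be expressed as environment variables using Airflow's AIRFLOW__SECTION__KEY naming convention. The helper names check_transport and as_env are hypothetical and not part of Airflow or OpenLineage, and the host is still the placeholder from the example configuration.

```python
import json
from urllib.parse import urlparse


def check_transport(raw: str) -> dict:
    """Parse an [openlineage] transport value and run basic sanity checks."""
    # json.loads raises json.JSONDecodeError if a quote or brace is missing.
    transport = json.loads(raw)
    if transport.get("type") != "http":
        raise ValueError("this sketch only checks the http transport type")
    parsed = urlparse(transport.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    return transport


def as_env(transport: dict, namespace: str) -> dict:
    """Render the same settings as Airflow environment variables."""
    return {
        "AIRFLOW__OPENLINEAGE__TRANSPORT": json.dumps(transport),
        "AIRFLOW__OPENLINEAGE__NAMESPACE": namespace,
    }


# Placeholder host taken from the example configuration; replace it with
# the host where Fluentd is running.
raw = '{"type": "http", "url": "http://HOST_OR_URL_WHERE_FLUENTD_IS:8888/openlineage"}'
transport = check_transport(raw)
env = as_env(transport, "airflow")

for name, value in env.items():
    print(f"{name}={value}")
```

The check only confirms that the value parses and that the URL is absolute. It does not verify that Fluentd is reachable at that address.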

What's next

You can now set up Fluentd and prepare the data source files.