Airflow: Set up OpenLineage integration and prepare files for Cloud Storage connections

Use this procedure to configure Airflow to emit OpenLineage messages and to save the resulting files to a location that Collibra can access.

  1. To install and configure the OpenLineage integration in Airflow, follow the guide "Using OpenLineage integration" in the Airflow documentation.

    You can use the following Airflow configuration (airflow.cfg) as an example:

    [openlineage]
    transport = {"type": "http", "url": "http://HOST_OR_URL_WHERE_FLUENTD_IS:8888/openlineage"}
    namespace = airflow
  2. Copy the resulting OpenLineage files to the relevant directory in your cloud-based storage system. The files must be stored in one of the following locations:
    • An AWS S3 bucket.
    • An Azure Data Lake Storage container.
    • A Google Cloud Storage bucket.
    Note: Whenever you synchronize lineage, you must upload all the source files that you want to include in the technical lineage graph.
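The OpenLineage provider reads the transport option in step 1 as a JSON string, so a value with unbalanced quotes or a missing closing brace will not parse, and no events reach Fluentd. The following is a minimal Python sketch, using only the standard library, for checking an [openlineage] section before you rely on it. The function check_openlineage_config and the rules it enforces (a "type" key, plus a "url" for the http transport) are hypothetical illustrations, not part of Airflow or Collibra.

```python
import configparser
import json


def check_openlineage_config(cfg_text: str) -> dict:
    """Hypothetical helper: parse [openlineage] and validate its transport value.

    Returns the parsed transport dict, or raises ValueError if it is unusable.
    """
    # interpolation=None so characters such as '%' in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("openlineage"):
        raise ValueError("missing [openlineage] section")

    raw = parser.get("openlineage", "transport", fallback=None)
    if raw is None:
        raise ValueError("missing 'transport' option")

    # The transport value must be a JSON document, e.g. {"type": "http", ...}.
    try:
        transport = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"transport is not valid JSON: {exc}") from exc

    # Illustrative checks only; the provider may accept other transport types.
    if not isinstance(transport, dict) or "type" not in transport:
        raise ValueError("transport must be a JSON object with a 'type' key")
    if transport["type"] == "http" and not transport.get("url"):
        raise ValueError("http transport requires a 'url'")
    return transport


if __name__ == "__main__":
    example = """
[openlineage]
transport = {"type": "http", "url": "http://HOST_OR_URL_WHERE_FLUENTD_IS:8888/openlineage"}
namespace = airflow
"""
    print(check_openlineage_config(example)["url"])
```

A value such as '{"type":"http", "url": "http://...' (with an outer quote and no closing "}) raises ValueError here instead of failing silently later.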

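Step 2 and its note require every lineage sync to include the complete set of source files, not just a subset. The following is a minimal sketch, assuming the event files sit in one local directory and hold one JSON event per line, that checks each event for a few top-level OpenLineage fields and maps every file to an object key under a destination prefix. The function plan_upload, the REQUIRED_KEYS subset, and the one-event-per-line layout are assumptions for illustration; the upload itself is left to the SDK or CLI of your storage provider.

```python
import json
from pathlib import Path

# Assumption: each file holds one OpenLineage event per line (NDJSON).
# This is a small subset of top-level event fields, checked for illustration.
REQUIRED_KEYS = ("eventTime", "run", "job", "producer")


def plan_upload(local_dir: str, prefix: str) -> dict[str, str]:
    """Hypothetical helper: map every event file under local_dir to an object key.

    All files are included on every call, because each lineage sync needs the
    full set of source files for the technical lineage graph.
    """
    root = Path(local_dir)
    plan: dict[str, str] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # tolerate blank lines between events
                event = json.loads(line)  # raises ValueError on malformed JSON
                missing = [k for k in REQUIRED_KEYS if k not in event]
                if missing:
                    raise ValueError(f"{path}:{lineno} missing {missing}")
        # Object keys use forward slashes regardless of the local OS.
        key = f"{prefix.rstrip('/')}/{path.relative_to(root).as_posix()}"
        plan[str(path)] = key
    return plan
```

The returned mapping could then be passed to whichever client you use for the AWS S3 bucket, Azure Data Lake Storage container, or Google Cloud Storage bucket.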
What's next

You can now:

  • Create an AWS connection to an Edge or Collibra Cloud site
  • Create an Azure Data Lake Storage connection to an Edge or Collibra Cloud site
  • Create a Google Cloud Platform connection to an Edge or Collibra Cloud site