Architecture for creating technical lineage for OpenLineage

The following diagram shows the architecture of technical lineage for the OpenLineage integration.

To create this technical lineage, we recommend using Fluentd. Fluentd is a third-party, open-source tool maintained by the community. While we provide documentation to help configure the integration, Collibra support is limited to Collibra-side configuration and does not cover troubleshooting the Fluentd environment.

  1. Your code moves and transforms data from a bucket, such as an Amazon S3 bucket, to a database. You already have this part.

    This integration assumes that your code emits information in OpenLineage format when a job runs. This architecture is event driven.

  2. The second part collects the emitted OpenLineage messages. You can use any method to collect the OpenLineage messages and save them to files. For example:
    1. You can use Fluentd to receive the OpenLineage messages as REST API calls and save them to files. This is the recommended and supported method. Fluentd is an open-source data collector for building a unified logging layer. Once installed on a server, it runs in the background to collect, parse, transform, analyze, and store various types of data.
    2. You can save the OpenLineage messages directly to disk.
  3. For Collibra Data Lineage to process the saved files, ensure that Collibra Data Lineage has access to the files.
    • If you use technical lineage via Edge, copy the files to the Edge server, and then use the Edge CLI to copy the files to the proper location.
    • If you use the lineage harvester, the files must be local to the lineage harvester. You can choose to run the lineage harvester from the same server as Fluentd, or copy the files from the Fluentd server to the server where the lineage harvester runs.

    When the source files are ready, Collibra Data Lineage parses the lineage, merges it with any other lineage information in the Collibra Data Lineage service, stitches the technical lineage objects to Data Catalog assets, and generates the technical lineage graph.
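
As a rough illustration of the first two parts, the following Python sketch builds a minimal OpenLineage-style run event and saves it directly to disk, without Fluentd. The top-level fields follow the OpenLineage run event model. Everything else is an assumption made for illustration: the function names `build_run_event` and `save_event`, the namespaces, dataset names, and producer URL, the schema version in `schemaURL`, and the one-file-per-event naming. Check which file layout your Collibra Data Lineage setup expects before relying on it.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def build_run_event(event_type, run_id, job_name):
    """Build a minimal OpenLineage-style run event as a plain dict.

    All namespaces, dataset names, and URLs below are placeholders.
    """
    return {
        "eventType": event_type,  # for example START, COMPLETE, or FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": "example-namespace", "name": job_name},
        # Source: a file in a bucket (placeholder values).
        "inputs": [
            {"namespace": "s3://example-bucket", "name": "raw/orders.csv"}
        ],
        # Target: a database table (placeholder values).
        "outputs": [
            {"namespace": "postgres://example-host:5432",
             "name": "sales.public.orders"}
        ],
        "producer": "https://example.com/my-etl-job",
        # Assumed spec version; use the one your emitter targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def save_event(event, directory):
    """Write one event to its own JSON file and return the file path.

    The file naming scheme is hypothetical, not a Collibra requirement.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    file_name = f"{event['run']['runId']}-{event['eventType']}.json"
    path = directory / file_name
    path.write_text(json.dumps(event, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    run_id = str(uuid.uuid4())
    # Emit a START and a COMPLETE event for the same run.
    for event_type in ("START", "COMPLETE"):
        event = build_run_event(event_type, run_id, "load_orders")
        print(save_event(event, "openlineage-events"))
```

Running the script writes two JSON files to a local `openlineage-events` directory, one for each event of the run. From there, the files would still need to be made available to Collibra Data Lineage as described in step 3.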