Lineage harvesting app command options and arguments
After creating a configuration file, you can use the lineage harvester to perform specific actions with the data sources that are defined in your configuration file.
Tip: If you run the lineage harvester from the command line, you see an overview of the command options and arguments that you can use.
Typical command options and arguments
The following table shows the most commonly used command options and arguments. You can see a full list of commands by entering the --help command in the command line. Commands that are not listed in this table are intended for internal use.
Command | Description |
---|---|
full-sync
|
Uploads all of the metadata from the data sources mentioned in your configuration file to the Collibra Data Lineage service, where the metadata is then processed and uploaded to Data Catalog. After you enter this command, the lineage harvester starts synchronization processing and displays the total number of data sources that are being ingested. Synchronization processing ends with an error in the following situations:
|
-s "<ID of data source>"
|
Uploads only the metadata from a specified data source. This command allows you to process data from a newly added data source or to refresh a data source in the configuration file, without refreshing the other data sources. This reduces upload time, because you upload only the specified data sources without affecting the others. You can concurrently synchronize as many sources as you like. On the Collibra Data Lineage service instance, the sources are processed sequentially. Note: You can use this argument multiple times to include multiple data sources. |
--no-matching
|
Uploads a technical lineage without stitching the data objects in your technical lineage to the corresponding Column and Table assets in Data Catalog. Note: As a result, you cannot see the technical lineage of a specific Table or Column asset, but you can still see and browse the full technical lineage. |
sync
|
Whereas full-sync uploads metadata from your data sources to the Collibra Data Lineage service, sync processes only the metadata that was previously uploaded to the service. After you enter this command, the lineage harvester starts synchronization processing and displays the total number of data sources that are being ingested. Synchronization processing ends with an error in the following situations:
|
-s "<ID of data source>"
|
Syncs only the metadata from a specified data source that is already on the Collibra Data Lineage service. This command allows you to sync data from one data source without refreshing the other data sources. You must have previously uploaded the metadata to the Collibra Data Lineage service. Warning: Only the sources you specify are synced. Any previously ingested metadata from non-specified sources is deleted from Data Catalog, along with its existing technical lineage. Note: You can use this argument multiple times to include multiple data sources. |
|
Analyzes a specified batch (ZIP file) of metadata on the Collibra Data Lineage service instance. The Sources tab page shows the transformation details or source code that was analyzed and the results of the analysis. |
|
Downloads all your data sources in a separate ZIP file, per data source, to the lineage harvester output folder. |
-s <ID of data source>
|
Downloads only the data source with a specific ID. For example, Note You can use this argument multiple times to include multiple data sources. |
|
Lists all of the data sources that will be used to create a technical lineage. When you enter this command, up to 500 data sources are listed per page by default. The list includes the following details for each data source:
Example: "Source ID 1redshift (from edge: false) (useSystemName: false)" indicates that the data source with the 1redshift source ID was ingested by using the lineage harvester, and that the system name of the data source is not used to match the System asset in Data Catalog. |
-p <page number>
|
Specifies the page to be displayed. Note: To use the -p, -s, and -all options, you must have the lineage harvester version 2023.05 or newer. |
-s <number of data sources>
|
Specifies the number of data sources to be listed on one page. Note: To use the -p, -s, and -all options, you must have the lineage harvester version 2023.05 or newer. |
-all
|
Lists all data sources. The data sources are not formatted in pages. Note: To use the -p, -s, and -all options, you must have the lineage harvester version 2023.05 or newer. |
ignore-source
|
Ignores the specified data source from the list of data sources that will be used to create the technical lineage. You can specify only one source ID at a time. If your source ID includes spaces, enclose the source ID in double or single quotation marks. You can use this command to delete the technical lineage of a data source by using the lineage harvester. For details, go to "Delete the technical lineage of a data source" if you use the lineage harvester, or to "Delete the technical lineage of a data source on Edge" for technical lineage via Edge. Note: To use the ignore-source command, you must have the lineage harvester version 2023.04 or newer. |
|
Provides the passwords of your Collibra Data Intelligence Platform instance and of the data sources in your configuration file to the lineage harvester, without storing the passwords in the lineage harvester folder. |
|
Checks the connectivity to the Collibra Data Lineage service instance and to Data Catalog. The logs will also show the IP addresses of the Collibra Data Lineage service instances that you have to allow. This command is mostly used for troubleshooting purposes. |
--help
|
Shows an overview of all supported command options and arguments that you can use in the lineage harvester. |
|
Shows the version of the lineage harvester that you are using. |
-Dlineage-harvester.log.dir=path/to/log/dir
|
Specifies the directory in which the log file is stored. |
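The repeated use of the -s argument with full-sync and sync can be illustrated with a short script that assembles the argument list before the harvester is started. This is only a sketch under stated assumptions: the launcher path ./bin/lineage-harvester, the helper name build_command, and the placement of --no-matching after the -s arguments are illustrative and are not taken from this page. Adjust them to your installation.

```python
import shlex
from typing import List, Sequence

# Assumption: location of the lineage harvester launcher; adjust as needed.
HARVESTER = "./bin/lineage-harvester"

# Commands from the table above that accept the -s "<ID of data source>" argument.
SYNC_COMMANDS = {"full-sync", "sync"}


def build_command(command: str,
                  source_ids: Sequence[str] = (),
                  no_matching: bool = False) -> List[str]:
    """Build an argument list for a sync run (hypothetical helper).

    Each source ID gets its own -s argument, because the argument
    can be used multiple times to include multiple data sources.
    """
    if command not in SYNC_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    args = [HARVESTER, command]
    for source_id in source_ids:
        args += ["-s", source_id]
    if no_matching:
        # Assumption: the option is appended after the -s arguments.
        args.append("--no-matching")
    return args


if __name__ == "__main__":
    cmd = build_command("full-sync", ["source_a", "source b"])
    # shlex.join quotes IDs that contain spaces for display in a shell.
    print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run, rather than a single command string, keeps source IDs that contain spaces intact without manual quoting.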