Technical lineage general troubleshooting

Most common issues

The following messages or other issues can appear when you run the lineage harvester, view a technical lineage, or upload new relations to Data Catalog via Collibra Data Lineage.

Tip For a list of all error codes and messages that the lineage harvester displays, please see the lineage harvester error codes section.

Problem

Solution

You get the following error message:

Could not find or load main class lineage.lineage-harvester-<version nr.>

This error message appears when the folder path to the lineage harvester is invalid. Check the folder path and make sure that it does not contain spaces.
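
As a minimal sketch, the following shell helper checks a folder path for spaces or tabs before you start the lineage harvester. The function name path_has_whitespace and the example path are hypothetical and only serve as an illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) if the path contains a space or a tab.
path_has_whitespace() {
  tab=$(printf '\t')
  case "$1" in
    *" "*|*"$tab"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder path; replace it with the folder where you unpacked the lineage harvester.
harvester_dir="/opt/collibra/lineage-harvester"
if path_has_whitespace "$harvester_dir"; then
  echo "Path contains whitespace: $harvester_dir"
else
  echo "Path looks fine: $harvester_dir"
fi
```

If the check reports whitespace, move or rename the folder and start the lineage harvester from the new location.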

You get the following error message:

Failed to load file '<file-name>'. If the file is not in UTF-8, please convert it accordingly.

This error message appears if the lineage harvester tries to read a non-UTF-8 SQL file of a data source with connection type SqlDirectory. To solve this issue, convert all SQL files to UTF-8 and rerun the lineage harvester.
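
One possible way to find the files that need converting is sketched below. It assumes that the iconv utility is installed; the function names is_utf8 and list_non_utf8_sql are hypothetical, and the source encoding in the commented conversion command is only an example that you must replace with the real encoding of your files.

```shell
# Succeeds if the file decodes cleanly as UTF-8 (iconv fails on invalid byte sequences).
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Print every .sql file under the given directory that is not valid UTF-8.
list_non_utf8_sql() {
  find "$1" -type f -name '*.sql' | while IFS= read -r f; do
    is_utf8 "$f" || printf '%s\n' "$f"
  done
}

# Example conversion of one file, assuming it is currently encoded as ISO-8859-1:
#   iconv -f ISO-8859-1 -t UTF-8 query.sql > query.utf8.sql
```

Run list_non_utf8_sql on the directory that your SqlDirectory data source points to, convert the listed files, and then rerun the lineage harvester.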

The lineage harvester does not connect to hosts using a proxy server.

Technical lineage does not support proxy server authentication, but you can connect through a proxy server by using the following commands.

On Windows

  1. Add the -D parameters to the JAVA_OPTS environment variable.
    Example 
    set JAVA_OPTS=-Dhttps.proxyHost="azusquid.imf.org" -Dhttps.proxyPort="8080"
  2. Run the lineage harvester in the same command line window: .\bin\lineage-harvester.bat

On other operating systems

  1. To access the hosts via a proxy server, run the following command: bin/lineage-harvester -Dhttps.proxyHost=<Hostname or IP address of the proxy> -Dhttps.proxyPort=<port number> full-sync
    Example If you want to use a proxy with hostname proxy.example.com and port number 443, run the following command:
    bin/lineage-harvester -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=443 full-sync
  2. To exclude hosts that should be accessed without going through the proxy server, add the following parameter: -Dhttp.nonProxyHosts=<host to exclude>.

    You can exclude multiple hosts by using the pipe character (|) to separate the hostnames or IP addresses to exclude. You can also use an asterisk (*) as a wildcard to match multiple hostnames or IP addresses.

    Example If you want to exclude localhost, the IP address 127.0.0.1, and all IP addresses starting with 192.168, run the following command. The value is quoted so that the shell does not interpret the pipe and asterisk characters:
    bin/lineage-harvester -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=443 -Dhttp.nonProxyHosts="localhost|127.0.0.1|192.168*"

Important In your configuration file, the value of the source "url" or "hostname" property (depending on the data source) and the value in your -Dhttp.nonProxyHosts parameter, as described above, must both be either an IP address or a host name. You will get an error if, for example, you have a host name in the "hostname" property and an IP address in the -Dhttp.nonProxyHosts parameter.
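
The exclusion list can also be assembled in a small script, as in the sketch below. The function name join_non_proxy_hosts is hypothetical. The script only prints the resulting command instead of running it, and it uses http.nonProxyHosts, the Java system property that covers both HTTP and HTTPS connections.

```shell
# Join hostnames or IP patterns with "|" for the http.nonProxyHosts Java system property.
join_non_proxy_hosts() {
  out=""
  for h in "$@"; do
    if [ -z "$out" ]; then
      out="$h"
    else
      out="$out|$h"
    fi
  done
  printf '%s\n' "$out"
}

# Quote patterns that contain "*" so that the shell does not expand them.
non_proxy=$(join_non_proxy_hosts localhost 127.0.0.1 '192.168*')

# Print the command; the value stays in double quotes so that "|" is not treated as a pipe.
printf '%s\n' "bin/lineage-harvester -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=443 -Dhttp.nonProxyHosts=\"$non_proxy\" full-sync"
```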

You get the following error message:

Source '<data source name>' failed with exception: javax.net.ssl.SSLHandshakeException: General SSLEngine problem

This message appears when the proxy server sends an unexpected certificate to the lineage harvester or when the default Java truststore is empty or outdated. To resolve this issue, do the following:

First update Java and rerun the lineage harvester to see if the problem is solved. If the same error message is shown, follow the steps below.

  1. Use the following openssl command to get a certificate from app.sqldep.com, which is part of the Collibra Data Lineage infrastructure:

    openssl x509 -in <(openssl s_client -connect app.sqldep.com:443 -prexit 2>/dev/null) -out sqldep.crt

  2. Add the certificate to the Java truststore:
    keytool -importcert -file sqldep.crt -alias sqldep -keystore <your keystore name> -storepass changeit
  3. Run the lineage harvester with the new truststore by adding the following parameter:
    -Djavax.net.ssl.trustStore=<your keystore name>
    Example To synchronize your data sources again, run the following command:
    ./bin/lineage-harvester sync -Djavax.net.ssl.trustStore=mykeystore
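
Before you rerun the lineage harvester, you may want to confirm that the import in step 2 succeeded. The sketch below uses keytool -list for this; the function name truststore_has_alias is hypothetical, and mykeystore, sqldep and changeit are the placeholder values from the steps above.

```shell
# Succeeds if the given alias exists in the keystore.
# Arguments: keystore path, alias, keystore password.
truststore_has_alias() {
  keytool -list -keystore "$1" -alias "$2" -storepass "$3" >/dev/null 2>&1
}

if truststore_has_alias mykeystore sqldep changeit; then
  echo "Certificate 'sqldep' is present in mykeystore"
else
  echo "Certificate 'sqldep' was not found in mykeystore; repeat the keytool -importcert step"
fi
```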

You get the following error messages:

In the lineage harvester log file:

java.lang.Exception: No native library found for os.name=Linux, os.arch=x86_64, paths=[/org/sqlite/native/Linux/x86_64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib]

In the console:

Failed to load native library:<sqlite-file-name>. osinfo: Linux/x86_64 java.lang.UnsatisfiedLinkError: /tmp/<sqlite-file-name>: failed to map segment from shared object: Operation not permitted

The lineage harvester uses a temporary file containing an SQLite database as a cache file. This means that you need write permission to the /tmp directory.

If you do not have write permission to /tmp, you can specify another directory with write permissions by using -Dorg.sqlite.tmpdir=<path to a temp directory>.

Example You have a temporary directory with write permissions. The path to this directory is custom/temp. Run the lineage harvester with the following command:
./bin/lineage-harvester -Dorg.sqlite.tmpdir=custom/temp full-sync
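
The following sketch shows one way to prepare such a directory and build the parameter from it. The function names dir_is_writable and sqlite_tmpdir_option are hypothetical; only the -Dorg.sqlite.tmpdir parameter itself comes from the description above.

```shell
# Succeeds if the directory exists and the current user can write to it.
dir_is_writable() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Create the directory if needed, verify write access, and print the JVM parameter.
sqlite_tmpdir_option() {
  mkdir -p "$1" || return 1
  dir_is_writable "$1" || return 1
  printf '%s\n' "-Dorg.sqlite.tmpdir=$1"
}

# Example usage with the directory from the example above:
#   ./bin/lineage-harvester "$(sqlite_tmpdir_option custom/temp)" full-sync
```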

You get the following error message:

Technical lineage is not enabled for this Catalog instance.

First make sure that there are no spelling errors in the Data Catalog section of the configuration file. If your configuration file is configured correctly, but the issue is not solved, create a support ticket to enable Technical lineage for your Collibra Data Intelligence Cloud instance in Salesforce.

You get the following error message:

The size of the import file is too large (max size: 10 MB).

The file you are trying to upload exceeds the size limit for uploaded files.

Contact Collibra support to increase the maximum file size.
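
To see in advance whether a file is likely to hit this limit, you could compare its size with 10 MB, as sketched below. The function name exceeds_import_limit is hypothetical, and the sketch assumes that 1 MB means 1024 × 1024 bytes, which the error message does not specify.

```shell
# Assumption: the 10 MB limit is counted in units of 1024 * 1024 bytes.
MAX_IMPORT_BYTES=$((10 * 1024 * 1024))

# Succeeds if the file is larger than the assumed import limit.
exceeds_import_limit() {
  size=$(($(wc -c < "$1")))
  [ "$size" -gt "$MAX_IMPORT_BYTES" ]
}

# Example usage:
#   exceeds_import_limit relations.json && echo "File is larger than 10 MB"
```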

Technical lineage is unavailable because the selected table does not contain columns.

Technical lineage only includes tables that have columns. Add a relation of the type "Table contains/is part of Column" between your Table asset and Column assets.

You get the following message in your technical lineage:

The current asset doesn't have a technical lineage yet.

This message appears if one or more of the following situations apply:

  • The data source of the current asset is not included in the configuration file. If you want a technical lineage for this asset, add its data source to the configuration file.
  • You have upgraded to the lineage harvester 1.3.0 or newer or you created a technical lineage for the first time. In this case, you may need to restart your DGC service before you can see the technical lineage.
  • You see parsing errors. For more information, see the Sources tab page.
  • The full name of one or more relevant assets does not match any of the names of the assets in the configuration file, which causes automatic stitching to fail. Make sure that the information in the configuration file and the Data Catalog physical data layer match:
    • The relevant assets have relations between each other, for example <System asset> groups/is grouped by <Database asset> contains/is part of <Schema asset> contains/is part of <Table asset> contains/is part of <Column asset>.
    • The full name of your System asset matches the name of your system or the name you used in the configuration file.
    • The full name of your Database asset matches the name of your database (or, for Google BigQuery, your project) or the name you used in the configuration file.
    • The full name of your Schema asset matches the name of the Schema of the data source or the name you used in the configuration file.
    Tip Make sure that the full path of each asset in Data Catalog matches the full path of the corresponding data object from your data source on the Stitching tab page.

You see the following message:

Edges count exceeds the limit 1000.

This message appears when the technical lineage graph exceeds the limit of 1000 edges and is too large to display. This happens, for example, if you have a table with many columns and you try to show the technical lineage of all columns of the table in one graph.

If you see this message, we recommend that you browse through the technical lineage graph on the object level or select a single column in the Browse tab pane.

Note You cannot manually change this limit.

You see the following error message in your technical lineage for a Microsoft SQL Server data source: "Oops, no data flow founds in your SQL scripts. Make sure you upload DML queries like insert, update, merge that moves data between the tables."

This error message appears when you run the lineage harvester to create a technical lineage for a Microsoft SQL Server data source without having the correct permissions to the SQL Server. As a result, the lineage harvester processes empty files and there is no technical lineage available for this data source.

Make sure you have at least the VIEW DEFINITION permission or sysadmin role in Microsoft SQL Server.

Note If you use multiple users, make sure that each one of them has the proper permissions.

The import job fails.

Note If the import job fails during import and the failing job is rolled back, you can have both old and new relations. The old relations were created during the first job and the new relations are created after the rollback. If more than one job is triggered, only the failed job is rolled back.

First, check the following:

  • The asset IDs exist.
  • The structure of the data is correct.
  • The cardinality of the relation types between asset types is respected.

Then, rerun the import of relations.

Relations are not changed as expected.

Check whether the lineage harvester refreshed the data source via a scheduled job. If the import job failed, then the data source was not refreshed and the previously created relations stay the same. If that happened, rerun the lineage harvester to import again.

Manual relations are overwritten.

We recommend that you do not manually add relations of the type "Data Element targets / sources Data Element" between asset types that are imported via the scheduled jobs. These relations are overwritten every time the scheduled job synchronizes the data source.

Ingesting Looker or Power BI assets fails.

For more information, see the following sections:

Testing connectivity

You can check whether the lineage harvester can connect to the Collibra Data Lineage server and Data Catalog.

  1. Run the lineage harvester in command line.
  2. Run the following command: test-connection.
  3. The result shows if the lineage harvester can connect to the Collibra Data Lineage server and Data Catalog.

The logs will also show the IP addresses of the Collibra Data Lineage servers that you have to whitelist.
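
If you want to collect those IP addresses from the output, a filter such as the one below can help. It is only a sketch: the function name extract_ipv4 is hypothetical, the exact log format is not documented here, and the commented invocation assumes that test-connection is passed to the start script in the same way as full-sync.

```shell
# Read text from standard input and print each distinct IPv4-looking address once.
extract_ipv4() {
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}

# Example usage (assumed invocation; adjust it to how you start the lineage harvester):
#   ./bin/lineage-harvester test-connection 2>&1 | extract_ipv4
```

The pattern matches any four dot-separated groups of digits, so review the result before you hand it to your network team.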

Password errors

If you mistyped the password or want to change an existing password, open config/pwd.conf in the lineage harvester folder and delete the lines below. As a result, the lineage harvester will ask for the password again.

{
	"url" : "<URL>",
	"userName" : "<user>",
	"password" : "<password>"
}

Tip If you have the lineage harvester version 1.3.0 or newer, you can also provide your passwords via stdin or a password manager.