Troubleshooting: sample data

1 You receive an error code

Code: 400
Description: This message appears if:
  • Something is wrong with the provided asset ID
    or
  • The sampling capability is not installed on the Edge site.
The error message will specify the problem.
Possible causes:
  • The asset exists but is not a column or table.
  • The table has no columns.
  • Something is wrong in the relationship of the column, table or database, like a column asset that was not ingested but manually created and for which no relationship has been defined.
  • The Catalog JDBC Sampling capability has not been defined for the data source Edge connection.
Solution:
  • If it concerns a wrong asset, provide a valid column or table asset ID.
  • If the sampling capability is missing, install the Catalog JDBC Sampling capability for the data source.

Code: 401
Description: This message appears if you are not authenticated to use the sampling API.
Possible causes: The authentication failed.
Solution: Provide valid credentials.

Code: 403
Description: This message appears if you lack permission to any of the columns within the requested asset.
Possible causes: You do not have the required permissions. Both View permission and View Samples permission are needed to see sample data for an asset.
Solution: Verify the user has the required permissions.

Code: 404
Description: This message appears if the asset cannot be found.
Possible causes: The asset does not exist.
Solution: Provide an existing column or table asset ID.

Code: 503
Description: This message appears if the Edge service gets a timeout or fails.
Possible causes: The Edge service is not available.
Solution: Verify that the Edge site is still online and healthy. If not, check the Edge logs to get a better understanding of the issue. If the problem persists, contact Collibra Support for assistance.
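
For scripts that call the sampling API, the error codes above can be folded into a small lookup that turns a status code into a next step. The following is a minimal Python sketch, not part of any Collibra client: the names `SAMPLING_ERRORS` and `explain_sampling_error` are hypothetical, and only the codes and the advice are taken from the list above.

```python
# Illustrative lookup built from the error codes in this section.
# The names below are hypothetical and not part of a Collibra API.
SAMPLING_ERRORS = {
    400: ("Invalid asset ID or missing sampling capability",
          "Provide a valid column or table asset ID, or install the "
          "Catalog JDBC Sampling capability for the data source."),
    401: ("Authentication failed",
          "Provide valid credentials."),
    403: ("Missing permissions",
          "Verify the user has both View and View Samples permissions."),
    404: ("Asset does not exist",
          "Provide an existing column or table asset ID."),
    503: ("Edge service unavailable",
          "Verify that the Edge site is online and healthy, "
          "then check the Edge logs."),
}


def explain_sampling_error(status_code: int) -> str:
    """Return a one-line hint for a status code returned by the sampling API."""
    entry = SAMPLING_ERRORS.get(status_code)
    if entry is None:
        return f"{status_code}: not covered by this troubleshooting section."
    cause, solution = entry
    return f"{status_code}: {cause}. {solution}"
```

A caller could log `explain_sampling_error(response_status)` next to the raw error message. For code 400 the message itself still needs to be read, because it states which of the two problems occurred.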

2 You receive error message: There is no matching sampling capability found

Issue: You receive the following error message: There is no matching sampling capability found for connection [connection_id].
Reason: This message appears when you open a Column or Table asset page for a data source that has been registered via Edge but for which the Edge site doesn't have an associated Edge capability for sampling.
Solution: To solve the issue, install the Catalog JDBC Sampling capability for the data source. The error message provides the ID of the Edge connection linked to the data source, so you can identify which connection needs the capability.
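
If the message is collected by a script or a monitoring tool, the connection ID can be pulled out with a regular expression. This is a rough Python sketch under assumptions: it assumes the ID directly follows the words "for connection" and may or may not be wrapped in square brackets, and that the ID contains no spaces or periods. The function name `extract_connection_id` is hypothetical.

```python
import re
from typing import Optional

# Assumption: the ID follows "for connection" and may be wrapped in
# square brackets. Adjust the pattern to the exact message you receive.
_CONNECTION_RE = re.compile(
    r"no matching sampling capability found for connection\s+\[?([^\]\s.]+)\]?",
    re.IGNORECASE,
)


def extract_connection_id(message: str) -> Optional[str]:
    """Return the Edge connection ID found in the message, or None."""
    match = _CONNECTION_RE.search(message)
    return match.group(1) if match else None
```

The returned ID can then be used to look up the Edge connection that needs the Catalog JDBC Sampling capability.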

3 No sample data is displayed

There are many conditions that can result in no sample data being displayed. Before reporting an issue, check the following:

Cause: The setting Maximum number of samples is set to 0.
Description: The sampling feature is disabled and no samples are displayed.
Solution: Set the Data Profiling setting Maximum number of samples to a value higher than 0. See Configuring the use of sample data.

Cause: The sampling capability is missing for your Edge data source.
Description: Samples can only be extracted if the sampling capability is set for the data source on the corresponding Edge site.
Solution: Install the Catalog JDBC Sampling capability for the data source.

Cause: The asset for which you want to collect sample data has no data.
Description: There is no data to show for the asset.

Cause: No sample data is stored in the Collibra cloud repository (not applicable for data sources registered via Edge).
Description:
  • For Jobserver data sources, sample data is only available in the Collibra cloud repository if the Store Sample Data option was selected during the registration of the data source.
  • For assets created without Jobserver or Edge registration, sample data is only available if it was uploaded to the Collibra cloud repository via the Catalog Profiling REST API.
Solution: See Configuring the use of sample data.
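
The checks above can be written down as a small checklist function. This is only a sketch of the decision logic described in this section: every parameter is a hypothetical flag that you fill in yourself after inspecting your environment, and nothing here queries Collibra.

```python
from typing import List


def no_sample_causes(max_samples: int,
                     registered_via_edge: bool,
                     has_sampling_capability: bool,
                     asset_has_data: bool,
                     samples_in_cloud_repository: bool) -> List[str]:
    """List the causes from this section that match the given observations."""
    causes = []
    if max_samples == 0:
        causes.append("Maximum number of samples is set to 0.")
    if registered_via_edge and not has_sampling_capability:
        causes.append("The sampling capability is missing for the Edge data source.")
    if not asset_has_data:
        causes.append("The asset has no data.")
    # The cloud repository check does not apply to Edge data sources.
    if not registered_via_edge and not samples_in_cloud_repository:
        causes.append("No sample data is stored in the Collibra cloud repository.")
    return causes
```

An empty result means none of the listed causes apply and the issue may be worth reporting.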

4 You always see old sample data for a data source registered via Edge

Sample data stored in the Collibra cloud repository takes precedence over sample data extraction by Edge. An Edge data source can have sample data in the Collibra cloud repository if it was previously connected to Jobserver or if sample data was pushed for it using the Catalog Profiling REST API. As long as those stored samples exist, they are displayed instead of newly extracted ones.
If you want to remove samples from the Collibra cloud repository, see Delete sample data.
See also Understanding the process to display sample data.
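
The precedence rule can be modeled in a few lines. This is an illustrative Python sketch of the behavior described above, not Collibra's implementation: the function `pick_sample_source` and its arguments are hypothetical, and treating an empty list the same as "no stored samples" is an assumption.

```python
from typing import List, Optional, Tuple


def pick_sample_source(cloud_samples: Optional[List[str]],
                       edge_samples: Optional[List[str]]) -> Tuple[str, List[str]]:
    """Return which source is displayed and the samples it provides."""
    # Stored samples win whenever they exist, even if they are outdated.
    if cloud_samples:
        return "cloud repository", cloud_samples
    # Edge extraction is only used when nothing is stored in the cloud.
    if edge_samples:
        return "edge", edge_samples
    return "none", []
```

In this model, `pick_sample_source(["old"], ["new"])` returns the stored "old" samples, which is why deleting the samples from the Collibra cloud repository is needed before fresh Edge samples appear.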

5 Collecting the sample data is very slow

  • It can take some time to read and display the sample data available in the Edge cache.
  • The sample data extraction time via Edge depends on multiple factors, for example: table size, number of columns in a table, number of samples to collect, maximum length of samples, and the push-down sampling mechanism available for the data source. For more details, go to Sample data beta feature limitations and guidelines.

6 Retrieving sample data log files

For data sources registered via Edge, Edge logs are generated when sample data is extracted from the data source and cached on the Edge site. The logs start with this text: "Writing cache samples with the key...".
Looking at the Edge logs within a 2-day period should give information on the sampling activity.
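
To locate these entries in a larger Edge log, a small filter can help. The following Python sketch assumes only the line prefix quoted above ("Writing cache samples with the key") followed by a key in single quotes. The function name `sampling_cache_keys` is hypothetical, and how you obtain the log lines (file, export, support bundle) is up to you.

```python
import re
from typing import Iterable, List

# Matches the documented prefix followed by a key in single quotes.
_KEY_RE = re.compile(r"Writing cache samples with the key '([^']+)'")


def sampling_cache_keys(lines: Iterable[str]) -> List[str]:
    """Return the cache keys of all sampling cache writes found in the lines."""
    keys = []
    for line in lines:
        match = _KEY_RE.search(line)
        if match:
            keys.append(match.group(1))
    return keys
```

For a log saved to a local file, `sampling_cache_keys(open(path, encoding="utf-8"))` would return one key per sampling cache write, where `path` is wherever you stored the log.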

Example 

Writing cache samples with the key 'catalog.sample.6385e23cb1ae443a7786c555108d8bb028d23dee39e76ce3169eaa9cdacb1ed3'
Cache write sample for table 'Snowflake>SNOWFLAKE_SAMPLE_DATA>TPCDS_SF100TCL>CALL_CENTER'