Data export best practices
Recommendation
There are four main ways to export data from Collibra Data Intelligence Cloud, each with its own advantages for different use cases. Choosing the method that fits your purpose gives the best performance and user experience.
- Core REST API
- Output Module
- Reporting Insights API
- Exporting functionality in the Collibra DGC UI
Impact
Ensures that your exports provide the right data for your purpose with the appropriate performance characteristics, avoiding timeouts and interference with other running processes.
Best Practice Recommendations
- Core REST API – best for instantaneous requests for small amounts of data. The Core REST API is a collection of endpoints that provide access to a range of data elements, so it is important to determine in advance which specific endpoints return the data you want to export.
- Output Module – a lightweight graph query engine exposed through the public API. It supports several output formats, such as JSON, XML, Excel, and CSV, and provides a single API to query most Collibra entities, such as assets, communities, domains, and types, using SQL-like filtering capabilities. It is best for extracting data sets that are typically larger than 50 records, up to about 10,000 records per page, where the same extraction must be repeated several times during a single day. When creating the Table View Configuration to pass to the Output Module, be mindful of the total number of records or assets being requested so that the request does not create a performance bottleneck. By default, 50 records are extracted, but this number can be increased as required. A value of -1 extracts all records for a given filter or View ID, but keep in mind that queries that run on complicated or large amounts of data may be slower than expected. Usually, the best approach is to paginate the results. Where the complexity or amount of data is unknown, a timeout can be used to break up the execution.
- Insights (for cloud customers with an Insights license) – used for strategic reporting that requires a large data set of 8 key components for analysis and reporting, in use cases such as adoption monitoring. The Reporting Data Layer can retrieve vast amounts of data, representing a snapshot in time, without jeopardizing Collibra front-end performance. It yields 8 data files of the underlying metadata from the Collibra model, which can then be used to develop a logical model for any custom analytic or BI solution, including SQL. You can then use the Insights widget to show Tableau reports, or any report that can be shown as an iframe, on your Collibra dashboard.
- Exporting functionality in the Collibra DGC UI – best for exporting assets, characteristics, and relations into a readable format such as Excel or CSV, which can then be updated and imported back into Collibra. If the purpose of the export is to get a file that will be re-imported afterwards with new and updated data, the preferred format is Excel because it is easy to update and to add new records. If the purpose of the export is to get a file to pass on to another storage system, the preferred format is CSV because it is smaller than Excel but still readable. The starting point for an export file is to create a view (global or at community/domain level) that includes all the characteristics and relations of interest. Save this view and share it with everyone who uses the export/import method for data entry. Note that the exporting functionality works on the selected view, so fields that are not in the view are not included in the export file, even though they exist on the assets. Hierarchy views cannot be selected for export.
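The pagination and timeout advice for the Output Module can be illustrated with a minimal Python sketch. It is not Collibra client code: paginate, fetch_page, page_size, and max_seconds are hypothetical names introduced here, and fetch_page stands in for whatever call you make to retrieve one page of records. The sketch assumes that a page shorter than the requested size marks the end of the data.

```python
import time
from typing import Callable, Iterator, List, Optional


def paginate(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 1000,
    max_seconds: Optional[float] = None,
) -> Iterator[dict]:
    """Yield records page by page instead of requesting everything at once.

    fetch_page(offset, limit) is a hypothetical callable that returns at most
    `limit` records starting at `offset`. max_seconds is an optional overall
    time budget, checked before each page request.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    offset = 0
    while True:
        # Stop between pages once the time budget is used up.
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"export stopped after {offset} records")
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page is assumed to mean there is no more data.
        if len(page) < page_size:
            return
        offset += len(page)


if __name__ == "__main__":
    # Stand-in data source: 25 fake records served in slices.
    records = [{"id": i} for i in range(25)]

    def fake_fetch(offset: int, limit: int) -> List[dict]:
        return records[offset:offset + limit]

    exported = list(paginate(fake_fetch, page_size=10, max_seconds=30))
    print(len(exported))
```

Because the helper is a generator, the caller can write each page to a file as it arrives rather than holding the full export in memory. The time budget is only checked between pages, so it limits the overall run and does not interrupt a single slow request.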
Authentication
All four approaches above require authentication. Available methods include Basic authentication (BasicAuth) and JWT. JWT is recommended as the more secure way to access the API.
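As a rough illustration of the difference between the two methods, the following Python sketch builds the HTTP Authorization header for each. The Basic scheme follows the standard base64 encoding of "username:password". Sending the JWT as a Bearer token is an assumption made for this sketch, so confirm the exact header your environment expects in the product documentation. The function names are hypothetical.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build a standard HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(jwt: str) -> Dict[str, str]:
    """Build a Bearer Authorization header carrying a JWT.

    Assumption: the API accepts the JWT as a Bearer token.
    """
    return {"Authorization": f"Bearer {jwt}"}
```

With Basic authentication, the reversibly encoded password travels with every request. A JWT is typically short-lived and does not contain the password, which is one reason it is the recommended method.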
Validation Criteria
If you are encountering export errors, you can check the logs in Collibra Console and/or review whether you are using the appropriate export method as described above. Common errors are also described in the documentation listed below.
Additional Information
For more information see the following resources:
- For the Output Module, visit the developer portal or download the Hitchhiker's Guide to the Output Module via the product resources page.
- See the Insights data access diagram.
- The Data Maturity Report template may also be useful for exploring and using the 8 output files described in option 3, above.
- For advanced users and advanced use cases there are Spring Boot integrations in the Marketplace.
- Learn more about Reporting Insights on AWS or GCP. You can find additional resources on the Marketplace.
- For more information on the exporting functionality in the DGC UI, see the product documentation.