Data import best practices
Recommendation
The Import API lets you add, update, or remove data (typically not metadata) in a Collibra instance. Based on the content of the input file, the API method you choose, and its parameters, it determines which operation to perform for each resource defined in the file.
Impact
Following these practices helps ensure that your imports deliver the right data for your purpose with appropriate performance, avoiding timeouts and interference with other running processes.
Best Practice Recommendations
The Import API supports the following file formats:
- CSV (preferred for manual, user interface-driven imports)
- Excel
- JSON (preferred for automated, batch imports, typically for more advanced users)
Because CSV and Excel are tabular formats, imports of these file types require an additional parameter that tells the Import API how to interpret each column. For example, it can specify that the second column contains the names of the assets to import, that the third column identifies the target domain for those assets, and so on.
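As an illustration of such a column mapping, here is a minimal Python sketch that builds one as a JSON string. The `${n}` column placeholder syntax, the field names, and the function name `build_csv_template` are assumptions made for this example; check the Import API documentation for the exact mapping syntax your version expects.

```python
import json


def build_csv_template(name_col: int, domain_col: int,
                       community_col: int, asset_type: str) -> str:
    """Return a JSON mapping that says which CSV column feeds which field.

    Columns are referenced with a 1-based ${n} placeholder. Both the
    placeholder syntax and the field names are assumptions for illustration.
    """
    def col(n: int) -> str:
        return "${" + str(n) + "}"

    template = [{
        "resourceType": "Asset",
        "identifier": {
            "name": col(name_col),            # e.g. second column holds asset names
            "domain": {
                "name": col(domain_col),      # e.g. third column holds the target domain
                "community": {"name": col(community_col)},
            },
        },
        "type": {"name": asset_type},
    }]
    return json.dumps(template, indent=2)


template_json = build_csv_template(
    name_col=2, domain_col=3, community_col=4, asset_type="Business Term"
)
```

The resulting string would accompany the CSV or Excel file in the import request, so the same file layout can be reused across imports without editing the mapping by hand.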
JSON, by contrast, does not require this parameter and preserves the relationships between objects in the file itself. It offers the best combination of file size and flexibility for both scale and performance.
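To show how a JSON input can carry object relationships directly, the following Python sketch generates a small import document in which each asset names its domain and community inline. The field layout (`resourceType`, `identifier`, `type`, `attributes`) and the helper names are assumptions for illustration, not a confirmed schema; verify them against the Import API documentation.

```python
import json
from typing import Optional


def asset_entry(name: str, domain: str, community: str,
                asset_type: str, definition: Optional[str] = None) -> dict:
    """Build one asset resource; the nesting expresses asset -> domain -> community."""
    entry = {
        "resourceType": "Asset",
        "identifier": {
            "name": name,
            "domain": {"name": domain, "community": {"name": community}},
        },
        "type": {"name": asset_type},
    }
    if definition is not None:
        # Attribute layout is an assumed shape, shown only as an example.
        entry["attributes"] = {"Definition": [{"value": definition}]}
    return entry


def build_import_document(rows: list) -> str:
    """Serialize a list of row dictionaries into one JSON import document."""
    return json.dumps([asset_entry(**row) for row in rows], indent=2)


document = build_import_document([
    {"name": "Customer", "domain": "Glossary", "community": "Sales",
     "asset_type": "Business Term", "definition": "A party that buys goods."},
    {"name": "Order", "domain": "Glossary", "community": "Sales",
     "asset_type": "Business Term"},
])
```

Because the structure is self-describing, no separate column-mapping parameter is needed, which is what makes this format convenient for automated pipelines.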
When designing your imports, consider whether each one is a one-time import or will be scheduled for periodic refresh, so that you can structure it appropriately and choose the matching Import API option, for example the synchronization API or external identifiers.
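The value of stable external identifiers for periodic refreshes can be illustrated with a small conceptual sketch in Python. It is not how the synchronization API is implemented; it only shows that, when every record has a stable key from the source system, comparing two runs yields a clear set of additions, updates, and removals. The function name `plan_refresh` and the sample keys are hypothetical.

```python
def plan_refresh(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by external identifier.

    Returns the keys to add, update, and remove so that the target
    matches the current snapshot. Conceptual illustration only.
    """
    prev_keys, cur_keys = set(previous), set(current)
    return {
        "add": sorted(cur_keys - prev_keys),
        "update": sorted(k for k in cur_keys & prev_keys
                         if previous[k] != current[k]),
        "remove": sorted(prev_keys - cur_keys),
    }


last_run = {"crm:001": {"name": "Customer"}, "crm:002": {"name": "Order"}}
this_run = {"crm:001": {"name": "Client"}, "crm:003": {"name": "Invoice"}}
plan = plan_refresh(last_run, this_run)
```

Without a stable key, a renamed record such as `crm:001` would look like one removal plus one unrelated addition, which is why recurring imports benefit from identifiers that do not depend on the asset name.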
Authentication
Authentication is required to access the Import API for all three formats above. Available methods include BasicAuth and JWT. JWT is recommended as the more secure method of accessing the API.
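The difference between the two methods can be seen in the HTTP `Authorization` header each one produces. The sketch below uses only the Python standard library; the function names are hypothetical, and sending the JWT as a `Bearer` token is an assumption that should be confirmed for your environment.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build a Basic header: base64 of 'username:password' (standard HTTP Basic scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(jwt: str) -> dict:
    """Build a Bearer header carrying a JWT (assumed to be accepted in this form)."""
    return {"Authorization": f"Bearer {jwt}"}


basic = basic_auth_header("user", "pass")
bearer = bearer_auth_header("example.jwt.token")
```

Basic credentials are only encoded, not encrypted, and are resent with every request, whereas a JWT can be short-lived and scoped, which is one reason it is the preferred option.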
Validation Criteria
You can monitor imports from the Collibra Settings page under Activities by searching for activities named “import.” For more advanced monitoring, use the Job API and Collibra Console. Set up your logging following the Logging best practices.
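For automated monitoring, a script typically polls the job until it reaches a final state. The Python sketch below shows one way to structure such a loop. The state names in `TERMINAL_STATES` and the `fetch_state` callable (which would wrap your own Job API request) are assumptions for illustration; consult the Job API documentation for the actual states and endpoint.

```python
import time
from typing import Callable

# Assumed names for final job states; verify against the Job API documentation.
TERMINAL_STATES = {"COMPLETED", "ERROR", "CANCELED"}


def wait_for_job(fetch_state: Callable[[], str],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_state() until a terminal state is returned or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)  # wait before asking again to avoid hammering the API


# Usage with a stand-in for the real Job API call.
_states = iter(["WAITING", "RUNNING", "COMPLETED"])
final_state = wait_for_job(lambda: next(_states), sleep=lambda seconds: None)
```

Keeping the HTTP call behind `fetch_state` makes the loop easy to test and lets you choose a polling interval that does not interfere with other running processes.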
Additional Information
For more information, see the following resources:
- The Collibra Documentation Center, for more information on diagnostic files.
- The Collibra Documentation Center, for more information on how to navigate Activities.
- The Import API documentation in the Developer Portal.
- Additional API examples and tutorials in our public GitHub repository.