Data import best practices

Recommendation

The Import API lets you add, update, or remove data (typically not metadata) in a Collibra instance. Based on the content of the input file, the chosen API method, and its parameters, it determines which operation to perform for each resource defined in the file.

Impact

Following these practices helps ensure that your imports provide the right data for your purpose with appropriate performance characteristics, avoiding timeouts and interference with other running processes.

Best Practice Recommendations

The Import API supports the following file formats:

  • CSV (preferred for manual, user interface-driven imports)
  • Excel
  • JSON (preferred for automated, batch imports, typically for more advanced users)

Because CSV and Excel are tabular formats, importing them requires an additional parameter that tells the Import API how to interpret each column in the file. For example, it can specify that the second column contains the names of the assets to import, that the third column identifies the target domain for those assets, and so on.
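As an illustration of such a column mapping, the following Python sketch builds a JSON mapping template for a CSV import. It assumes that columns are referenced with 1-based placeholders of the form ${n} and that assets are identified by name, domain, and community. The function name and the column positions for community and type are hypothetical, so verify the exact template syntax against the Import API reference for your Collibra version.

```python
import json


def build_csv_template(name_col: int, domain_col: int,
                       community_col: int, type_col: int) -> str:
    """Return a JSON template string that maps CSV columns to asset fields.

    Assumption: the template refers to columns with 1-based placeholders
    such as ${2}. Check this against your Import API documentation.
    """
    def col(index: int) -> str:
        # Build the placeholder text for one column, e.g. "${2}".
        return "${" + str(index) + "}"

    template = [
        {
            "resourceType": "Asset",
            "identifier": {
                "name": col(name_col),
                "domain": {
                    "name": col(domain_col),
                    "community": {"name": col(community_col)},
                },
            },
            "type": {"name": col(type_col)},
        }
    ]
    return json.dumps(template, indent=2)


# Second column holds asset names, third column holds the target domain.
# The community and type columns (4 and 5) are hypothetical.
template = build_csv_template(name_col=2, domain_col=3,
                              community_col=4, type_col=5)
print(template)
```

Keeping the mapping in a small function makes it easy to reuse the same template for every file that shares a column layout.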

JSON, by contrast, does not require this parameter and preserves the relationships between objects. It offers the best balance of file size and flexibility for both scale and performance.
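To show how a JSON input can carry both resources and the relationships between them, here is a minimal Python sketch that assembles an import payload. The field names (resourceType, identifier, type, relations) and the "relation type ID plus direction" key reflect one common shape of Import API input and are assumptions here. The asset names, domains, and the relation type ID are hypothetical placeholders.

```python
import json

# Hypothetical placeholder: replace with a relation type ID from your instance.
RELATION_TYPE_ID = "<relation-type-id>"


def reference(name, domain, community):
    """Identify an existing or co-imported asset by name, domain, and community."""
    return {"name": name,
            "domain": {"name": domain, "community": {"name": community}}}


def asset(name, domain, community, type_name, relations=None):
    """Describe one asset resource, optionally with relations to other assets."""
    resource = {
        "resourceType": "Asset",
        "identifier": reference(name, domain, community),
        "type": {"name": type_name},
    }
    if relations:
        resource["relations"] = relations
    return resource


resources = [
    asset("Customer", "Business Glossary", "Sales", "Business Term"),
    asset(
        "customer_id", "CRM Columns", "Sales", "Column",
        # Assumed key format: "<relation type ID>:<direction>".
        relations={
            f"{RELATION_TYPE_ID}:TARGET": [
                reference("Customer", "Business Glossary", "Sales")
            ]
        },
    ),
]

# Serialize the whole batch; no separate column-mapping parameter is involved.
payload = json.dumps(resources, indent=2)
print(payload)
```

Because each resource states its own identifier and relations, the file describes itself and no column mapping has to be supplied.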

When designing your imports, consider whether each one is a one-time import or will be scheduled for periodic refresh, so that you can structure it appropriately and choose the appropriate Import API option, for example the synchronization API or external identifiers.
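For recurring imports, one way to keep resources stable across runs is to key them on identifiers from the source system rather than on names that may change. The Python sketch below illustrates the idea. The externalSystemId and externalEntityId field names are assumptions about the identifier format, and the system name, rows, and helper functions are hypothetical.

```python
def external_identifier(system_id: str, entity_id: str) -> dict:
    """Identify a resource by its key in the source system.

    Assumption: the Import API accepts identifiers made of an external
    system ID and an external entity ID.
    """
    return {"externalSystemId": system_id, "externalEntityId": entity_id}


def to_resource(system_id: str, row: dict) -> dict:
    """Convert one source row into an asset resource keyed by its source ID."""
    return {
        "resourceType": "Asset",
        "identifier": external_identifier(system_id, str(row["id"])),
        "displayName": row["name"],
    }


# Two runs of the same scheduled import; the asset was renamed in between.
first_run = to_resource("crm", {"id": 42, "name": "Customer"})
second_run = to_resource("crm", {"id": 42, "name": "Client"})

# The identifier is unchanged, so the second run can be matched to the
# same resource and treated as an update rather than a new asset.
same_resource = first_run["identifier"] == second_run["identifier"]
```

A one-time import can usually rely on names alone, whereas a scheduled refresh benefits from this kind of stable key.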

Authentication

Access to the Import API requires authentication, regardless of which of the three formats above you use. Available methods include BasicAuth and JWT. JWT is recommended as the more secure method of accessing the API.
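The difference between the two methods shows up in the Authorization header sent with each request. The following Python sketch builds both header variants using only the standard library. The function names and credentials are hypothetical, and how you obtain the JWT depends on your environment and is not shown.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (credentials are only encoded, not encrypted)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(jwt: str) -> dict:
    """Build a Bearer Authorization header that carries a JWT."""
    return {"Authorization": f"Bearer {jwt}"}


# Hypothetical values for illustration only.
basic = basic_auth_header("user", "pass")
bearer = bearer_auth_header("<jwt-issued-by-your-identity-provider>")
```

With BasicAuth the password travels with every request, while a JWT can be short-lived and scoped, which is one reason it is the preferred option.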

Validation Criteria

You can monitor imports from the Collibra Settings page under Activities by searching for activities named “import.” For more advanced monitoring, you can use the Job API and Collibra Console. Set up your logging following the Logging best practices.
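For automated imports, monitoring through the Job API typically means polling the job until it reaches a final state. The Python sketch below shows such a polling loop with a timeout. It assumes that a job is returned as a dictionary with a "state" field and that COMPLETED, CANCELED, and ERROR are final states. The fetch_job callable is a hypothetical stand-in for whatever HTTP call you use to read the job.

```python
import time
from typing import Callable, Dict

# Assumed final job states; confirm them against the Job API reference.
TERMINAL_STATES = {"COMPLETED", "CANCELED", "ERROR"}


def wait_for_job(fetch_job: Callable[[str], Dict],
                 job_id: str,
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll a job until it reaches a final state or the timeout expires.

    fetch_job is a caller-supplied function that returns the current job
    as a dict, for example by sending a GET request to the Job API.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        if job.get("state") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish in time")
        # Wait between polls so the check itself does not load the instance.
        sleep(poll_seconds)


# Usage with a fake fetcher that simulates a job finishing on the third poll.
_states = iter(["WAITING", "RUNNING", "COMPLETED"])
result = wait_for_job(lambda job_id: {"id": job_id, "state": next(_states)},
                      "job-1", sleep=lambda seconds: None)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it straightforward to test.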

Additional Information

For more information see the following resources: