About Data Catalog
The overarching aim of Data Catalog is to create and maintain an inventory of an organization’s data assets across its entire digital landscape, so that data consumers can more easily find and trust those assets and use them to make informed business decisions.
In the Catalog application in Collibra, you can integrate metadata from multiple data sources: databases, data lakes, warehouses, enterprise applications, ETL tools, and BI solutions. Metadata provides information such as the format of the data, the structure of the data, and when assets were created. For an example, see Metadata synchronization results for JDBC data sources.
Once the metadata is integrated, you can enrich it by adding profiling information, defining the data class, showing sample data, linking the metadata to its business context, showing lineage and data quality, and more.
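To make the notion of metadata concrete, the following is a minimal sketch of reading the structure of a database (table names, column names, and column types) without reading the data itself. It is an illustration only and does not show how Collibra integrates metadata: it uses Python's built-in `sqlite3` module, and the `describe_tables` helper and the `customers` table are hypothetical names chosen for this example.

```python
import sqlite3


def describe_tables(conn: sqlite3.Connection) -> dict:
    """Return {table_name: [{"column": ..., "type": ...}, ...]} for every user table."""
    # sqlite_master lists the objects in the database; skip SQLite's internal tables.
    cursor = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    tables = [row[0] for row in cursor.fetchall()]

    metadata = {}
    for table in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [{"column": col[1], "type": col[2]} for col in columns]
    return metadata


# Hypothetical in-memory data source used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)"
)

catalog = describe_tables(conn)
for table, columns in catalog.items():
    print(table, columns)
```

The result describes the shape of the data source rather than its contents, which is the kind of information a catalog stores about an asset.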
About metadata, samples, profiling data, classification, lineage, and more in Catalog asset pages
Catalog asset pages can include detailed information about the data they represent. These details include:
- Metadata: Metadata is the data about data that is ingested into Collibra by registering or integrating a data source. The way to integrate metadata depends on your data source, infrastructure, and required outcome.
- Profiling: Profiling data provides a statistical summary of the data and includes the data type as found in the data source.
- Sample data: Sample data is a set of randomly collected data from the data source.
- Classification: Classification data shows the data class to which an asset has been assigned via the classification process. Knowing the data classification helps give context to your data.
- Diagrams: Diagrams are also called business lineages, traceability diagrams, or summary lineages. They provide a summary view that traces data from the data source to its points of use, for example, a business report. Diagrams are useful for tracking the flow of data in Collibra because they show links and dependencies.
- Technical lineage: Technical lineage provides a detailed view showing the aggregation, manipulation, and transformation of data through ETL tools, files, and ad-hoc SQL.
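As a rough illustration of what profiling and sample data mean in the list above, the sketch below computes a small statistical summary for one column and draws a random sample of rows. It is not Collibra's profiling or sampling implementation; the function names `profile_column` and `sample_rows`, the chosen statistics, and the example values are all assumptions made for this example.

```python
import random
import statistics
from collections import Counter


def profile_column(values):
    """Summarize one column: counts, dominant data type, and basic numeric statistics."""
    non_null = [v for v in values if v is not None]
    type_counts = Counter(type(v).__name__ for v in non_null)
    summary = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        # The most frequent Python type stands in for the column's data type.
        "data_type": type_counts.most_common(1)[0][0] if non_null else None,
    }
    # bool is a subclass of int in Python, so exclude it from the numeric statistics.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric:
        summary.update(
            min=min(numeric), max=max(numeric), mean=statistics.mean(numeric)
        )
    return summary


def sample_rows(rows, k, seed=None):
    """Return up to k rows chosen at random, without replacement."""
    rng = random.Random(seed)  # pass a seed to make the sample repeatable
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical column values and rows, used only for this illustration.
ages = [34, 51, None, 29, 34]
rows = [("alice", 34), ("bob", 51), ("carol", None), ("dan", 29), ("erin", 34)]

age_profile = profile_column(ages)
preview = sample_rows(rows, 2, seed=0)
print(age_profile)
print(preview)
```

Passing a fixed `seed` makes the sample repeatable, which can help when comparing previews; omitting it gives a different random selection on each run.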
Data Catalog submenu pages
In Collibra 2024.05, we launched a new user interface (UI) for Collibra Platform! You can learn more about this latest UI in the UI overview.
The following table describes each of the submenu items of the Catalog application.
Page | Description |
---|---|
Overview / Catalog Home | The landing page for Catalog. This page is designed to help you quickly and easily find Data Catalog-related assets. |
Reports | All report assets. |
Data Sets | All Data Set assets shown as a set of tiles or as a table, with their name, description and, if there are any, connections to existing assets in Collibra. |
Data Sources | Data sources that are used for data source registrations. |
Data Dictionary | All data assets in Collibra. |
Technology Assets | All technology assets in Collibra. |
Metrics | A variety of statistics related to how the assets are used. |
Access Requests | The history of your access requests and their status. |
Advanced Data Types | All advanced data types, which are used during a data source registration. |
Integrations | Allows you to register a data source. This page contains two tabs. The Data Source Registration tab allows you to create a Database or File System asset from which you can start the synchronization of a data source. Use this tab for JDBC, S3, GCS, and ADLS integrations. The Integration Configuration tab allows you to configure all other metadata, ETL, and BI integrations and start the synchronization, for example, to synchronize Databricks Unity Catalog or to create a technical lineage via Edge. |
Helpful resources
Courses on Collibra University:
- Navigating catalog assets for data consumers
- Register a data source: Bring your metadata over from Google Cloud Storage
- Register a data source: Bring your metadata over from Databricks Unity Catalog
- Register a data source: Bring your metadata over from Azure Data Lake Storage
- Register a data source: Bring your metadata over from Snowflake
- Register a data source: Bring your metadata over from Amazon Redshift