Dataset Manager

Dataset Manager is a repository of all the datasets in your Collibra DQ environment. On this page, you can drill down into individual datasets to review key data points and access their Profile and Dataset Rules pages. You can also filter them based on numerous criteria, apply individual or bulk actions, manage Data Categories and Business Units, and view the Data Class Library. Dataset Manager is a vital tool to help analysts organize and manage the datasets in their organization's Collibra DQ environment.

Important 
When the dataset security setting Require DATASET_ACTIONS role for dataset rule create/edit access is enabled from the security settings page, only users with ROLE_DATASET_ACTIONS can edit, rename, publish, assign data categories and business units, enable integrations, and schedule DQ Jobs from the Dataset Manager.

image of Dataset Manager with numbers that correspond with the numbered rows in the table below

No. Component Description

Filter

Apply one or more filters to refine the list of datasets based on:

Sensitivity
Filter by sensitivity labels associated with datasets in your Collibra Data Quality & Observability environment.
Run Mode
Filter datasets based on whether they are in published or draft status.
Connection Type
Filter datasets by Pushdown or Pullup processing mode.
Attributes
Filter for datasets that have rules but no alerts, or alerts but no rules.
Source Type
Filter datasets by data source.
Data Class Type
Filter datasets by data class.
Business Units
Filter datasets by business unit.
Data Categories
Filter datasets by data category.
Row Count
Filter datasets by the number of rows they contain.
Column Count
Filter datasets by the number of columns they contain.
Integrations
Filter datasets based on whether they are integrated into Collibra Data Intelligence Platform.
# of DQ Scans
Filter datasets by the number of times they have been run as DQ Jobs.
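
As a rough illustration of how several filters narrow the list, the following Python sketch applies a few of the criteria above to in-memory records. The `DatasetRecord` fields, the `matches` helper, and the assumption that active filters combine with AND are all hypothetical. They are not part of any Collibra DQ API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    # Hypothetical record shape; field names are illustrative only.
    name: str
    published: bool
    row_count: int
    rule_count: int
    alert_count: int


def matches(ds: DatasetRecord,
            run_mode: Optional[str] = None,
            min_rows: Optional[int] = None,
            attributes: Optional[str] = None) -> bool:
    """Return True if the dataset passes every filter that is set (assumed AND)."""
    # Run Mode: published or draft status.
    if run_mode == "published" and not ds.published:
        return False
    if run_mode == "draft" and ds.published:
        return False
    # Row Count: keep datasets with at least min_rows rows.
    if min_rows is not None and ds.row_count < min_rows:
        return False
    # Attributes: rules without alerts, or alerts without rules.
    if attributes == "rules_without_alerts" and not (
            ds.rule_count > 0 and ds.alert_count == 0):
        return False
    if attributes == "alerts_without_rules" and not (
            ds.alert_count > 0 and ds.rule_count == 0):
        return False
    return True


datasets = [
    DatasetRecord("orders", True, 120_000, 4, 0),
    DatasetRecord("customers", False, 8_500, 0, 2),
    DatasetRecord("payments", True, 300, 3, 1),
]

selected = [d.name for d in datasets
            if matches(d, run_mode="published", min_rows=1_000,
                       attributes="rules_without_alerts")]
print(selected)  # ['orders']
```

Only `orders` passes all three filters: `customers` is in draft status and `payments` has too few rows.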

Bulk Actions

Select the checkbox options in the column to the left of the Dataset column, then click Bulk Actions and select an option from the dropdown menu. The available Bulk Actions options include:

Bulk Manage Host
Allows admin users to update the host URL of multiple datasets at once. Select the checkbox options next to the datasets you want to manage, then click Bulk Manage Host from the Bulk Actions dropdown menu. From the Bulk Manage Host dialog, enter the new host URL and click Save.
Bulk Manage Agent
Allows admin users to update the agent of multiple Pullup datasets at once. Select the checkbox options next to the datasets you want to manage, then click Bulk Manage Agent from the Bulk Actions dropdown menu. From the Bulk Manage Agent dialog, select the new agent from the dropdown menu and click Save.
Important Updating the agent of multiple datasets at once applies changes to both scheduled and unscheduled DQ Jobs. The connection of any dataset included in the bulk update is mapped to the agent you select.
Bulk Manage Spark Settings
Allows admin users to update the Spark settings of multiple Pullup datasets at once. Select the checkbox options next to the datasets you want to manage, then click Bulk Manage Spark Settings from the Bulk Actions dropdown menu. From the Bulk Manage Spark Settings dialog, fill out the required fields and click Save.
Bulk Manage Business Units
Allows you to apply a business unit to multiple datasets at once. Select the checkbox options next to the datasets you want to manage, then select Bulk Manage Business Units from the Bulk Actions dropdown menu. From the dropdown menu, select your preferred business unit and click Save.

Note To assign a business unit to a dataset, you need to have dataset access or ROLE_DATASET_ACTIONS.

Bulk Manage Data Categories
Allows you to apply a data category to multiple datasets at once. Select the checkbox options next to the datasets you want to manage, then select Bulk Manage Data Categories from the Bulk Actions dropdown menu. From the dropdown menu, select your preferred data category and click Save.

Note To assign a data category to a dataset, you need to have dataset access, ROLE_DATASET_TRAIN, or ROLE_DATASET_ACTIONS.

Bulk Delete
Allows you to delete multiple datasets at once. Select the checkbox options next to the datasets you want to delete, then select Bulk Delete from the Bulk Actions dropdown menu.

Note To delete a dataset, you need to be its owner or have ROLE_ADMIN or ROLE_DATASET_MANAGER.

Bulk Enable Integrations
Allows you to enable integrations for multiple datasets at once. Select the checkbox options next to the datasets you want to manage, click Bulk Enable Integrations from the Bulk Actions dropdown menu, then click Enable integrations.

Note To toggle a dataset integration, you need to have ROLE_ADMIN, ROLE_DATASET_MANAGER, or ROLE_DATASET_ACTIONS.

Dataset table

This table lists all datasets in your Collibra DQ environment and provides the high-level data points described in the following table.

Column Description
checkbox option Click these options to select one or more datasets so that you can apply bulk actions. You can also select the checkbox option in the column header to select all datasets.
Dataset

The name of your dataset. You can also see the following in this column:

  • When a dataset is in draft status, a grey "D" for draft dataset displays beneath the dataset name.
  • When a dataset is in published status, a green "P" for published dataset displays beneath the dataset name.
  • When a dataset does not have alerts configured, a grey "A" for no alerts configured displays beneath the dataset name.
  • When a dataset has alerts configured, a red "A" for alerts configured displays beneath the dataset name.
  • When a dataset does not have rules configured, a grey "R" for no rules configured displays beneath the dataset name.
  • When a dataset has rules configured, a green "R" for rules configured displays beneath the dataset name.
  • Click the expand icon to the right of the dataset name to drill down into the dataset.
Business Unit The business unit with which the dataset is associated.
Update Timestamp The timestamp of the last run of the dataset in YYYY-MM-DD hh-mm-ss format.
Source Type The data source of the dataset. For example, SQL Server or BigQuery.
Connection Name The unique name of the data source. For example, EXAMPLE_SQLSERVER_CXN.
Schema/Parent Folder The schema or parent folder from which the dataset originates.
Table/File Name The table or file name from which the dataset was created.
Meta Tags

The meta tags with which the dataset is associated. Optionally, click Actions, then Edit, and enter meta tags in the Meta Tags input fields.

Note Meta tags have a 40-character limit.

Server/File Path The server or file path associated with the origin of your dataset.
Actions

Click the actions button to take a variety of actions on your dataset. Available actions include:

  • Edit your dataset.
  • Rename your dataset.
  • Publish your dataset or revert it To Draft.
  • Assign a Data Category to your dataset.
  • Assign a Business Unit to your dataset.
  • Enable or Disable Integrations of your dataset into Collibra Data Intelligence Platform.
  • Delete your dataset.
Note You can also create a clone of the dataset from the Findings page by using the Clone Dataset option on the Job tab.
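
The status badges described for the Dataset column can be summarized as a small lookup. The following Python sketch is illustrative only: the `dataset_badges` function and its parameters are hypothetical, and the order of the returned badges follows the order of the description above, not necessarily the order shown in the UI.

```python
from typing import List, Tuple


def dataset_badges(published: bool, has_alerts: bool,
                   has_rules: bool) -> List[Tuple[str, str]]:
    """Return (letter, color) pairs for the status, alerts, and rules badges."""
    # Draft datasets show a grey "D"; published datasets show a green "P".
    status = ("P", "green") if published else ("D", "grey")
    # The "A" badge is red when alerts are configured, grey otherwise.
    alerts = ("A", "red") if has_alerts else ("A", "grey")
    # The "R" badge is green when rules are configured, grey otherwise.
    rules = ("R", "green") if has_rules else ("R", "grey")
    return [status, alerts, rules]


print(dataset_badges(published=False, has_alerts=True, has_rules=False))
# [('D', 'grey'), ('A', 'red'), ('R', 'grey')]
```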

Manage Data Category

Opens the Data Categories page in the Admin Console where you can create and manage data categories.

Note ROLE_ADMIN or ROLE_DATA_GOVERNANCE_MANAGER is required to access the Admin Console.

Manage Business Units

Opens the Business Units page in the Admin Console where you can create and manage business units.

Note ROLE_ADMIN or ROLE_DATA_GOVERNANCE_MANAGER is required to access the Admin Console.

Data Class Library Opens the Data Class Library modal where you can view and search the data classes in your Collibra DQ environment.

FAQ Opens the Top Questions and Answers modal where a sample of common dataset management questions appear.

Search There are two search options. The first search field on the left lets you search for datasets containing columns that match your search criteria. The second search field on the right lets you search for items that match your criteria based on dataset name, source type, schema/parent folder, or table/file name.

Dataset drill down

Click the expand icon to the right of the dataset name to drill down into the dataset and reveal the following high-level data points.

Field Description
Stats
Daily Rows The number of rows scanned in the last run.
Columns The number of columns in the dataset.
Active Rules The number of active rules included in the last run.
Active Alerts The number of alerts configured to notify specified users when their conditions were met in the last run.
Data Category The data category assigned to your dataset on the Rule Workbench or Metadata Bar.
Quick Links
Profile Click the profile icon to open the Profile page of your dataset.
Rules Click the rules icon to open the Dataset Rules page of your dataset.
Description The description of your dataset given to it on the Explorer Review page.

Column overview

This section of the dataset drill down shows the following column-level details of all columns in your dataset.

Column Description
Field The name of the column.
Type The data type of the column.
Data Class The data class labels of a column, when applied.
Sensitive Label The sensitive label to obscure sensitive information, when applied.

Actions

Click the actions button to take a variety of actions on your dataset. Available actions include:

  • Edit your dataset.
    Note To edit the alias, schema, table or file name, server or file path, or meta tags of a dataset, you need to have ROLE_ADMIN, ROLE_DATA_GOVERNANCE_MANAGER, or ROLE_DATASET_ACTIONS.

  • Rename your dataset.
    Note To rename a dataset, you need to be its owner or have ROLE_ADMIN or ROLE_DATASET_MANAGER.

  • Publish your dataset or revert it To Draft.
    Note To publish a dataset, you need to have dataset access, ROLE_DATASET_TRAIN, or ROLE_DATASET_ACTIONS.

  • Assign a Data Category to your dataset.
    Note To assign a data category to a dataset, you need to have dataset access, ROLE_DATASET_TRAIN, or ROLE_DATASET_ACTIONS.

  • Assign a Business Unit to your dataset.
    Note To assign a business unit to a dataset, you need to have dataset access or ROLE_DATASET_ACTIONS.

  • Enable or Disable Integrations of your dataset into Collibra Data Intelligence Platform.
    Note To toggle a dataset integration, you need to have ROLE_ADMIN, ROLE_DATASET_MANAGER, or ROLE_DATASET_ACTIONS.

  • Delete your dataset.
    Note To delete a dataset, you need to be its owner or have ROLE_ADMIN or ROLE_DATASET_MANAGER.
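
The permission notes for these actions can be read as a lookup from action to requirement. The following Python sketch restates them for quick reference. It is not how Collibra DQ enforces access: the `User`, `Requirement`, and `can_perform` names and the action keys are hypothetical, and the sketch ignores the Require DATASET_ACTIONS role setting described at the top of this page.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, NamedTuple


class Requirement(NamedTuple):
    roles: FrozenSet[str]    # any one of these roles is sufficient
    owner_ok: bool = False   # the dataset owner may perform the action
    access_ok: bool = False  # dataset access alone is sufficient


@dataclass(frozen=True)
class User:
    # Hypothetical user model for illustration only.
    roles: FrozenSet[str] = frozenset()
    is_owner: bool = False
    has_dataset_access: bool = False


ADMIN = "ROLE_ADMIN"
GOV = "ROLE_DATA_GOVERNANCE_MANAGER"
MANAGER = "ROLE_DATASET_MANAGER"
ACTIONS = "ROLE_DATASET_ACTIONS"
TRAIN = "ROLE_DATASET_TRAIN"

# Transcribed from the notes above; the action keys are made up for this sketch.
REQUIREMENTS: Dict[str, Requirement] = {
    "edit": Requirement(frozenset({ADMIN, GOV, ACTIONS})),
    "rename": Requirement(frozenset({ADMIN, MANAGER}), owner_ok=True),
    "publish": Requirement(frozenset({TRAIN, ACTIONS}), access_ok=True),
    "assign_data_category": Requirement(frozenset({TRAIN, ACTIONS}), access_ok=True),
    "assign_business_unit": Requirement(frozenset({ACTIONS}), access_ok=True),
    "toggle_integration": Requirement(frozenset({ADMIN, MANAGER, ACTIONS})),
    "delete": Requirement(frozenset({ADMIN, MANAGER}), owner_ok=True),
}


def can_perform(user: User, action: str) -> bool:
    """Return True if the user meets at least one condition listed for the action."""
    req = REQUIREMENTS[action]
    if req.owner_ok and user.is_owner:
        return True
    if req.access_ok and user.has_dataset_access:
        return True
    return bool(req.roles & user.roles)


analyst = User(roles=frozenset({TRAIN}))
owner = User(is_owner=True)

print(can_perform(analyst, "publish"))  # True
print(can_perform(analyst, "delete"))   # False
print(can_perform(owner, "delete"))     # True
print(can_perform(owner, "edit"))       # False
```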