Configure the profiling and classification options via Edge
Through the profiling and classification options, you can determine:
- whether you want to start the profiling and classification process automatically after each synchronization.
- the default profiling behavior for the schemas, such as whether the profiling is based on all data or on a random subset of the data.
- whether specific schemas do not use the default behavior but instead have their own behavior.
- which schemas you want to profile and classify.
Before you begin
- You have created and installed an Edge site.
- Your Edge site has a JDBC profiling capability.
- You have enabled data source registration via Edge.
- You have enabled profiling and classification via Edge.
- You have registered a data source via Edge.
- You have synchronized one or more schemas of the registered database.
Required permissions
- Your Edge site has a global role with the following global permissions: Data Catalog and Register Profiling Information.
Steps
- Open a Database asset page.
- In the tab pane, click Configuration.
- Click the Profiling and Classification tab.
  The Profiling and Classification options open.
  Tip: Only the synchronized schemas are available in the list.
- In the Default Profiling and Classification Rule section, click Edit.
- Enter the required information.
  - Automatically run when a metadata extraction is synchronized: Enable to automatically create a data profile and classify columns every time the synchronization process of one or more schemas finishes. This may take a long time. You can also add a schedule to profile and classify at regular intervals.
  - Select Rows to Profile:
    - Do not Profile and Classify (unless specified in the schema-specific rule): Select if you don't want to define a default profiling behavior for the schemas.
      Important: Use this option if you only want to profile and classify some of the schemas. If you select this option, Collibra only profiles and classifies the schemas for which a specific profiling and classification rule has been defined.
    - All Rows: Select to profile the schemas based on all data by default. This is also called a full scan.
    - Random Rows: Select to profile the schemas based on a subset of the data by default. This is also called a partial scan.
      If you select this option, the Maximum number of rows field becomes available. Enter the maximum number of rows that you want to use for profiling. By default, the maximum number of rows is 20,000.
      Note:
      - The value must be between 100 and 1,000,000. Your data source creates the set of data to profile from that number of rows.
      - If you enter a value that is greater than the number of rows in the data source, the entire data source is used to profile the data.
      Warning: Only some data sources support the use of random rows. To verify whether your data source supports it, go to Collibra-provided JDBC drivers.
      For data sources that support the use of random rows, the Random Rows option is selected by default. For data sources that don't support it, the Do not Profile and Classify (unless specified in the schema-specific rule) option is selected by default.
- Click Save.
- If you want to define a specific profiling and classification rule for a schema:
  - In the Schema Profiling and Classification Rules section, select the schema.
    The schema-specific information opens.
  - Do one of the following:
    - To create a new rule, click Add Rule.
    - To edit an existing rule, click Edit.
  - Enter the required information.
    - Do not Profile and Classify: Select to indicate that you do not want to profile and classify this schema. This option is useful if you want to exclude a schema from the profiling and classification process.
    - All Rows: Select to profile the schema based on all data. This is also called a full scan.
    - Random Rows: Select to profile the schema based on a subset of the data. This is also called a partial scan.
      If you select this option, the Maximum Number of Rows field appears. Enter the maximum number of rows that you want to use for profiling and classification. By default, the maximum number of rows is 20,000.
      Note:
      - The value must be between 100 and 1,000,000. Your data source creates the set of data to profile from that number of rows.
      - If you enter a value that is greater than the number of rows in the data source, the entire data source is used to profile the data.
      Warning: Only some data sources support the use of random rows. To verify whether your data source supports it, go to Collibra-provided JDBC drivers.
      For data sources that support the use of random rows, the Random Rows option is selected by default. For data sources that don't support it, the Do not Profile and Classify option is selected by default.
  - Click Save.
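The way the default rule and the schema-specific rules combine can be illustrated with a small model. The following Python sketch is not a Collibra API: the names Mode, Rule, effective_rule and rows_to_profile are hypothetical and exist only for illustration. It assumes that a schema-specific rule, when defined, replaces the default rule for that schema, and it uses the limits stated in the steps above (a value between 100 and 1,000,000, with 20,000 as the default).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

# Limits taken from the option descriptions above.
MIN_ROWS = 100
MAX_ROWS = 1_000_000
DEFAULT_MAX_ROWS = 20_000


class Mode(Enum):
    """Hypothetical names for the three row-selection options."""
    DO_NOT_PROFILE = "Do not Profile and Classify"
    ALL_ROWS = "All Rows"        # full scan
    RANDOM_ROWS = "Random Rows"  # partial scan


@dataclass(frozen=True)
class Rule:
    """Illustrative stand-in for a profiling and classification rule."""
    mode: Mode
    max_rows: int = DEFAULT_MAX_ROWS  # only meaningful for RANDOM_ROWS

    def __post_init__(self) -> None:
        # The row limit is only checked when Random Rows is selected.
        if self.mode is Mode.RANDOM_ROWS and not MIN_ROWS <= self.max_rows <= MAX_ROWS:
            raise ValueError(
                f"max_rows must be between {MIN_ROWS} and {MAX_ROWS}"
            )


def effective_rule(schema: str, default: Rule, schema_rules: Dict[str, Rule]) -> Rule:
    """Assumption: a schema-specific rule takes precedence over the default."""
    return schema_rules.get(schema, default)


def rows_to_profile(rule: Rule, source_rows: int) -> int:
    """Number of rows used for profiling a source that holds source_rows rows."""
    if rule.mode is Mode.DO_NOT_PROFILE:
        return 0
    if rule.mode is Mode.ALL_ROWS:
        return source_rows
    # Random Rows: a limit above the source size means the whole source is used.
    return min(rule.max_rows, source_rows)


if __name__ == "__main__":
    # Profile only some schemas: the default excludes everything,
    # and the schema-specific rules opt individual schemas back in.
    default = Rule(Mode.DO_NOT_PROFILE)
    schema_rules = {
        "sales": Rule(Mode.RANDOM_ROWS, max_rows=50_000),
        "hr": Rule(Mode.ALL_ROWS),
    }
    for schema in ("sales", "hr", "staging"):
        rule = effective_rule(schema, default, schema_rules)
        print(schema, rule.mode.value, rows_to_profile(rule, 80_000))
```

In this model, setting the default to Do not Profile and Classify and adding rules for individual schemas mirrors the recommended way to profile only some schemas; a schema without its own rule, such as the hypothetical "staging" schema, is skipped.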
What's next?
You can now profile and classify the data manually or automatically, or add a schedule to profile and classify at regular intervals.