Warning Jobserver and all related Jobserver integrations are end of life as of October 2024, except for Public Sector customers using GovCloud or on-premises environments.
For information on registering a data source via Edge, go to Registering and synchronizing a data source via Edge.

Refresh the schema of a registered data source

You can refresh the schema of a registered data source to update its data and profiling information. A refresh is also useful after you change data types, because it forces the profiler to use the correct type.

Tip You can also refresh the schema automatically via a schedule.

Important

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Platform! You can learn more about this latest UI in the UI overview.


Prerequisites

You have a global role with the Catalog global permission, for example, Catalog Author.
You have set up the JDBC driver for your data source, for example, MySQL.
You have configured one or more Jobservers in Collibra Console. If no Jobserver is available, the Register data source actions are grayed out in the global create menu of Collibra Platform.
If you are using a Collibra Platform environment with an on-premises Jobserver, both must have the same installer version. You can find the installer version of your Collibra Platform environment at the bottom of the sign-in window of its Collibra Console, for example, 5.9.2-0.
You have a resource role with the following resource permissions on the Schema community:
- Asset > add
- Attribute > add
- Domain > add
- Attachment > add
You have the permissions to retrieve the metadata of the following database components through the JDBC driver's DatabaseMetaData methods:
- Schemas
- Tables
- Columns
- Primary keys
- Foreign keys

Note

For the list of supported databases and versions, see Databases supported versions.
For the JDBC connection details of the various databases, see JDBC connection details.

Steps

Tip

The steps vary depending on your data source type and authentication method.

Authentication methods: Credentials, CyberArk, Kerberos, NTLM.

Open the Schema asset.
1. On the main toolbar, click → Catalog.
  The Catalog homepage opens.
2. In the submenu, click Data Dictionary and select the All Schemas view.
3. Click the schema that you want to refresh.
Tip You can also use the Collibra Platform search function to look up your schema.
In the view bar, to the right, click Actions → Refresh.
The Refresh Schema dialog box appears.
Tip If Catalog experience is disabled, the More menu is shown instead of Actions.

Enter the required information.

Field	Description
Upload a file	Upload or drop the CSV or Excel file in the upload box.
Process on	The Jobserver used for ingestion.
Column Separator	The character that is used as separator in the data source.
Quote	The character that is used as quote character in the data source.
Escape Character	The character that is used as escape character in the data source.
Store Data Profile	Option to store data profiling information for the registered data.
Detect advanced data types	Option to detect advanced data types in the data source.
Store Sample Data	Option to extract sample data from the registered data.
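The Column Separator, Quote and Escape Character fields behave like the usual CSV dialect settings. As a rough illustration only (Collibra's own CSV parser may behave differently), the following Python sketch uses the standard `csv` module to show how these three characters change the way a line is split into columns. The `parse_line` helper and the sample lines are hypothetical.

```python
import csv
import io


def parse_line(text, separator=",", quote='"', escape=None):
    """Split one CSV line into columns (hypothetical helper, not Collibra's parser).

    separator, quote and escape play the role of the Column Separator,
    Quote and Escape Character fields.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator,
                        quotechar=quote, escapechar=escape)
    return next(reader)


# The quote character keeps a separator inside a value.
print(parse_line('id;"Smith; John";42', separator=";"))
# → ['id', 'Smith; John', '42']

# The escape character removes the special meaning of the next character.
print(parse_line(r"id;Smith\; John;42", separator=";", escape="\\"))
# → ['id', 'Smith; John', '42']
```

If the separator you enter does not match the file, each line is read as a single column, which is a quick way to spot a wrong setting.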

Enter the required information.

Field	Description
Upload a file	Upload or drop the CSV or Excel file in the upload box.
Process on	The Jobserver used for ingestion.
Store Data Profile	Option to store data profiling information for the registered data.
Store Sample Data	Option to extract sample data from the registered data.

Enter the required information.

Option Description

Data Source Type

The data source types for which a JDBC driver is available.

Note If you want to use a Collibra-provided driver, select Collibra driver.

JDBC Driver Version

The JDBC driver to connect to your database.

Connect via

Process On

The Jobserver used for ingestion.

The connection properties as defined in your JDBC driver.

If you want to use CyberArk, you need the following connection properties.

Label	Description
Keystore file	The name of the keystore file. The keystore must contain the client key and the client certificate or certificate chain. If `defaultTruststore` is set to `false`, the keystore must also contain the trusted CA certificate needed to validate the server certificate offered by CyberArk. The value must have the following format: `file://<keystore-file name.jks>`. Example: `file://cyberark-keystore.jks`
Keystore password	The password required to open the keystore.
Default truststore	Whether the default truststore is used. The default value is `False`. `False`: the server certificate is validated through the keystoreFile property. `True`: the server certificate is validated through the default truststore of the Java JRE. This is recommended when CyberArk is set up to offer a server certificate that can be validated by a public certification authority (CA).
CyberArk address	The host and port number through which the CyberArk server is accessible, in the format `hostname:port`. Example: `my.cyberark.com:5502`
CyberArk application ID	The application ID as defined in CyberArk. This ID should be provided by your network or system administrator.
CyberArk query	The CyberArk query. This query should be provided by your network or system administrator.
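The format rules in the CyberArk table can be checked before you save the dialog. The following Python sketch is a hypothetical pre-check, not part of Collibra: it validates a keystore value of the form `file://<name>.jks` and a CyberArk address of the form `hostname:port`. The function name and the error messages are assumptions, and it assumes the keystore value is a bare file name without directories.

```python
import re

# Hypothetical pre-check for the CyberArk fields; this is not a Collibra API.
# Assumption: the keystore value is a bare file name (no directories).
KEYSTORE_PATTERN = re.compile(r"file://[^/\\]+\.jks")
ADDRESS_PATTERN = re.compile(r"(?P<host>[A-Za-z0-9.-]+):(?P<port>\d{1,5})")


def check_cyberark_fields(keystore_file, address, default_truststore=False):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    if not KEYSTORE_PATTERN.fullmatch(keystore_file):
        problems.append("Keystore file must look like file://<name>.jks")
    match = ADDRESS_PATTERN.fullmatch(address)
    if match is None:
        problems.append("CyberArk address must look like hostname:port")
    elif not 1 <= int(match.group("port")) <= 65535:
        problems.append("CyberArk port must be between 1 and 65535")
    # The documented default is False; anything other than a boolean is rejected.
    if not isinstance(default_truststore, bool):
        problems.append("Default truststore must be True or False")
    return problems


print(check_cyberark_fields("file://cyberark-keystore.jks", "my.cyberark.com:5502"))
# → []
print(check_cyberark_fields("cyberark-keystore.jks", "my.cyberark.com"))
```

A format check like this cannot confirm that the keystore actually contains the right certificates; that is only verified when the Jobserver connects to CyberArk.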

If you want to use Kerberos authentication, you also need the following connection properties.

Label	Description
Principal	The Kerberos principal identity.
Kerberos realm	The Kerberos realm name.
Login context name	The name of the login context that is used as the index into the JAAS configuration.
Jaas file name	The name of the JAAS file.
Kerberos configuration file	The configuration file containing specific properties for Kerberos authentication.

If you want to use NTLM authentication, you also need the following connection properties.

Label	Description
Security	The security setting that enables authentication.
Authentication scheme	The authentication scheme to use, which is NTLM.

Store credentials

Select this option to store the credentials to access the database. When you refresh the schema, you can clear this option again.

Username

Username to access the database.

Password

Corresponding password to access the database.

Schedule Data Refresh

Enable or disable a schedule to automatically refresh the data registration.

Cron pattern

Cron Expression

Schedule of the data refresh as a Quartz Cron pattern.

Warning If you create an invalid cron pattern, Collibra stops responding.
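Because an invalid cron pattern can make Collibra stop responding, it is worth checking a pattern before you save it. A Quartz cron pattern has six or seven space-separated fields: seconds, minutes, hours, day-of-month, month, day-of-week and an optional year. In Java, Quartz offers `CronExpression.isValidExpression(String)` for an authoritative check. The following Python sketch is only a rough pre-check: it covers the field count, allowed characters, value ranges and the Quartz rule that exactly one of day-of-month and day-of-week is `?`, and it does not catch every invalid expression. The `check_quartz_cron` helper is hypothetical.

```python
import re

# Quartz cron fields in order: (name, lowest, highest, extra characters).
QUARTZ_FIELDS = [
    ("seconds", 0, 59, ""),
    ("minutes", 0, 59, ""),
    ("hours", 0, 23, ""),
    ("day-of-month", 1, 31, "?LW"),
    ("month", 1, 12, ""),
    ("day-of-week", 1, 7, "?L#"),
    ("year", 1970, 2099, ""),
]
# Month and day names are accepted only in their own fields.
NAMES = {
    "month": re.compile(r"JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC", re.I),
    "day-of-week": re.compile(r"SUN|MON|TUE|WED|THU|FRI|SAT", re.I),
}


def check_quartz_cron(expression):
    """Return a list of problems; an empty list means the basic checks passed."""
    fields = expression.split()
    if len(fields) not in (6, 7):
        return [f"expected 6 or 7 fields, got {len(fields)}"]
    problems = []
    # Quartz needs '?' in exactly one of day-of-month and day-of-week.
    if (fields[3] == "?") == (fields[5] == "?"):
        problems.append("use '?' in exactly one of day-of-month and day-of-week")
    for (name, low, high, extra), value in zip(QUARTZ_FIELDS, fields):
        if name in NAMES:
            value = NAMES[name].sub("", value)
        if not set(value) <= set("0123456789,-*/") | set(extra):
            problems.append(f"unexpected character in {name}")
            continue
        for item in value.split(","):
            base, _, step = item.partition("/")
            # A step such as the 15 in 0/15 is an increment, not a field value.
            if step and not (step.isdigit() and int(step) > 0):
                problems.append(f"invalid step in {name}")
            for number in map(int, re.findall(r"\d+", base)):
                if not low <= number <= high:
                    problems.append(f"{number} is out of range for {name}")
    return problems


print(check_quartz_cron("0 0 2 * * ?"))        # → []
print(check_quartz_cron("0 0 2 ? * MON-FRI"))  # → []
print(check_quartz_cron("0 0 2 * *"))          # → ['expected 6 or 7 fields, got 5']
```

For example, `0 0 2 * * ?` runs the refresh every day at 02:00:00, and `0 0 2 ? * MON-FRI` runs it at the same time on weekdays only.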

Time Zone

The time zone of the database.

Store Data Profile

Option to store data profiling information for the registered data.

Detect Advanced Data Types

Option to detect advanced data types in the data source.

Store Sample Data

Option to extract sample data from the registered data.

Enter the required information.

Option Description

Data source type

The data source types for which a JDBC driver is available.

Note If you want to use a Collibra-provided driver, select Collibra driver.

JDBC driver version

The JDBC driver to connect to your database.

Connect via

The Jobserver used for ingestion.

The connection properties as defined in your JDBC driver.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.
Schema	The name of your schema.

Label	Description
URL (hostname:port)	The address of the database, in the format hostname:port.
Principal	The Kerberos principal identity.
Schema	The name of your schema.

Label	Description
URL (hostname:port)	The address of the database, in the format hostname:port.
Schema	The name of your schema.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.
Schema	The name of your schema.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.
Schema	The name of your schema.

Label	Description
URL (hostname:port)	The address of the database, in the format hostname:port.
Schema	The name of your schema.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.
Schema	The name of your schema.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
SID	The Oracle System ID (SID), which identifies a database on a system.
Schema	The name of your schema.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.
Schema	The name of your schema.

Label	Description
Hostname	The hostname of the database server.
Port	The port number of the database server.
Database	The name of your database.
Schema	The name of your schema.
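The Hostname, Port, Database, Schema and SID fields are the parts a JDBC URL is usually assembled from. The following Python sketch shows that mapping for two drivers with well-known URL formats, the PostgreSQL JDBC driver (with its `currentSchema` connection property) and the Oracle thin driver in SID form. It is an illustration under those assumptions, not how Collibra builds its connection string; `build_jdbc_url` and `split_address` are hypothetical helpers, and the second one splits a value entered in the URL (hostname:port) field.

```python
from urllib.parse import quote


def split_address(address):
    """Split a 'hostname:port' value into (hostname, port)."""
    hostname, separator, port = address.rpartition(":")
    if not separator or not hostname or not port.isdigit():
        raise ValueError(f"expected hostname:port, got {address!r}")
    return hostname, int(port)


def build_jdbc_url(kind, hostname, port, database=None, schema=None, sid=None):
    """Assemble a JDBC URL from the dialog fields (hypothetical helper)."""
    if kind == "postgresql":
        url = f"jdbc:postgresql://{hostname}:{port}/{database}"
        if schema:
            # currentSchema is a connection property of the PostgreSQL JDBC driver.
            url += f"?currentSchema={quote(schema)}"
        return url
    if kind == "oracle-sid":
        # Oracle thin driver, SID form: jdbc:oracle:thin:@host:port:SID
        return f"jdbc:oracle:thin:@{hostname}:{port}:{sid}"
    raise ValueError(f"unsupported kind: {kind}")


print(build_jdbc_url("postgresql", "db.example.com", 5432, "sales", "public"))
# → jdbc:postgresql://db.example.com:5432/sales?currentSchema=public
print(build_jdbc_url("oracle-sid", *split_address("ora.example.com:1521"), sid="ORCL"))
# → jdbc:oracle:thin:@ora.example.com:1521:ORCL
```

For the exact URL syntax of other databases, see JDBC connection details.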

If you want to use Kerberos authentication, you also need the following connection properties.

Label	Description
Principal	The Kerberos principal identity.
Kerberos realm	The Kerberos realm name.
Login context name	The name of the login context that is used as the index into the JAAS configuration.
Jaas file name	The name of the JAAS file.
Kerberos configuration file	The configuration file containing specific properties for Kerberos authentication.

If you want to use NTLM authentication, you also need the following connection properties.

Label	Description
Security	The security setting that enables authentication.
Authentication scheme	The authentication scheme to use, which is NTLM.

If you want to use CyberArk, you need the following connection properties.

Label	Description
Keystore file	The name of the keystore file. The keystore must contain the client key and the client certificate or certificate chain. If `defaultTruststore` is set to `false`, the keystore must also contain the trusted CA certificate needed to validate the server certificate offered by CyberArk. The value must have the following format: `file://<keystore-file name.jks>`. Example: `file://cyberark-keystore.jks`
Keystore password	The password required to open the keystore.
Default truststore	Whether the default truststore is used. The default value is `False`. `False`: the server certificate is validated through the keystoreFile property. `True`: the server certificate is validated through the default truststore of the Java JRE. This is recommended when CyberArk is set up to offer a server certificate that can be validated by a public certification authority (CA).
CyberArk address	The host and port number through which the CyberArk server is accessible, in the format `hostname:port`. Example: `my.cyberark.com:5502`
CyberArk application ID	The application ID as defined in CyberArk. This ID should be provided by your network or system administrator.
CyberArk query	The CyberArk query. This query should be provided by your network or system administrator.

Store credentials

Select this option to store the credentials to access the database. When you refresh the schema, you can clear this option again.

Username

Username to access the database.

Note This field is ignored if your data source uses CyberArk, Kerberos or NTLM authentication.

Password

Corresponding password to access the database.

Note This field is ignored if your data source uses CyberArk, Kerberos or NTLM authentication.

Schedule data refresh

Enable or disable a schedule to automatically refresh the data registration.

Cron pattern

Schedule of the data refresh as a Quartz Cron pattern.

Warning If you create an invalid cron pattern, Collibra stops responding.

Time zone

The time zone of the database.

Store Data Profile

Option to store data profiling information for the registered data.

Detect advanced data types

Option to detect advanced data types in the data source.

Store Sample Data

Option to extract sample data from the registered data.

Tables excluded from registration

Database tables that will not be ingested.

Note

If required, you can exclude multiple tables. To do this, press Enter after typing a value and then type the next.
You can use an asterisk (*) as a wildcard to select multiple tables. For example, if you want to exclude all tables that start with act_, you can enter act_*.
The table names are case sensitive.
You can add or remove tables from this list by refreshing the schema.
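The exclusion rules above can be sketched in Python as follows. Assumptions: each value must match the whole table name, `*` is the only wildcard, and matching is case-sensitive as stated in the note; Collibra's exact matching rules may differ, and the helper names are hypothetical. The sketch avoids `fnmatch` on purpose, because `fnmatch` also treats `?` and `[...]` as wildcards.

```python
import re


def exclusion_pattern(value):
    """Turn one exclusion value into a regex where only '*' is a wildcard."""
    # Escape everything, then turn each escaped asterisk back into '.*'.
    return re.compile(re.escape(value).replace(r"\*", ".*"))


def excluded_tables(table_names, exclusions):
    """Return the table names that match any exclusion value (case-sensitive)."""
    patterns = [exclusion_pattern(value) for value in exclusions]
    return [name for name in table_names
            if any(pattern.fullmatch(name) for pattern in patterns)]


tables = ["act_ru_task", "act_hi_log", "ACT_GE_BYTEARRAY", "customer"]
print(excluded_tables(tables, ["act_*"]))
# → ['act_ru_task', 'act_hi_log']
```

`ACT_GE_BYTEARRAY` is not excluded because table names are case sensitive; you would need a separate `ACT_*` entry for it.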

Click Save & Refresh.
The schema refresh starts. You can follow the refresh job in the list of activities.

What's next?

The representation of the schema is updated: Data Catalog creates, edits and deletes assets as needed.
- This can lead to refresh conflicts. See Resolve schema refresh conflicts.
- If you had deleted assets manually, Data Catalog usually doesn’t create them again if you refresh the schema. However, if the assets are required to represent the schema structure, Data Catalog can create them again.
  Example
  You ingested a schema that contains a table and three columns. In Data Catalog, this is represented by a Schema asset, a Table asset and three Column assets.
  Additionally, the following relations are created between the relevant assets:
  - Schema contains/is part of Table
  - Table contains/is part of Column
  In the actual data source, the columns are physically inside the table. In Data Catalog, however, they are separate assets linked by relations. As a consequence, you can delete the Table asset without deleting the Column assets. If you do, Data Catalog creates the Table asset again when you refresh the schema, because the Table asset is needed for the relations to the Column assets.
If the data source has new values and you selected the checkboxes to store sample data and data profile information, new sample data is generated and all profiling information is updated.
If you did not select the Store Sample Data checkbox, any previously gathered sample data is removed. If you did not select the Store Data Profile checkbox, any previously gathered data profiling information is removed.
Data types or categorical attributes that you changed manually are not updated when you refresh the schema.
Note If you change the data type back to the original value assigned by the profiler, Data Catalog can update it if you refresh the schema.
If you use this schema of the data source for Tableau stitching, you have to restitch after each schema refresh to make sure that all relations are up to date.
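The refresh rules listed above can be summarized as a small model. The following Python sketch is purely illustrative and is not how Data Catalog is implemented: `ColumnAsset`, `refresh_column` and `refresh_structure` are hypothetical, and assets are identified by plain strings such as `orders.id` for a column, which is an assumption made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnAsset:
    """Hypothetical, simplified stand-in for a Column asset."""
    name: str
    data_type: str       # current value in Data Catalog
    profiler_type: str   # value last assigned by the profiler
    sample_data: list = field(default_factory=list)
    profile: dict = field(default_factory=dict)


def refresh_column(column, source_type, source_sample, source_profile,
                   store_sample_data, store_data_profile):
    """Apply the documented refresh rules to one column."""
    # A manually changed type is kept; a type that still equals the
    # profiler's value follows the data source again.
    if column.data_type == column.profiler_type:
        column.data_type = source_type
    column.profiler_type = source_type
    # Cleared checkboxes remove previously gathered information.
    column.sample_data = list(source_sample) if store_sample_data else []
    column.profile = dict(source_profile) if store_data_profile else {}
    return column


def refresh_structure(existing_assets, source_tables):
    """Return deleted Table assets that must be recreated for their columns.

    existing_assets: set of asset names, for example {"orders.id"}.
    source_tables: mapping of table name to a list of column names.
    """
    return [table for table, columns in source_tables.items()
            if table not in existing_assets
            and any(f"{table}.{column}" in existing_assets for column in columns)]


col = ColumnAsset("amount", data_type="DECIMAL", profiler_type="VARCHAR",
                  sample_data=["1"], profile={"nulls": 0})
refresh_column(col, "TEXT", ["2"], {"nulls": 1},
               store_sample_data=False, store_data_profile=True)
print(col.data_type, col.sample_data, col.profile)
# → DECIMAL [] {'nulls': 1}
print(refresh_structure({"orders.id", "orders.total"}, {"orders": ["id", "total"]}))
# → ['orders']
```

In this model, the manually set `DECIMAL` type survives the refresh, the sample data is removed because Store Sample Data was cleared, and the deleted `orders` Table asset is recreated because its Column assets still exist.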