Prepare the lineage harvester configuration file for Tableau
You have to prepare a configuration file before you run the lineage harvester. The lineage harvester collects your Tableau metadata and sends it to the Collibra Data Lineage server, where it is processed and analyzed. Collibra Data Intelligence Cloud then imports the Tableau assets and relations to Data Catalog.
Prerequisites
- You have Collibra Data Intelligence Cloud 2022.01 or newer.
Warning If you are using Collibra Data Intelligence Cloud 2021.11 or older, you have to add all Tableau attributes in the operating model to a scope and create a scoped assignment before you ingest Tableau via the lineage harvester. For complete information and step-by-step instructions, see Tableau general troubleshooting.
- You have the lineage harvester 2022.02 or newer.
- You have a global role that has the Manage all resources global permission.
- You have a global role with the Catalog global permission, for example Catalog Author.
- You have a global role with the Technical lineage global permission.
- You have a global role with the Data Stewardship Manager global permission.
- You have created a BI Data Catalog domain in which you want to ingest the Tableau assets.
- You have a resource role with the following resource permissions at the level of the community in which you created the BI Data Catalog domain:
- Asset: add
- Attribute: add
- Domain: add
- Attachment: add
- You have downloaded the lineage harvester and your system meets the requirements to run it.
- You have tested your connectivity with the Tableau server.
Steps
- Run the following command line to start the lineage harvester:
- Windows:
.\bin\lineage-harvester.bat
- For other operating systems:
chmod +x bin/lineage-harvester
and then
bin/lineage-harvester
An empty configuration file is created in the config folder.
- Open the lineage-harvester.conf file and enter the values for each property.
Properties and descriptions:
general: This section describes the connection information between the lineage harvester and Data Catalog.
catalog: This section contains information that is necessary to connect to Data Catalog.
url: The URL of your Collibra Data Intelligence Cloud environment.
Note You can only enter the public URL of your Collibra DGC environment. Other URLs will not be accepted.
username: The username that you use to sign in to Collibra.
useCollibraSystemName: Indication whether you want to use the system or server name of a data source to match to the System asset you created when you prepared the physical data layer. This is useful when you have multiple databases with the same name.
By default, the useCollibraSystemName property is set to false. If you want to use it, set it to true.
- If you keep the property set to false, the lineage harvester ignores the collibraSystemName property in the rest of the configuration file.
- If you set the useCollibraSystemName property to true, the lineage harvester reads the value in the collibraSystemName property in all sections of the configuration file and in the Tableau <source ID> configuration file.
Note If you set the useCollibraSystemName property to true in your lineage harvester configuration file, but don't define the system name in the Tableau <source ID> configuration file, the Tableau technical lineage shows DEFAULT as the system name.
Warning Unless you have multiple databases with the same name, we highly recommend that you keep the default value.
sources: This section contains all Tableau connection properties.
type: The kind of data source. In this case, the value has to be Tableau.
id: The unique ID to identify the Tableau metadata that was uploaded to the Collibra Data Lineage server.
Tip This value can be anything as long as it is unique. The lineage harvester uses the ID to identify a batch of data on the Collibra Data Lineage server.
url: The link to the data in Tableau.
username: The username you use to sign in to the Tableau server.
Important If you want to use token-based authentication, you need to replace username with tokenName. You must specify either username or tokenName; if both exist, then tokenName is used.
tokenName: The lineage harvester authentication token.
Note For token-based authentication, use this property in your lineage harvester configuration file, instead of the username property. If both properties are present, tokenName is used.
siteIds: The site IDs of the Tableau sites that you want to include in the ingestion process.
Warning Ensure that you specify the correct value. The correct value is the site ID from the URL of the site to which you want to sign in. When you manually sign in to Tableau Server or Tableau Online, the site ID is the value that appears after /site/ in the browser address bar. In the following example URLs, the site ID is MarketingTeam:
- Tableau Server: http://MyServer/#/site/MarketingTeam/projects
- Tableau Online: https://10ay.online.tableau.com/#/site/MarketingTeam/workbooks
On Tableau Server, however, the URL of the Default site does not specify the site. For example, the URL for a view named Profits, on a site named Sales, is http://localhost/#/site/sales/views/profits. The URL for this same view on the Default site is http://localhost/#/views/profits. No site name appears in this URL. If you can't see the site ID, leave this property empty: "siteIds": [""]
Example If you want to ingest two Tableau sites "Site 1" and "Site 2", you can enter the following information in the siteIds property: ["site ID of Site 1", "site ID of Site 2"].
siteNames: The site names of the corresponding site IDs.
Important This property is:
- Optional for Tableau Server.
- Mandatory for Tableau Online.
Warning If you have Tableau Server and you don't use this property, you must delete it from your configuration file. Don't leave the property in the configuration file without a value.
restOnly: Indication whether you want to use only the Tableau REST API, or both the Tableau REST API and the Tableau Metadata API, to harvest Tableau metadata.
- false (default): The lineage harvester uses the REST API and the Metadata API to harvest Tableau metadata.
- true: The lineage harvester uses only the REST API to harvest Tableau metadata.
Warning If you only allow the lineage harvester to use the Tableau REST API, the harvester can't process the information needed for the technical lineage, and the automatic stitching of Column assets to Tableau Data Attribute assets is not possible.
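The siteIds rule described above (the site ID is the value after /site/ in the browser address bar, and the Default site has none) can be sketched in Python. This is a minimal illustration, not part of the lineage harvester; the function name site_id_from_url is hypothetical, and it assumes the URL has the same shape as the example URLs in the siteIds description.

```python
from urllib.parse import urlsplit


def site_id_from_url(url):
    """Return the site ID that follows /site/ in a Tableau URL.

    Returns an empty string when the URL has no /site/ segment,
    which is the case for the Default site on Tableau Server.
    """
    # Tableau puts the route after '#', so it lands in the URL fragment.
    fragment = urlsplit(url).fragment
    parts = [part for part in fragment.split("/") if part]
    # Assumption: the route starts with "site/<site ID>/..." when a site is named.
    if len(parts) >= 2 and parts[0] == "site":
        return parts[1]
    return ""


if __name__ == "__main__":
    for example in (
        "http://MyServer/#/site/MarketingTeam/projects",
        "http://localhost/#/views/profits",
    ):
        print(repr(site_id_from_url(example)))
```

The empty string returned for the Default site corresponds to the "siteIds": [""] value mentioned in the siteIds description.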
collibraSystemName: The name of the data source's system or server.
You must include this property in your configuration file; however, you can leave it empty, even if the useCollibraSystemName property is set to true.
If the useCollibraSystemName property is set to true, you must prepare a Tableau <source ID> configuration file to provide the system information.
domainID: The unique reference ID of the domain in Collibra Data Intelligence Cloud in which you want to ingest the Tableau assets.
Tip You can ingest Tableau assets in one or more domains in Collibra. The following list identifies which properties and which configuration files to use, depending on whether you want to ingest in one or multiple domains.
- If you want to ingest in a single domain in Collibra: Refer to the single domain reference ID in this domainID property.
- If you want to ingest in multiple domains in Collibra, do both of the following:
  - Mention a domain reference ID in this domainID property, for your Tableau Server asset.
  - Refer to all relevant domain reference IDs in the domainMapping section of the Tableau <source ID> configuration file, for your Tableau site, Tableau project and all child assets.
Important The domainID property represents the default domain. Tableau assets that are not mapped to specific domains via the domainMapping section of the Tableau <source ID> configuration file, for example Tableau Server assets, are ingested in this default domain.
How do I find a domain reference ID?
Open the relevant domain in Collibra. The URL looks like: https://<yourcollibrainstance>/domain/22258f64-40b6-4b16-9c08-c95f8ec0da26?view=00000000-0000-0000-0000-000000040001. In this example, the reference ID is 22258f64-40b6-4b16-9c08-c95f8ec0da26.
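If you copy the domain URL from the browser, the reference ID can also be pulled out with a short script. The following Python sketch is only an illustration: the function name domain_reference_id is hypothetical, and it assumes that the reference ID is the UUID-shaped path segment directly after /domain/, as in the example URL above.

```python
import uuid
from urllib.parse import urlsplit


def domain_reference_id(url):
    """Return the reference ID from a Collibra domain URL.

    Assumes the URL path ends with /domain/<reference ID>; query
    parameters such as ?view=... are ignored.
    """
    parts = [part for part in urlsplit(url).path.split("/") if part]
    if len(parts) < 2 or parts[-2] != "domain":
        raise ValueError("not a domain URL: %s" % url)
    candidate = parts[-1]
    # Assumption: reference IDs are UUIDs; uuid.UUID raises ValueError otherwise.
    uuid.UUID(candidate)
    return candidate


if __name__ == "__main__":
    # example.collibra.com is a placeholder host name.
    print(domain_reference_id(
        "https://example.collibra.com/domain/"
        "22258f64-40b6-4b16-9c08-c95f8ec0da26"
        "?view=00000000-0000-0000-0000-000000040001"
    ))
```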
excludeImages: Optional property for excluding the downloading of images.
To exclude the downloading of images, set this property to true.
concurrencyLevel: Optional property for specifying the internal sizing, meaning the number of tasks that can be executed at the same time.
The default value is "10", meaning as many as 10 HTTP requests can take place in parallel. Consider reducing the value if you are experiencing HTTP 401 Unauthorized errors. Setting the value to "1" effectively disables concurrency, so that HTTP requests run one after another instead of in parallel.
paging: Optional property for customizing the Tableau API pagination settings.
The default values are sufficient in most cases; however, you can decrease them to help mitigate node limit errors, or increase them to speed up API calls.
The complete list of pagination settings and default values:
"paging": {
    "databasesPageSize": 100,
    "tablesPageSize": 100,
    "tablesColumnsPageSize": 100,
    "tableColumnsPageSize": 1000,
    "datasourcesPageSize": 50,
    "datasourcesFieldsPageSize": 50,
    "datasourceFieldsPageSize": 100,
    "worksheetsPageSize": 100,
    "worksheetsFieldsPageSize": 100,
    "worksheetFieldsPageSize": 1000,
    "dashboardsPageSize": 100,
    "columnsLimit": 20,
    "fieldsLimit": 20
}
Settings per metadata type and descriptions:
Dashboard
- dashboardsPageSize: The number of dashboards per page.
Worksheet
- worksheetsPageSize: The number of worksheets per page.
- worksheetsFieldsPageSize: The number of worksheet fields per page.
Database
- databasesPageSize: The number of databases per page.
Table
- tablesPageSize: The number of tables per page.
- tablesColumnsPageSize: The number of table columns per page.
Table columns
- tableColumnsPageSize: The number of table columns per page.
Data source
- datasourcesPageSize: The number of data sources per page.
- datasourcesFieldsPageSize: The number of data source fields per page.
- columnsLimit: The number of data source field columns per page.
- fieldsLimit: The number of referenced data source fields per page.
Data source field
- datasourceFieldsPageSize: The number of data source fields per page.
- columnsLimit: The number of data source field columns per page.
- fieldsLimit: The number of referenced data source fields per page.
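To experiment with smaller or larger page sizes, you could generate a paging block from the default values listed above instead of editing each number by hand. The Python sketch below is only one way to do that: scaling every setting by the same factor is an illustrative choice, not a Collibra recommendation, and the names DEFAULT_PAGING and scaled_paging are hypothetical.

```python
import json

# Default values as listed in the pagination settings above.
DEFAULT_PAGING = {
    "databasesPageSize": 100,
    "tablesPageSize": 100,
    "tablesColumnsPageSize": 100,
    "tableColumnsPageSize": 1000,
    "datasourcesPageSize": 50,
    "datasourcesFieldsPageSize": 50,
    "datasourceFieldsPageSize": 100,
    "worksheetsPageSize": 100,
    "worksheetsFieldsPageSize": 100,
    "worksheetFieldsPageSize": 1000,
    "dashboardsPageSize": 100,
    "columnsLimit": 20,
    "fieldsLimit": 20,
}


def scaled_paging(factor):
    """Return a copy of the defaults with every value multiplied by factor.

    Values are rounded down and never drop below 1.
    """
    if factor <= 0:
        raise ValueError("factor must be greater than 0")
    return {name: max(1, int(value * factor))
            for name, value in DEFAULT_PAGING.items()}


if __name__ == "__main__":
    # Halve every setting, for example when you run into node limit errors.
    print(json.dumps({"paging": scaled_paging(0.5)}, indent=4))
```

The printed object, without the outer braces, can be pasted into the relevant entry of the sources section.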
- Save the configuration file.
- Start the lineage harvester again in the console and run the following command:
- for Windows:
.\bin\lineage-harvester.bat full-sync
- for other operating systems:
./bin/lineage-harvester full-sync
- When prompted, enter the password or client secret to connect to your Collibra Data Intelligence Cloud and Tableau environment. The passwords are encrypted and stored in /config/pwd.conf.
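If you want to start the full-sync command from a script, for example from a scheduled job, the choice between the two commands above can be made automatically. The following Python sketch is an illustration under assumptions: the function names and the harvester_dir parameter are hypothetical, and it assumes the passwords were already entered during an earlier interactive run, so that the harvester does not wait for input.

```python
import platform
import subprocess
from pathlib import Path


def full_sync_command(harvester_dir, system=None):
    """Build the full-sync command for the given operating system.

    harvester_dir is the folder that contains the bin folder.
    system defaults to the name reported by platform.system().
    """
    system = system or platform.system()
    script = "lineage-harvester.bat" if system == "Windows" else "lineage-harvester"
    return [str(Path(harvester_dir) / "bin" / script), "full-sync"]


def run_full_sync(harvester_dir):
    """Run full-sync from the harvester folder and return its exit code."""
    command = full_sync_command(harvester_dir)
    completed = subprocess.run(command, cwd=harvester_dir)
    return completed.returncode
```

For example, run_full_sync("/opt/lineage-harvester") would start the harvester from that folder; the path is a placeholder for wherever you unpacked the lineage harvester.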
Example
{
    "general": {
        "catalog": {
            "url": "https://<organization>.collibra.com",
            "username": "<your-collibra-username>"
        },
        "useCollibraSystemName": false
    },
    "sources": [
        {
            "type": "Tableau",
            "id": "unique-ID",
            "url": "URL to Tableau server",
            "username": "Admin",
            "siteIds": ["site ID of Tableau Site 1", "site ID of Tableau Site 2"],
            "siteNames": ["site name of Tableau Site 1", "site name of Tableau Site 2"],
            "restOnly": false,
            "collibraSystemName": "tableau-system-name",
            "domainId": "Domain-resource-ID",
            "excludeImages": true,
            "concurrencyLevel": 1,
            "paging": {
                "pagination-setting": 100,
                "pagination-setting-2": 100
            }
        }
    ]
}
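Before you run the lineage harvester, you may want to check a configuration like the example above against the rules in the property descriptions. The Python sketch below is a hypothetical pre-flight check, not a Collibra tool. It assumes that the configuration file parses as plain JSON and that every entry in sources is a Tableau source, and it covers only a few of the rules: the type value, the choice between username and tokenName, an empty siteNames property, and the presence of collibraSystemName.

```python
import json


def check_tableau_source(source):
    """Return a list of problems found in one entry of the sources section."""
    problems = []
    if source.get("type") != "Tableau":
        problems.append('"type" has to be "Tableau"')
    if not source.get("id"):
        problems.append('"id" is missing or empty')
    # Either property is accepted; tokenName wins if both are present.
    if "username" not in source and "tokenName" not in source:
        problems.append('specify either "username" or "tokenName"')
    # The property may be absent, but it must not be present without a value.
    if "siteNames" in source and not source["siteNames"]:
        problems.append('"siteNames" has no value; fill it in or delete it')
    # The property must exist, although it can be an empty string.
    if "collibraSystemName" not in source:
        problems.append('"collibraSystemName" has to be present, even if empty')
    return problems


def check_config(text):
    """Parse configuration text and map each source ID to its problems."""
    config = json.loads(text)
    report = {}
    for index, source in enumerate(config.get("sources", [])):
        label = source.get("id") or "source #%d" % index
        report[label] = check_tableau_source(source)
    return report
```

To use it, read the contents of lineage-harvester.conf into a string and pass it to check_config. An empty list for a source means that none of these particular checks failed; it does not guarantee that the harvester accepts the file.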
What's next?
The lineage harvester triggers Collibra to import Tableau assets and their relations and create a technical lineage for Tableau Data Attribute assets.
If issues occur during the Tableau ingestion process, check the Tableau troubleshooting section to solve your problems.
To refresh the Tableau metadata, you can run the lineage harvester again or schedule jobs to run it automatically.
Tip You can check the progress of the Tableau ingestion in Activities. The results field indicates how many relations were imported into Data Catalog.