Prepare the lineage harvester configuration file for Tableau

You have to prepare a configuration file before you run the lineage harvester. The lineage harvester collects your Tableau metadata and sends it to the Collibra Data Lineage server, where it is processed and analyzed. Collibra Data Intelligence Cloud then imports the Tableau assets and relations to Data Catalog.

Prerequisites

  • You have Collibra Data Intelligence Cloud 2022.01 or newer.
    Warning If you are using Collibra Data Intelligence Cloud 2021.11 or older, you have to add all Tableau attributes in the operating model to a scope and create a scoped assignment before you ingest Tableau via the lineage harvester. For complete information and step-by-step instructions, see Tableau general troubleshooting.
  • You have the lineage harvester 2022.02 or newer.
  • You have a global role that has the Manage all resources global permission.
  • You have a global role with the Catalog global permission, for example Catalog Author.
  • You have a global role with the Technical lineage global permission.
  • You have a global role with the Data Stewardship Manager global permission.
  • You have created a BI Data Catalog domain in which you want to ingest the Tableau assets.
  • You have a resource role with the following resource permissions on the community level in which you created the BI Data Catalog domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
  • You have downloaded the lineage harvester and you have the necessary system requirements to run it.
  • You have tested your connectivity with the Tableau server.

Steps

  1. Run the following command to start the lineage harvester:
    • Windows: .\bin\lineage-harvester.bat
    • Other operating systems: chmod +x bin/lineage-harvester and then ./bin/lineage-harvester
    An empty configuration file is created in the config folder.
  2. Open the lineage-harvester.conf file and enter the values for each property.
    Properties | Description
    general

    This section describes the connection information between the lineage harvester and Data Catalog.

    catalog

    This section contains information that is necessary to connect to Data Catalog.

    url

    The URL of your Collibra Data Intelligence Cloud environment.

    Note You can only enter the public URL of your Collibra Data Intelligence Cloud environment. Other URLs are not accepted.

    username

    The username that you use to sign in to Collibra.

    useCollibraSystemName

    Indicates whether to use the system or server name of a data source to match it to the System asset that you created when you prepared the physical data layer. This is useful when you have multiple databases with the same name.

    By default, the useCollibraSystemName property is set to false. If you want to use it, set it to true.

    • If you keep the property set to false, the lineage harvester ignores the collibraSystemName property in the rest of the configuration file.
    • If you set the useCollibraSystemName property to true, the lineage harvester reads the value in the collibraSystemName property in all sections of the configuration file and in the Tableau <source ID> configuration file.
      Note If you set the useCollibraSystemName property to "true" in your lineage harvester configuration file, but don't define the system name in the Tableau <source ID> configuration file, the system name in the Tableau technical lineage shows DEFAULT as the system name.

    Warning Unless you have multiple databases with the same name, we highly recommend that you keep the default value.

    sources

    This section contains all Tableau connection properties.

    type

    The kind of data source. In this case, the value has to be Tableau.

    id

    The unique ID to identify the Tableau metadata that was uploaded to the Collibra Data Lineage server.

    Tip This value can be anything as long as it is unique. The lineage harvester uses the ID to identify a batch of data on the Collibra Data Lineage server.

    url

    The link to the data in Tableau.

    username

    The username you use to sign in to the Tableau server.

    Important If you want to use token-based authentication, you need to replace username with tokenName. You must specify either username or tokenName; if both exist, then tokenName is used.

    tokenName

    The lineage harvester authentication token.

    Note For token-based authentication, use this property in your lineage harvester configuration file, instead of the username property. If both properties are present, tokenName is used.
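
    As an illustration, a source entry that uses token-based authentication might begin as follows. This is a sketch rather than a complete entry: every value in angle brackets is a placeholder, and the remaining properties stay as shown in the example at the end of this topic.

    ```json
    {
      "type": "Tableau",
      "id": "<unique-ID>",
      "url": "<URL to Tableau server>",
      "tokenName": "<your-token-name>"
    }
    ```

    Because tokenName replaces username here, the entry contains no username property.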

    siteIds

    The site IDs of the Tableau sites that you want to include in the ingestion process.

    Warning Ensure that you specify the correct value. The correct value is the site identifier as it appears in the URL of the site to which you want to sign in, not the full URL. When you manually sign in to Tableau Server or Tableau Online, the site ID is the value that appears after /site/ in the browser address bar. In the following example URLs, the site ID is MarketingTeam:
    • Tableau Server: http://MyServer/#/site/MarketingTeam/projects
    • Tableau Online: https://10ay.online.tableau.com/#/site/MarketingTeam/workbooks

    On Tableau Server, however, the URL of the Default site does not specify the site. For example, the URL for a view named Profits, on a site named Sales, is http://localhost/#/site/sales/views/profits. The URL for this same view on the Default site is http://localhost/#/views/profits, which contains no site name. If the URL does not show a site ID, leave this property empty: "siteIds": [""]
    Example If you want to ingest two Tableau sites "Site 1" and "Site 2", you can enter the following information in the siteIds property: ["site ID of Site 1", "site ID of Site 2"].
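
    For the example URLs above, where the site ID is MarketingTeam, the property would presumably look like the following. The fragment shows only siteIds; the surrounding braces are there only to keep it valid JSON.

    ```json
    {
      "siteIds": ["MarketingTeam"]
    }
    ```
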
    siteNames

    The site names of the corresponding site IDs.

    Important This property is:
    • Optional for Tableau Server
    • Mandatory for Tableau Online.
    Warning If you have Tableau Server and you don't use this property, you must delete it from your configuration file. Don't leave the property in the configuration file without a value.
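
    A sketch of how the two properties might line up for Tableau Online. It assumes that each site name is listed in the same position as its site ID; that ordering is an inference from the description above, not a confirmed rule. All values are placeholders.

    ```json
    {
      "siteIds": ["<site ID of Site 1>", "<site ID of Site 2>"],
      "siteNames": ["<site name of Site 1>", "<site name of Site 2>"]
    }
    ```
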
    restOnly

    Indicates whether the lineage harvester uses only the Tableau REST API, or both the Tableau REST API and the Tableau Metadata API, to harvest Tableau metadata.

    • false (default): The lineage harvester will use the REST API and Metadata API to harvest Tableau metadata.
    • true: The lineage harvester will only use the REST API to harvest Tableau metadata.

    Warning If you only allow the lineage harvester to use the Tableau REST API, the harvester won't be able to process the necessary information for the technical lineage and the automatic stitching of Column assets to Tableau Data Attribute assets will not be possible.

    collibraSystemName

    The name of the data source's system or server.

    You must include this property in your configuration file; however, you can leave it empty, even if the useCollibraSystemName property is set to true.

    If the useCollibraSystemName property is set to true, you must prepare a Tableau <source ID> configuration file to provide the system information.
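
    For illustration, the two related properties might be set as follows when you rely on the Tableau <source ID> configuration file for the system information. The fragment is abbreviated: it omits the catalog section and all other source properties, which are still needed in a real configuration file.

    ```json
    {
      "general": {
        "useCollibraSystemName": true
      },
      "sources": [
        {
          "type": "Tableau",
          "collibraSystemName": ""
        }
      ]
    }
    ```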

    domainId

    The unique reference ID of the domain in Collibra Data Intelligence Cloud in which you want to ingest the Tableau assets.

    Tip You can ingest Tableau assets in one or more domains in Collibra. The following table identifies which properties and which configuration files to use, depending on whether you want to ingest in one or multiple domains.

    If you want to... | Then...
    Ingest in a single domain in Collibra

    Refer to the single domain reference ID in this domainId property.

    Ingest in multiple domains in Collibra

    Do both of the following:

    • Mention a domain reference ID in this domainId property, for your Tableau Server asset.
    • Refer to all relevant domain reference IDs in the domainMapping section of the Tableau <source ID> configuration file, for your Tableau site, Tableau project and all child assets.
    Important The domainId property represents the default domain. Tableau assets that are not mapped to specific domains via the domainMapping section of the Tableau <source ID> configuration file, for example Tableau Server assets, are ingested in this default domain.
    excludeImages

    Optional property for excluding the downloading of images.

    To exclude the downloading of images, set this property to true.

    concurrencyLevel

    Optional property for specifying the internal sizing, meaning the number of tasks that can be executed at the same time.

    The default value is "10", meaning as many as 10 HTTP requests can take place in parallel. Consider reducing the value if you are experiencing HTTP 401 Unauthorized errors. Setting the value to "1" effectively disables concurrency, so that HTTP requests run one at a time instead of in parallel.
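
    If you do see HTTP 401 Unauthorized errors, a reduced setting might look like the following. The value 5 is an arbitrary illustration, not a recommended setting, and the fragment shows only this one property.

    ```json
    {
      "concurrencyLevel": 5
    }
    ```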

    paging

    Optional property for customizing the Tableau API pagination settings.
    The default values are sufficient in most cases; however, you can decrease them to help mitigate node limit errors, or increase them to speed up API calls.

  3. Save the configuration file.
  4. In the console, start the lineage harvester again by running the following command:
    • Windows: .\bin\lineage-harvester.bat full-sync
    • Other operating systems: ./bin/lineage-harvester full-sync
  5. When prompted, enter the password or client secret to connect to your Collibra Data Intelligence Cloud and Tableau environment.
    The passwords are encrypted and stored in /config/pwd.conf.

Example

{
 "general": {
   "catalog": {
     "url": "https://<organization>.collibra.com",
     "username": "<your-collibra-username>"
   },
   "useCollibraSystemName": false
 },
 "sources": [
  {
   "type": "Tableau",
   "id": "unique-ID",
   "url": "URL to Tableau server",
   "username": "Admin",
   "siteIds": ["site ID of Tableau Site 1", "site ID of Tableau Site 2"],
   "siteNames": ["site name of Tableau Site 1", "site name of Tableau Site 2"],
   "restOnly": false,
   "collibraSystemName": "tableau-system-name",
   "domainId": "Domain-resource-ID",
   "excludeImages": true,
   "concurrencyLevel": 1,
   "paging": {
	  "pagination-setting": 100,
	  "pagination-setting-2": 100
	}
  }
 ]
}

What's next?

The lineage harvester triggers Collibra to import Tableau assets and their relations and create a technical lineage for Tableau Data Attribute assets.

If issues occur during the Tableau ingestion process, check the Tableau troubleshooting section to solve your problems.

To refresh the Tableau metadata, you can run the lineage harvester again or schedule jobs to run it automatically.

Tip You can check the progress of the Tableau ingestion in Activities. The results field indicates how many relations were imported into Data Catalog.