Set up Insights on GCP

This section describes how to set up Insights Data Access on the Google Cloud Platform (GCP) with Google Cloud Storage and Google BigQuery.

Tip For information on how to set up Insights Data Access on Amazon Web Services, go to Set up Insights on AWS.

Prerequisites

You have the following:

  • Collibra Data Intelligence Platform 5.7 or newer.
  • License for Collibra Insights.
  • Software for working with Parquet files.

Steps

  1. Download a data snapshot from your Collibra environment.
  2. Upload the data to a Google Cloud Storage bucket.
  3. Create the Insights Data Access model in Google BigQuery.

Step 1: Download a data snapshot from your Collibra environment

  1. Enter the following URL in your browser:
    <your-Collibra-environment-URL>/rest/2.0/reporting/insights/directDownload?snapshotDate=<snapshot_date>&format=zip
    Tip <snapshot_date> is the date for which you want the data, formatted as YYYY-MM-DD, for example, 2023-09-29. Ensure that the date you enter is within the last 31 days or is the last day of a month.
    A ZIP file of the data from your Collibra environment, for the specified date, is downloaded to your hard disk.
  2. Extract the ZIP file on your local computer.
    A folder with the name of the ZIP file is created.
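
The download in this step can also be scripted. The following is a minimal sketch, not an official Collibra tool: the function name snapshot_url is hypothetical, and the commented curl call assumes basic authentication, which may not match how your environment is secured.

```shell
# Build the direct-download URL described in Step 1.
# snapshot_url is a hypothetical helper name used only for this sketch.
snapshot_url() {
  base_url="$1"
  snapshot_date="$2"
  # Reject anything that is not formatted as YYYY-MM-DD.
  case "$snapshot_date" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *)
      echo "snapshot date must be formatted as YYYY-MM-DD" >&2
      return 1
      ;;
  esac
  # Strip a trailing slash from the base URL before appending the path.
  printf '%s/rest/2.0/reporting/insights/directDownload?snapshotDate=%s&format=zip\n' \
    "${base_url%/}" "$snapshot_date"
}

# Example usage (not run here). Basic authentication is an assumption:
#   url="$(snapshot_url "https://<your-Collibra-environment-URL>" 2023-09-29)"
#   curl --fail --user "<username>" --output snapshot.zip "$url"
#   unzip snapshot.zip -d snapshot
```

The helper checks only the format of the date. It does not verify that the date is within the last 31 days or is the last day of a month.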

Step 2: Upload the data to a Google Cloud Storage bucket

Note This needs to be done only once for the collection of Tableau workbook files. After that, you need to perform this step only if the data layer model changes.

  1. Sign in to your GCP account and choose your working project for Insights deployment.
    Tip We recommend that you create a separate project for Insights deployment.
  2. On the tab menu, click the Storage tab, and then click Cloud Storage.
  3. On the Browser tab, click Create bucket.

    The Create a bucket dialog box appears.
  4. In the Name your bucket field, enter a name for the bucket you are creating, for example, collibra-insights.
  5. Click Continue.
  6. In the Choose where to store your data section, enter the relevant values, for example:
    • Location type: Multi-region
    • Location: Your geographic location
    Tip Contact your IT department for help with the correct values for your Collibra environment configuration and to ensure compliance with your company policies.
  7. Click Continue.
  8. In the Choose a default storage class for your data section, click Standard.
  9. Click Continue.
  10. In the Choose how to control access to objects section, enter the relevant values, for example:
    • Access control: Uniform
    Tip Contact your IT department for help with the correct values for your Collibra environment configuration and to ensure compliance with your company policies.
  11. Click Continue.
  12. In the Choose how to protect object data section, enter the relevant values, for example:
    • Protection tool: None
    Tip Contact your IT department for help with the correct values for your Collibra environment configuration and to ensure compliance with your company policies.
  13. Click Create.
    The bucket is created.
  14. On the Browser tab, search for your newly created bucket, and then click it.

    The bucket details page opens.
  15. Click Upload Folder to upload the data you downloaded from your Collibra environment.

    The Upload dialog box appears.
  16. In the Upload dialog box, find the unpacked folders of the ZIP file you downloaded from your Collibra environment. There are eight folders to upload: asset, asset_tag, attribute, community, complex_relation, domain, relation, and responsibility.

  17. Select a folder, for example, complex_relation, and then click Upload.

    Note You can select only one folder at a time.

  18. Repeat Steps 15 through 17 until you have uploaded all eight folders.
    The folders are added to the newly created bucket.
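
The console steps above could also be approximated from the command line with the gcloud CLI. This is a sketch under stated assumptions: DRY_RUN, run, and create_bucket_and_upload are hypothetical names, and the flags shown mirror the example values from the procedure (Standard storage class, uniform access control). Confirm the correct values with your IT department.

```shell
# Print (dry run) or execute gcloud commands that mirror Step 2.
# DRY_RUN defaults to 1, so nothing is created unless you set DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"   # show the command instead of running it
  else
    "$@"
  fi
}

create_bucket_and_upload() {
  bucket="$1"
  location="$2"
  snapshot_dir="$3"
  # Create the bucket with the Standard class and uniform access control.
  run gcloud storage buckets create "gs://$bucket" \
    --location="$location" \
    --default-storage-class=STANDARD \
    --uniform-bucket-level-access
  # Upload each of the eight data folders from the extracted snapshot.
  for folder in asset asset_tag attribute community complex_relation domain relation responsibility; do
    run gcloud storage cp --recursive "$snapshot_dir/$folder" "gs://$bucket/"
  done
}
```

For example, create_bucket_and_upload collibra-insights US ./snapshot prints nine commands for review. Setting DRY_RUN=0 before calling the function runs them instead.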

Step 3: Create the Insights Data Access model in Google BigQuery

Tip The objective of Steps 6 through 8 in the following procedure can also be achieved by using a Cloud shell command.

  1. On the left tab menu, in the BIG DATA section, click BigQuery.
  2. On the Explorer page, find your Insights project, and then click > Create dataset.
  3. In the Create dataset side panel, enter the relevant information.

    • Dataset ID: A unique name for your dataset.
    • Data location: The geographical region of your data.
    Tip Contact your IT department for help with the correct value for your Collibra environment configuration and to ensure compliance with your company policies.
  4. Click Create dataset.
  5. In the Explorer page, find your newly created dataset, and then click > Open.
    The dataset view page opens.
  6. In the dataset view page, click Create table.

    The Create table side panel opens.
  7. In the Create table section, enter the relevant information.

    • Create table from: Select Google Cloud Storage.
    • Select file from GCS bucket: Enter <your-data-bucket-name>/<data type>/*.parquet
      The bucket name is the one you entered in Step 2.4, and the data type, for example, asset, is the subdirectory location.
      Tip Step 9 of this procedure prompts you to repeat Steps 6 through 8 for each data type, for example, asset, attribute, relation, responsibility, and so on.
    • File format: Select Parquet.
    • Source Data Partitioning: This checkbox must be cleared.
    • Search for a project / Enter a project name: Select the Search for a project option.
    • Project name: Select the project you are using for Insights deployment.
    • Dataset name: Select the dataset name you entered in Step 3.3.
    • Table type: Select Native table.
    • Table name: Enter the data type. This must match the data type entered for the subdirectory location in the Select file from GCS bucket field.
      Tip Step 9 of this procedure prompts you to repeat Steps 6 through 8 for each data type, for example, asset, attribute, relation, responsibility, and so on.

  8. Click Create table.
  9. Repeat Steps 6 through 8 for each data type in the file you downloaded in Step 1.1, for example, asset, relation, responsibility, and so on.

When all the steps are completed, all table definitions are shown and Insights Data Access is fully configured.
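
The final check, that all table definitions are present, could also be scripted. The sketch below assumes that bq show exits with a non-zero status for a table that does not exist. TABLE_EXISTS_CMD and missing_tables are hypothetical names introduced for illustration.

```shell
# Report which of the eight expected tables are missing from a dataset.
# TABLE_EXISTS_CMD is a hypothetical hook so the check can be swapped out.
# By default it runs `bq show`, assumed to fail when the table is not found.
TABLE_EXISTS_CMD="${TABLE_EXISTS_CMD:-bq show}"

missing_tables() {
  dataset="$1"
  for table in asset asset_tag attribute community complex_relation domain relation responsibility; do
    # Print the table name only when the existence check fails.
    if ! $TABLE_EXISTS_CMD "$dataset.$table" >/dev/null 2>&1; then
      echo "$table"
    fi
  done
}
```

Calling missing_tables with your dataset name prints nothing when all eight tables exist.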

Use a Cloud shell command

The objective of Steps 6 through 8 in the previous procedure can also be achieved by using a Cloud shell command.

Run the following commands, replacing <customer-dataset-name> and <customer-data-bucket> with the relevant values.

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.asset \
	gs://<customer-data-bucket>/asset/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.asset_tag \
	gs://<customer-data-bucket>/asset_tag/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.attribute \
	gs://<customer-data-bucket>/attribute/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.community \
	gs://<customer-data-bucket>/community/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.complex_relation \
	gs://<customer-data-bucket>/complex_relation/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.domain \
	gs://<customer-data-bucket>/domain/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.relation \
	gs://<customer-data-bucket>/relation/*.parquet

bq load \
	--noreplace \
	--source_format=PARQUET \
	<customer-dataset-name>.responsibility \
	gs://<customer-data-bucket>/responsibility/*.parquet
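
Because the eight commands differ only in the data type, they could be generated with a loop. The following sketch prints the commands rather than running them, so they can be reviewed first. The function name print_load_commands is hypothetical.

```shell
# Print one `bq load` command per data type, equivalent to the commands above.
# The source URI is single-quoted in the output so the local shell does not
# try to expand the wildcard when the printed command is run.
print_load_commands() {
  dataset="$1"
  bucket="$2"
  for table in asset asset_tag attribute community complex_relation domain relation responsibility; do
    echo "bq load --noreplace --source_format=PARQUET $dataset.$table 'gs://$bucket/$table/*.parquet'"
  done
}
```

For example, print_load_commands <customer-dataset-name> <customer-data-bucket>, with the placeholders replaced by your values, prints eight commands. After reviewing them, you can pipe the output to sh to run them.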