Set up Insights on AWS

This section describes how to set up Insights Data Access on Amazon Web Services (AWS), using an S3 bucket for storage and AWS Athena as the query service.

Tip For information on how to set up Insights Data Access on the Google Cloud Platform, go to Set up Insights on GCP.

Prerequisites

You have the following:

  • Collibra Data Intelligence Platform 5.7 or newer.
  • License for Collibra Insights.
  • Software for working with Parquet files.

Steps

  1. Download a data snapshot from your Collibra environment.
  2. Upload the data to an S3 bucket.
  3. Download the Insights Data Access package from Collibra Marketplace.
  4. Create the Insights Data Access model in AWS Athena.

Step 1: Download a data snapshot from your Collibra environment

  1. Enter the following URL in your browser:
    <your-Collibra-environment-URL>/rest/2.0/reporting/insights/directDownload?snapshotDate=<snapshot_date>&format=zip
    Tip <snapshot_date> is the date of the data you want, formatted as YYYY-MM-DD, for example, 2023-09-29. Ensure that the date you enter is within the last 31 days or is the last day of a month.
    A ZIP file of the data from your Collibra environment, for the specified date, is downloaded to your hard disk.
  2. Extract the ZIP file on your local computer.
    A folder with the name of the ZIP file is created.
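
Tip If you prefer to build the download URL in a script, the following Python sketch shows one possible approach. It assumes that "within the last 31 days" means 0 to 31 days before today, inclusive. The function names are illustrative and are not part of any Collibra API.

```python
import calendar
from datetime import date, timedelta
from urllib.parse import urlencode


def is_valid_snapshot_date(snapshot: date, today: date) -> bool:
    """Return True if the date is in the last 31 days or is a past month end.

    Assumption: the 31-day window is inclusive on both ends.
    """
    within_window = timedelta(0) <= (today - snapshot) <= timedelta(days=31)
    last_day = calendar.monthrange(snapshot.year, snapshot.month)[1]
    is_month_end = snapshot.day == last_day and snapshot <= today
    return within_window or is_month_end


def build_snapshot_url(base_url: str, snapshot: date) -> str:
    """Build the direct-download URL from step 1 for the given date."""
    query = urlencode({"snapshotDate": snapshot.isoformat(), "format": "zip"})
    path = "/rest/2.0/reporting/insights/directDownload"
    return f"{base_url.rstrip('/')}{path}?{query}"
```

Checking the date with is_valid_snapshot_date before you open the URL avoids requesting a snapshot that falls outside the allowed range.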

Step 2: Upload the data to an S3 bucket

Note This needs to be done only once for the collection of Tableau workbook files. After that, you need to perform this step again only if the data layer model changes.

  1. Sign in to your AWS account.
  2. On the main menu, expand the Services page, and then select S3.
  3. On the Buckets tab, click Create bucket.

    The Create bucket dialog box appears.
  4. In the Bucket name field, enter a name for the bucket you are creating, for example, collibra-insights.
  5. Click Next.
  6. Click Next to bypass the configuration options.
  7. Clear the Block all public access checkbox to give Tableau access to the bucket.
  8. Click Next.
  9. Click Create bucket.
    The bucket is created.
  10. On the Buckets tab, search for your newly created bucket, and then click it.

    The bucket details page opens.
  11. Click Upload to upload the data you downloaded from your Collibra environment.

    The Upload dialog box appears.
  12. Click Add files, or drag all of the folders that you extracted from the ZIP file you downloaded from your Collibra environment into the dialog box.

    The folders appear in the Upload dialog box.
  13. Click Upload.
    The folders are added to the newly created bucket.
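
Tip The upload in steps 11 through 13 can also be scripted. The following Python sketch is one possible approach, not a Collibra-provided tool. It assumes that you pass in an S3 client such as boto3.client("s3") and that each file keeps its folder path as the object key. The function names are hypothetical.

```python
from pathlib import Path


def plan_uploads(root: Path) -> list[tuple[Path, str]]:
    """Pair every file under root with an S3 key that keeps its folder path."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [(p, p.relative_to(root).as_posix()) for p in files]


def upload_all(s3_client, bucket: str, root: Path) -> int:
    """Upload every planned file and return the number of files uploaded.

    s3_client is assumed to expose upload_file(Filename, Bucket, Key),
    as the boto3 S3 client does.
    """
    uploads = plan_uploads(root)
    for path, key in uploads:
        s3_client.upload_file(str(path), bucket, key)
    return len(uploads)
```

For example, upload_all(boto3.client("s3"), "collibra-insights", Path("<extracted-snapshot-folder>")) would upload the extracted folders to the bucket created in step 4, provided that boto3 is installed and your AWS credentials are configured.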

Step 3: Download the Insights Data Access package from Collibra Marketplace

  1. Go to Collibra Marketplace.
  2. Download the Insights Data Access package.
    A ZIP file is downloaded to your hard disk.
  3. Extract the ZIP file on your local computer.
    A folder with the name of the ZIP file is created.

Step 4: Create the Insights Data Access model in AWS Athena

  1. On the AWS main menu, expand the Services page, and then select Athena.
  2. On the New query tab, enter the following statement:
    CREATE DATABASE <name-of-the-database>;
    In this example, the database is named collibra_rpt.
  3. Click Run query.
  4. On the Database drop-down menu, select the database you created.
  5. Click + to add another query.

    From the Insights Data Access folder you extracted from the Marketplace ZIP file, drag the first SQL file into the new query tab.
    The code appears in the query tab.
  6. Change the location to the bucket you created in Step 2.
    In this example, {{customer_data_location}} is replaced with collibra-insights.
  7. Click Run query.
  8. Repeat steps 5 through 7 for each of the SQL files in the Insights Data Access ZIP file.
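
Tip Editing the location by hand in every SQL file is repetitive. The following Python sketch shows one way to replace the {{customer_data_location}} placeholder in all of the files at once. It assumes that the SQL files have the .sql extension and sit directly in the extracted folder. The function names are hypothetical.

```python
from pathlib import Path

# Placeholder text as it appears in the SQL files (see step 6).
PLACEHOLDER = "{{customer_data_location}}"


def render_sql(sql_text: str, bucket: str) -> str:
    """Replace every occurrence of the placeholder with the bucket name."""
    return sql_text.replace(PLACEHOLDER, bucket)


def render_all(sql_dir: Path, bucket: str) -> dict[str, str]:
    """Return {file name: rendered SQL} for each .sql file, sorted by name."""
    return {
        path.name: render_sql(path.read_text(encoding="utf-8"), bucket)
        for path in sorted(sql_dir.glob("*.sql"))
    }
```

The rendered text can then be pasted into a query tab as in step 5. Sorting by file name is an assumption; if the package prescribes a different order, follow that order instead.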

When all the steps are completed, all table definitions are shown and Insights Data Access is fully configured.
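
Tip As an alternative to the Run query button, the queries can be submitted programmatically. The following Python sketch is a minimal example under stated assumptions: you pass in an Athena client such as boto3.client("athena"), and output_location is an S3 path of your choice where Athena writes query results. The function name and the parameters poll_seconds and output_location are hypothetical.

```python
import time


def run_query(athena_client, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> str:
    """Start one query, wait until it finishes, and return its final state."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        response = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        # Athena reports one of these three states when a query has ended.
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
```

Calling run_query once for each SQL file, with database set to the database you created in step 2, corresponds to steps 5 through 8. Treat any result other than SUCCEEDED as a reason to check the query in the Athena console.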