Archiving break records from Pullup jobs

This section shows you how to set up the archive break records feature for Pullup jobs.

Prerequisites

The available external storage options are listed below, each followed by its prerequisites.

Amazon S3
  • An Amazon S3 connection.
  • Read and write access on your Amazon S3 bucket.
  • Minimum required bucket permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListStorageLensConfigurations",
                    "s3:ListAccessPointsForObjectLambda",
                    "s3:GetAccessPoint",
                    "s3:PutAccountPublicAccessBlock",
                    "s3:GetAccountPublicAccessBlock",
                    "s3:ListAllMyBuckets",
                    "s3:ListAccessPoints",
                    "s3:PutAccessPointPublicAccessBlock",
                    "s3:ListJobs",
                    "s3:PutStorageLensConfiguration",
                    "s3:ListMultiRegionAccessPoints",
                    "s3:CreateJob"
                ],
                "Resource": "*"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::YOURS3BUCKETNAME",
                    "arn:aws:s3:::YOURS3BUCKETNAME/*"
                ]
            }
        ]
    }
ADLS
  • An ADLS connection.
  • Read and write access on your ADLS container.
Azure Blob
  • An Azure Blob connection.
  • Read and write access on your Azure Blob container.
Google Cloud Storage (GCS)
  • A GCS connection.
  • Editor and Viewer access on your Cloud Storage bucket.
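
In the Amazon S3 policy above, the bucket-scoped statement lists two resources: the bucket ARN itself, which covers bucket-level actions, and the same ARN followed by /*, which covers the objects in the bucket. The following is a minimal Python sketch of filling in the YOURS3BUCKETNAME placeholder for that statement. The function name bucket_statement and the bucket name example-break-records are hypothetical and used only for illustration; this is not part of the product.

```python
import json


def bucket_statement(bucket_name: str) -> dict:
    """Build the bucket-scoped statement of the policy for one bucket.

    Hypothetical helper for illustration; it mirrors the second
    statement of the policy shown in the Amazon S3 prerequisites.
    """
    # Reject values that are clearly not a plain bucket name.
    if not bucket_name or "/" in bucket_name or "*" in bucket_name:
        raise ValueError(f"not a plain bucket name: {bucket_name!r}")

    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": "s3:*",
        # First entry targets the bucket, second targets its objects.
        "Resource": [bucket_arn, f"{bucket_arn}/*"],
    }


if __name__ == "__main__":
    # "example-break-records" is a made-up bucket name.
    print(json.dumps(bucket_statement("example-break-records"), indent=4))
```

Both resource entries are needed: with only the bucket ARN, object reads and writes are not covered, and with only the /* entry, bucket-level actions such as listing are not covered.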

Steps

  1. From Explorer, connect to a Pullup data source.
  2. In the Select Columns step, assign a Link ID to a column.
  3. In the lower left corner, click Settings (the cogwheel icon).
     The Settings dialog box appears.
  4. Under the Data Quality Job section, select the Archive Breaking Records checkbox, then click the drop-down list.
     A list of available external storage options appears.
  5. Select the external storage option to which break records will be sent.
  6. Click Save.
  7. Set up and run your DQ Job.
     When a record breaks, its metadata is exported automatically to your external storage service.

Important: Make sure that a column is assigned as the Link ID (step 2). External break record archival does not work properly without one.