Archiving break records from Pullup jobs

This section shows you how to set up the archive break records feature for Pullup jobs.

Prerequisites

The following table shows the available external storage options and the requirements for each.

Storage option Prerequisites
Amazon S3
  • An Amazon S3 connection.
  • Read and write access on your Amazon S3 bucket.
  • Minimum required bucket permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListStorageLensConfigurations",
                    "s3:ListAccessPointsForObjectLambda",
                    "s3:GetAccessPoint",
                    "s3:PutAccountPublicAccessBlock",
                    "s3:GetAccountPublicAccessBlock",
                    "s3:ListAllMyBuckets",
                    "s3:ListAccessPoints",
                    "s3:PutAccessPointPublicAccessBlock",
                    "s3:ListJobs",
                    "s3:PutStorageLensConfiguration",
                    "s3:ListMultiRegionAccessPoints",
                    "s3:CreateJob"
                ],
                "Resource": "*"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::YOURS3BUCKETNAME",
                    "arn:aws:s3:::YOURS3BUCKETNAME/*"
                ]
            }
        ]
    }
ADLS
  • An ADLS connection.
  • Read and write access on your ADLS container.
Azure Blob
  • An Azure Blob connection.
  • Read and write access on your Azure Blob container.
Google Cloud Storage (GCS)
  • A GCS connection.
  • Editor and Viewer access on your Cloud Storage bucket.
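
The bucket-level statement in the Amazon S3 policy above contains the placeholder YOURS3BUCKETNAME in two resource ARNs. As a minimal sketch, the following Python helper fills in that placeholder for a given bucket. The function name bucket_scoped_statement and the bucket name my-dq-archive are hypothetical and used only for illustration; the helper is not part of the product and does not call AWS.

```python
import json


def bucket_scoped_statement(bucket_name: str) -> dict:
    """Build the bucket-level policy statement for one bucket.

    Hypothetical helper: it only fills the YOURS3BUCKETNAME placeholder
    from the policy above and does not contact AWS.
    """
    if not bucket_name or "/" in bucket_name or ":" in bucket_name:
        raise ValueError("expected a bare bucket name, not a path or an ARN")

    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": "s3:*",
        # The bare ARN covers bucket-level actions; the /* ARN covers objects.
        "Resource": [bucket_arn, f"{bucket_arn}/*"],
    }


def render_statement(bucket_name: str) -> str:
    """Return the statement as JSON text, indented like the policy above."""
    return json.dumps(bucket_scoped_statement(bucket_name), indent=4)
```

Calling render_statement("my-dq-archive") returns JSON text that can be pasted over the second statement of the policy. Both resource entries are needed, because listing a bucket and reading or writing its objects are authorized against different ARNs.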

Steps

  1. From Explorer, connect to a Pullup data source.
  2. Optionally, assign a Link ID to a column in the Select Columns step.
     Important: If you specify a Link ID, the column you assign must not contain NULL values and its values must be unique. The primary key is the most common choice. Composite primary keys are also supported.
  3. In the lower-left corner, click Settings (the cogwheel icon).
     The Settings dialog box appears.
  4. Under the Data Quality Job section, select the Archive Breaking Records checkbox, then click the drop-down list.
     A list of available external storage options appears.
  5. Select the external storage option to which break records are sent.
  6. Click Save.
  7. Set up and run your DQ Job.
     When a record breaks, its metadata is exported automatically to your external storage service.
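
Because a Link ID column must be free of NULL values and unique, it can help to check a sample of the data before you assign it. The following is a minimal sketch in plain Python, assuming the rows are available as dictionaries and that NULL is represented as None. The function name check_link_id and the column names order_id and line_no are hypothetical; this check is not part of the product.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping, Sequence, Tuple


def check_link_id(
    rows: Iterable[Mapping[str, Any]],
    link_columns: Sequence[str],
) -> Tuple[int, List[tuple]]:
    """Check candidate Link ID columns for NULLs and duplicates.

    Returns (null_count, duplicate_keys). Passing several column
    names checks them together as a composite key.
    """
    null_count = 0
    seen: Counter = Counter()

    for row in rows:
        key = tuple(row.get(col) for col in link_columns)
        if any(value is None for value in key):
            # A missing or None value in any key column counts as NULL.
            null_count += 1
            continue
        seen[key] += 1

    # Keys that occur more than once, in first-seen order.
    duplicate_keys = [key for key, count in seen.items() if count > 1]
    return null_count, duplicate_keys


sample_rows = [
    {"order_id": 1, "line_no": 1},
    {"order_id": 1, "line_no": 2},
    {"order_id": 1, "line_no": 2},     # duplicate composite key
    {"order_id": None, "line_no": 3},  # NULL in a key column
]
null_count, duplicate_keys = check_link_id(sample_rows, ["order_id", "line_no"])
```

For the sample rows, null_count is 1 and duplicate_keys is [(1, 2)], so this column pair would not be a suitable Link ID until those rows are fixed. A column set is suitable only when the function returns zero NULLs and an empty duplicate list.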