Using the Export and Import API

Use the Export and Import API to promote and move datasets across environments. The export/import process supports the following tables:

  • rule_repo
  • owl_rule
  • owl_check_repo
  • job_schedule
  • alert_cond
  • alert_output
  • dataset_scoring
  • dataset_schema

The export/import process consists of the following steps:

  • Add the required stored procedures (prerequisite)
  • Export content from source
  • Import content into target

The following sections describe these steps in more detail.

Add the required stored procedures (prerequisite)

Add the following stored procedure to the Collibra DQ Metastore in both the source and target environments. Replace <schema> with the tenant name. Note that this procedure is supported in CRDB version 23.2.16 and later.

CREATE OR REPLACE FUNCTION <schema>.dump(p_schema text, p_table text, p_where text)
 RETURNS SETOF text
 LANGUAGE plpgsql
AS $function$
 DECLARE
     dumpquery_0 text;
     dumpquery_1 text;
     selquery text;
     selvalue text;
     valrec record;
     colrec record;
 BEGIN

     -- ------ --
     -- GLOBAL --
     --   build base INSERT
     --   build SELECT array[ ... ]
     dumpquery_0 := 'INSERT INTO ' ||  quote_ident(p_schema) || '.' || quote_ident(p_table) || '(';
     selquery    := 'SELECT array[';

     <<label0>>
     FOR colrec IN SELECT table_schema, table_name, column_name, data_type
                   FROM information_schema.columns
                   WHERE table_name = p_table and table_schema = p_schema
                   ORDER BY ordinal_position
     LOOP
         dumpquery_0 := dumpquery_0 || quote_ident(colrec.column_name) || ',';
         selquery    := selquery    || 'CAST(' || quote_ident(colrec.column_name) || ' AS TEXT),';
     END LOOP label0;

     dumpquery_0 := substring(dumpquery_0 ,1,length(dumpquery_0)-1) || ')';
     dumpquery_0 := dumpquery_0 || ' VALUES (';
     selquery    := substring(selquery    ,1,length(selquery)-1)    || '] AS MYARRAY';
     selquery    := selquery    || ' FROM ' ||quote_ident(p_schema)||'.'||quote_ident(p_table);
     selquery    := selquery    || ' WHERE '||p_where;
     -- GLOBAL --
     -- ------ --

     -- ----------- --
     -- SELECT LOOP --
     --   execute SELECT built and loop on each row
     <<label1>>
     FOR valrec IN  EXECUTE  selquery
     LOOP
         dumpquery_1 := '';
         IF not found THEN
             EXIT ;
         END IF;

         -- ----------- --
         -- LOOP ARRAY (EACH FIELD) --
         <<label2>>
         FOREACH selvalue in ARRAY valrec.MYARRAY
         LOOP
             IF selvalue IS NULL
             THEN selvalue := 'NULL';
             ELSE selvalue := quote_literal(selvalue);
             END IF;
             dumpquery_1 := dumpquery_1 || selvalue || ',';
         END LOOP label2;
         dumpquery_1 := substring(dumpquery_1 ,1,length(dumpquery_1)-1) || ');';
         -- LOOP ARRAY (EACH FIELD) --
         -- ----------- --

         -- debug: RETURN NEXT dumpquery_0 || dumpquery_1 || ' --' || selquery;
         -- debug: RETURN NEXT selquery;
         RETURN NEXT dumpquery_0 || dumpquery_1;

     END LOOP label1 ;
     -- SELECT LOOP --
     -- ----------- --

 RETURN ;
 END
 $function$
;

Then assign ownership of the function, replacing <ownername> with the appropriate database user:

ALTER FUNCTION dump(text, text, text) OWNER TO <ownername>;
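Once the function exists, you can verify it from any SQL client by selecting from it. As a convenience, the invocation can also be assembled programmatically. The Python sketch below is illustrative only (the schema, table, and filter values are hypothetical); it simply builds the SELECT statement that calls dump():

```python
def dump_statement(schema, table, where):
    """Build the SELECT statement that invokes the dump() function.

    `where` is the raw predicate (e.g. "dataset = 'sales'"); single quotes
    are doubled so the predicate survives being embedded as a SQL string
    literal. This is plain string assembly for illustration only; it does
    not guard against SQL injection.
    """
    where_literal = where.replace("'", "''")
    return f"SELECT * FROM {schema}.dump('{schema}', '{table}', '{where_literal}')"

# Example: dump rows for one (hypothetical) dataset from the owl_rule table
stmt = dump_statement("public", "owl_rule", "dataset = 'sales'")
```

Running the resulting statement against the Metastore returns one INSERT statement per matching row, which is the content the export endpoint collects.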

Export content from source

Export content from the source schema. Use the following endpoint:

GET /v2/db-export

This endpoint is available in Swagger under the UI Internal definition.

Parameters

Parameter   Type             Description
datasets*   List of strings  List of datasets to export. Provide at least one valid dataset name.
schema      String           Name of the schema/tenant where you want to perform the export. Default value: public.
tables      String           List of tables to export for the given schema and dataset(s). If left empty, the following tables are all exported: rule_repo, owl_catalog, owl_rule, alert_cond, owl_check_repo, job_schedule, alert_output.
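As a sketch of how a client might assemble the request, the helper below builds the query string for the export call. It assumes each dataset is sent as a repeated datasets query parameter and that tables is comma-separated; both encodings are assumptions, so verify them against the Swagger definition for your instance (the base URL here is a placeholder):

```python
from urllib.parse import urlencode

def export_url(base_url, datasets, schema="public", tables=None):
    """Build a GET /v2/db-export URL.

    Assumes list values for `datasets` are encoded as repeated query
    parameters and `tables` as a single comma-separated value; check the
    Swagger (UI Internal) definition before relying on this encoding.
    """
    params = [("datasets", d) for d in datasets]
    params.append(("schema", schema))
    if tables:
        params.append(("tables", ",".join(tables)))
    return f"{base_url}/v2/db-export?{urlencode(params)}"

# Hypothetical instance URL, datasets, and tenant schema
url = export_url("https://dq.example.com", ["sales", "orders"], schema="tenant1")
```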

Response

The endpoint returns 200 OK if the export was successful. The response is a JSON-formatted list of SQL INSERT statements. Copy this response; it is required for the import step.

Import content into target

Import content into the target environment. Use the following endpoint:

POST /v2/db-import

This endpoint is available in Swagger under the UI Internal definition.

Request body

In the request body, paste the JSON response from the export endpoint (/v2/db-export). Check the JSON parameters for differences in connections, agents, Spark, and environment configurations: migrating to a different environment typically requires modifying the payload. If you import into the same environment, edit the payload to avoid key constraint conflicts.

Note The import runs in non-transactional mode. If an error occurs partway through, any items that were already saved remain in the database as is.
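When reusing an export inside the same environment, the statements must be edited so keys do not collide. The sketch below shows one naive way to do this: renaming the dataset in every exported INSERT statement by replacing the quoted literal. The function name is hypothetical, and the approach assumes the dataset name appears only as a quoted string value, so always review the rewritten statements before importing:

```python
def rename_dataset(statements, old_name, new_name):
    """Replace the quoted dataset literal in each exported INSERT statement.

    Naive string replacement: assumes `old_name` appears only as the quoted
    dataset value in each statement. Inspect the output before importing.
    """
    old_lit, new_lit = f"'{old_name}'", f"'{new_name}'"
    return [s.replace(old_lit, new_lit) for s in statements]

# Hypothetical exported statement, renamed to avoid a key conflict
exported = ["INSERT INTO public.owl_rule(dataset,rule_nm) VALUES ('sales','null_check');"]
payload = rename_dataset(exported, "sales", "sales_copy")
```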

Response

The endpoint returns 200 OK if the import was successful. You can then open the user interface and verify that the rules were successfully transferred.