Export and Import API
Promoting and moving datasets across environments.
Prerequisites
Warning The database needs the stored procedure (function) defined below in order to use the Export/Import API.
V2 - Stable - available from 2022.02 release
Step 1a - Export content from source schema
https://<collibra-dq-url>/v2/db-export
Exports tables from the Collibra DQ Metastore database.
Parameters

Query

| Parameter | Type | Description |
| --- | --- | --- |
| datasets * | List of strings | List of datasets to export. You need to give at least one valid dataset name in the list. |
| schema | String | Name of the schema/tenant where you want to perform the export. Default value: public |
| tables | String | List of tables to export on the given schema & dataset(s). If you leave it empty, the following tables will be exported altogether: rule_repo, owl_catalog, owl_rule, alert_cond, owl_check_repo, job_schedule, alert_output |
Responses

| Status | Description |
| --- | --- |
| 200: OK | List of SQL INSERT statements as a JSON list. |
| 400: Bad Request | Any error that happened, with an error message. |
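Assuming a `/v2/db-export` endpoint that mirrors the `/v2/db-import` URL used in Step 1b (the host, parameter names, and dataset/table values here are placeholders), assembling the export call can be sketched in Python:

```python
from urllib.parse import urlencode

# Placeholder host; replace with your Collibra DQ URL.
BASE_URL = "https://<collibra-dq-url>/v2/db-export"

def build_export_url(datasets, schema="public", tables=None):
    """Assemble the query string for a db-export call.

    datasets: list of dataset names (at least one required).
    schema:   schema/tenant to export from (defaults to public).
    tables:   optional list of tables; the API exports its default
              table set when this is omitted.
    """
    params = {"dataset": ",".join(datasets), "schema": schema}
    if tables:
        params["tables"] = ",".join(tables)
    # Keep commas and the placeholder brackets readable in the example.
    return BASE_URL + "?" + urlencode(params, safe=",<>")

url = build_export_url(
    ["public.dataset_scan_2", "public.dataset_scan_1"],
    schema="public",
    tables=["owl_rule", "job_schedule", "owl_check_repo"],
)
print(url)
```

The response body (a JSON list of INSERT statements) is what you later pass, possibly after edits, to the import endpoint.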
Step 1b - Import content
https://<collibra-dq-url>/v2/db-import
Import content into the target tenant.
The target schema/tenant name is part of the input SQL INSERT statements.
The import runs in non-transactional mode: if an error occurs partway through, any items already saved remain in the database as-is.
Parameters

Body

| Parameter | Type | Description |
| --- | --- | --- |
| * | JSON string list | List of SQL INSERT statements to import into the target Collibra DQ Metastore. |
Responses

| Status | Description |
| --- | --- |
| 200: OK | The import was successful. |
| 400: Bad Request | Any error that happened, with an error message. |
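Because the import is non-transactional, it is worth validating the payload shape before sending it. A minimal sketch of preparing the body (the endpoint host and statement below are placeholders):

```python
import json

# Placeholder host; the import body is a JSON list of INSERT statements,
# typically the verbatim (or edited) output of the export call.
IMPORT_URL = "https://<collibra-dq-url>/v2/db-import"

insert_statements = [
    "INSERT INTO public.owl_rule(rule_nm,dataset) "
    "VALUES ('null_check','dataset_scan_1');",
]

# Every element must be a string, since the API expects a JSON string list.
assert all(isinstance(s, str) for s in insert_statements)

body = json.dumps(insert_statements)
print(body)
```

POST `body` as the request payload (for example with `urllib.request` or any HTTP client), remembering that rows saved before a failure are not rolled back.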
We suggest using db-export, but we will not remove get-exports; we expect to consolidate the newer logic behind that method.
Step 1c - Get-Exports
You can pass in several dataset names and several tables at once. This endpoint creates a JSON payload of INSERT statements.
Note Exports and imports are currently limited to the 3 tables listed below.
These are the three most common tables, and they are the supported tables for re-promotion (running the export multiple times). The most common use case is to copy jobs and rules from environment A to environment B. Running the export/import sequence on the same environment will likely result in a key constraint conflict, unless the insert payload is edited in between.
- owl_rule
- job_schedule
- owl_check_repo
http://<url>/v2/get-exports?dataset=public.dataset_scan_2,public.dataset_scan_1&schema=public&tables=owl_rule,job_schedule,owl_check_repo
Use Swagger to build this for you
This is located under controller-scala (internal API)
Click Try it out to input the details
Step 2 - Run-Import
Note You will want to perform a find/replace on the import payload to account for differences in connections, agents, Spark, and environment configurations. Migrating to a different environment typically requires the payload to be modified.
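The find/replace step can be sketched as below; the connection and agent names in the mapping are illustrative assumptions, not values the API defines:

```python
# Map of environment-specific values to rewrite before importing.
# These names are examples only; use the ones from your own payload.
replacements = {
    "POSTGRES_DEV": "POSTGRES_PROD",  # connection name
    "agent-dev": "agent-prod",        # agent name
}

def rewrite_payload(statements, mapping):
    """Apply simple find/replace pairs to every INSERT statement."""
    out = []
    for stmt in statements:
        for old, new in mapping.items():
            stmt = stmt.replace(old, new)
        out.append(stmt)
    return out

payload = [
    "INSERT INTO public.job_schedule(conn_nm) VALUES ('POSTGRES_DEV');",
]
rewritten = rewrite_payload(payload, replacements)
print(rewritten)
```

Plain string replacement is deliberate here: the payload is a list of SQL text fragments, so a diff of `payload` against `rewritten` is an easy final check before posting.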
Run import on the desired environment, passing the output of the previous statement to the body of the request.
http://<url>/v2/run-import
Use Swagger to try it out
This is under controller-catalog.
This would be the body of the POST.
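An illustrative shape for that body, a JSON list of INSERT statement strings (column lists elided here):

```json
[
  "INSERT INTO public.owl_rule(...) VALUES (...);",
  "INSERT INTO public.job_schedule(...) VALUES (...);"
]
```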
Requirement - Stored Procedure
The following function needs to be created in the Collibra DQ Metastore before the Export/Import API can run.
CREATE OR REPLACE FUNCTION public.dump(p_schema text, p_table text, p_where text)
RETURNS SETOF text
LANGUAGE plpgsql
AS $function$
DECLARE
dumpquery_0 text;
dumpquery_1 text;
selquery text;
selvalue text;
valrec record;
colrec record;
BEGIN
-- ------ --
-- GLOBAL --
-- build base INSERT
-- build SELECT array[ ... ]
dumpquery_0 := 'INSERT INTO ' || quote_ident(p_schema) || '.' || quote_ident(p_table) || '(';
selquery := 'SELECT array[';
<<label0>>
FOR colrec IN SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name = p_table and table_schema = p_schema
ORDER BY ordinal_position
LOOP
dumpquery_0 := dumpquery_0 || quote_ident(colrec.column_name) || ',';
selquery := selquery || 'CAST(' || quote_ident(colrec.column_name) || ' AS TEXT),';
END LOOP label0;
dumpquery_0 := substring(dumpquery_0 ,1,length(dumpquery_0)-1) || ')';
dumpquery_0 := dumpquery_0 || ' VALUES (';
selquery := substring(selquery ,1,length(selquery)-1) || '] AS MYARRAY';
selquery := selquery || ' FROM ' ||quote_ident(p_schema)||'.'||quote_ident(p_table);
selquery := selquery || ' WHERE '||p_where;
-- GLOBAL --
-- ------ --
-- ----------- --
-- SELECT LOOP --
-- execute SELECT built and loop on each row
<<label1>>
FOR valrec IN EXECUTE selquery
LOOP
dumpquery_1 := '';
IF not found THEN
EXIT ;
END IF;
-- ----------- --
-- LOOP ARRAY (EACH FIELD) --
<<label2>>
FOREACH selvalue in ARRAY valrec.MYARRAY
LOOP
IF selvalue IS NULL
THEN selvalue := 'NULL';
ELSE selvalue := quote_literal(selvalue);
END IF;
dumpquery_1 := dumpquery_1 || selvalue || ',';
END LOOP label2;
dumpquery_1 := substring(dumpquery_1 ,1,length(dumpquery_1)-1) || ');';
-- LOOP ARRAY (EACH FIELD) --
-- ----------- --
-- debug: RETURN NEXT dumpquery_0 || dumpquery_1 || ' --' || selquery;
-- debug: RETURN NEXT selquery;
RETURN NEXT dumpquery_0 || dumpquery_1;
END LOOP label1 ;
-- SELECT LOOP --
-- ----------- --
RETURN ;
END
$function$
;
This ownership assignment also needs to be applied (replace <ownername> with the Metastore owner):
alter function dump(text, text, text) owner to <ownername>;
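As a usage sketch, the function returns one INSERT statement per matching row; the table and WHERE clause below are illustrative, and the call assumes a live Metastore connection:

```sql
-- Generate INSERT statements for one dataset's rules.
-- The filter column and value are examples; adjust to your data.
SELECT public.dump('public', 'owl_rule', $$dataset = 'dataset_scan_1'$$);
```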