Transform

Transform Date

When you create a dataset, you can transform columns such as Dates and Numbers to preferred formats. It is a common need to replace N.A. with nulls or empty white space.

Note Because the Transform option defaults to a most common use case of to_date, you need to determine the proper transform function required for the data source in use.

Transform settings

Option Description
Add Collibra DQ Date Column

Adds a synthetic run date column.

Tip This can be helpful when your file does not contain dates that are easy to handle.

Flatten Refers to the level of nesting in your file.
Multi-line Refers to the content of the file being more than one line.
Zero Fill Null Replaces NULL values with 0.
Replace all field values of [value] with NULL Replaces all values of the column you specify with NULL.

Example

Copy
./owlcheck \
-ds  "dataset_transform" \
-rd  "2018-01-31" \
-f   "/Users/Documents/file.csv" \
-transform "purch_amt=cast(purch_amt as double)|return_amt=cast(return_amt as double)" 

Note Submit an expression to transform a string to a particular type.
In this example, transform the purch_amt column to a double.

Copy
-transform "purch_amt=cast(purch_amt as double)"

Note Example of converting a string to a date.

Copy
-transform "RECEIVED_DATE=to_date(CAST(RECEIVED_DATE AS STRING), 'yyyyMMdd') as RECEIVED_DATE"

Converting an integer data type to a string

When you transform your target column is an integer data type, such as 20230414, you need to convert it to a string data type before you can convert it to a date.

  1. From the Explorer page, select the Transform option.
  2. In the Expression input field, enter the following expression:
  3. date_parse(cast(<your_column_name> as varchar), '%Y%m%d')
  4. Click Save.
  5. Note If your query does not contain a ${rd} in the where clause, you need to wrap it with the DATE() function in the command line. For example, select * from default.dataset where date_parse(CAST(example_date_column as varchar), '%Y%m%d') = DATE('${rd}')

Known Limitations

  • The UI does not run multi-line. A workaround is to supply these options manually in the command line.
  • Explorer can flatten nested JSON array, but DQ Core can only flatten nested JSON with JSON object. Because of this limitation, you may encounter a scenario where the results in Explorer are flattened, but not in DQ Core.