Transform

Transform Date

When you create a dataset, you can transform columns such as Dates and Numbers to preferred formats. It is a common need to replace N.A. with nulls or empty white space.

Note Because the Transform option defaults to a most common use case of to_date, you need to determine the proper transform function required for the data source in use.

Transform settings

Option Description
Add Collibra DQ Date Column

Adds a synthetic run date column.

Tip This can be helpful when your file does not contain dates that are easy to handle.

Flatten

Refers to the level of nesting in your file.

Tip Data Quality & Observability Classic traverses all levels, which can result in highly nested data being expanded into many rows per nested object. This can significantly impact processing performance, so use this feature with caution.

Multi-line

Refers to the content of the file being more than one line.

Note There are limitations in Explorer for remote file systems, file uploads, and temporary files. The user interface doesn't support multi-line input for these file types, but you can manually supply them using using the command line.

Zero Fill Null Replaces NULL values with 0.
Replace all field values of [value] with NULL Replaces all values of the column you specify with NULL.

Example

Copy
./owlcheck \
-ds  "dataset_transform" \
-rd  "2018-01-31" \
-f   "/Users/Documents/file.csv" \
-transform "purch_amt=cast(purch_amt as double)|return_amt=cast(return_amt as double)" 

Note Submit an expression to transform a string to a particular type.
In this example, transform the purch_amt column to a double.

Copy
-transform "purch_amt=cast(purch_amt as double)"

Note Example of converting a string to a date.

Copy
-transform "RECEIVED_DATE=to_date(CAST(RECEIVED_DATE AS STRING), 'yyyyMMdd') as RECEIVED_DATE"

Converting an integer data type to a string

When you transform your target column is an integer data type, such as 20230414, you need to convert it to a string data type before you can convert it to a date.

  1. From the Explorer page, select the Transform option.
  2. In the Expression input field, enter the following expression:
  3. date_parse(cast(<your_column_name> as varchar), '%Y%m%d')
  4. Click Save.
  5. Note If your query does not contain a ${rd} in the where clause, you need to wrap it with the DATE() function in the command line. For example, select * from default.dataset where date_parse(CAST(example_date_column as varchar), '%Y%m%d') = DATE('${rd}')

Known Limitations