Transform
Transform Date
Note Because the Transform option defaults to a most common use case of to_date, you need to determine the proper transform function required for the data source in use.
| Option | Description |
|---|---|
| Add Collibra DQ Date Column |
Adds a synthetic run date column. Tip This can be helpful when your file does not contain dates that are easy to handle. |
| Flatten |
Refers to the level of nesting in your file. Tip Data Quality & Observability Classic traverses all levels, which can result in highly nested data being expanded into many rows per nested object. This can significantly impact processing performance, so use this feature with caution. |
| Multi-line |
Refers to the content of the file being more than one line. Note There are limitations in Explorer for remote file systems, file uploads, and temporary files. The user interface doesn't support multi-line input for these file types, but you can manually supply them using using the command line. |
| Zero Fill Null | Replaces NULL values with 0. |
| Replace all field values of [value] with NULL | Replaces all values of the column you specify with NULL. |
Example
./owlcheck \
-ds "dataset_transform" \
-rd "2018-01-31" \
-f "/Users/Documents/file.csv" \
-transform "purch_amt=cast(purch_amt as double)|return_amt=cast(return_amt as double)"
Note Submit an expression to transform a string to a particular type.
In this example, transform the purch_amt column to a double.
-transform "purch_amt=cast(purch_amt as double)"
Note Example of converting a string to a date.
-transform "RECEIVED_DATE=to_date(CAST(RECEIVED_DATE AS STRING), 'yyyyMMdd') as RECEIVED_DATE"
Converting an integer data type to a string
When you transform your target column is an integer data type, such as 20230414, you need to convert it to a string data type before you can convert it to a date.
- From the Explorer page, select the Transform option.
- In the Expression input field, enter the following expression:
- Click Save.
date_parse(cast(<your_column_name> as varchar), '%Y%m%d')
Note If your query does not contain a ${rd} in the where clause, you need to wrap it with the DATE() function in the command line. For example, select * from default.dataset where date_parse(CAST(example_date_column as varchar), '%Y%m%d') = DATE('${rd}')
Known Limitations
- The UI does not run multi-line. A workaround is to supply these options manually in the command line.
- Explorer can flatten nested JSON array, but DQ Core can only flatten nested JSON with JSON object. Because of this limitation, you may encounter a scenario where the results in Explorer are flattened, but not in DQ Core.