Transformation logic and common errors for DataStage

Use this topic to understand how Collibra Data Lineage extracts lineage and resolves parameters from DataStage, based on whether jobs are specified in the source configuration. This topic also lists the common errors you might encounter when creating technical lineage for DataStage and provides possible solutions to each error.

Transformation logic and parameter resolution

In DataStage, you can have multiple jobs, and jobs can call other jobs. Collibra Data Lineage extracts lineage based on whether jobs are specified in the source configuration. Additionally, parameter resolution depends on whether parameter names and values are specified for the jobs in the source configuration.

If you use technical lineage via Edge, specify jobs, parameter names, and values in the Source Configuration field in the technical lineage for DataStage capability. If you use the lineage harvester, specify them in the source ID file.

Consider the following sequence of jobs as an example. In this example, only Job1 and Job4 are ran as parent jobs in DataStage.

For jobs not configured:

If you do not specify jobs in the source configuration, Collibra Data Lineage considers all jobs are ran as parent jobs in DataStage, including the jobs that are called. As a result, Collibra Data Lineage extracts lineage with the following sequences. When combined with the parameter resolution process, this approach might lead to incorrect or duplicate technical lineage.

For jobs configured:

If you specify jobs in the source configuration, Collibra Data Lineage extracts lineage only from the specified jobs and the jobs that are called by the specified jobs. Take the following source configuration as an example:

"jobs": [
  "job1",
  "job4"
]

In this case, Collibra Data Lineage extracts lineage for Job1, Job4, and the jobs that Job1 and Job4 called:

Parameter resolution

Input parameters in jobs can directly impact the extracted lineage. For example, a parameter that influences an SQL statement can affect how lineage is determined.

Collibra Data Lineage resolves parameters in a specific order, depending on whether the job is a parent job or a job called by another job.

In the following example, job5 is called by parent job Job4.

For jobs that are called by another job: 

Collibra Data Lineage resolves the value for j5_param1 in Job5 in the following order:

  1. Use the value from the parent job, if available. If a DataStage expression is detected, DATASTAGE EXPRESSION is used as the value.
  2. Use the value specified in the perJobParameters property for this specific job in the source configuration, if available.
  3. Use the value specified in the jobParameters property in the source configuration, if available.
  4. Use the default value.

For parent jobs: 

Collibra Data Lineage resolves the values for j4_param1 and j4_param2 in parent job job4 in the following order:

  1. Use the value specified in the perJobParameters property for this specific job in the source configuration, if available.
  2. Use the value specified in the jobParameters property in the source configuration, if available.
  3. Use the default value.

In the following example, j4_param1 resolves to value for j4_param1, and j4_param2 resolves to specific job value for j4_param2, because the perJobParameters property takes precedence over the jobParameters property.

"jobParameters": [
    {
       "name": "j4_param1",
       "value": "value for j4_param1"
    },
    {
       "name": "j4_param2",
       "value": "general value for j4_param2"
    }
],
"perJobParameters": {
  "Job4": [
    {
       "name": "j4_param2",
       "value": "specific job value for j4_param2"
    }
  ]
}

Recursive lookup

Collibra Data Lineage supports recursive lookup, so you can specify a parameter reference in the perJobParameters or jobParameters property.

For example, if you specify the jobParameters property as follows, Collibra Data Lineage looks up the value for the #another_param parameter recursively until an actual value is resolved. In this case, the actual value is final value.

"jobParameters": [
    {
      "name": "j5_param1"
      "value": "#another_param"
    }
    {
      "name": "another_param",
      "value": "final value"
    }
]

Parameter set resolution

If a parameter set is used in DataStage, ensure that you specify the value property in perJobParameters or jobParameters property with the value file name.

The following graphic shows an example of the value file names.

The following example shows that the valueFileName1 value file name is specified for the parameterSet1 parameter set.

"jobParameters": [
  {
    "name": "parameterSet1",
    "value": "valueFileName1"
  }
]

When you specify the value property with a value file name, the value is resolved in the following order:

  1. Use the value from the parent job, if available.
  2. Use the value specified in the perJobParameters property for this specific job in the source configuration, if available.
  3. Use the value specified in the jobParameters property in the source configuration, if available.
  4. When the Default Value is set to (As pre-defined), use the pre-defined value from the parent job.
  5. Use the default value.

User variable resolution

User variables can contain DataStage expressions. When Collibra Data Lineage detects a DataStage expression, the value is resolved in the following order:

  1. Use the value specified in the perJobParameters property for this specific job in the source configuration, if available.
  2. Use the value specified in the jobParameters property in the source configuration, if available.
  3. Use DATASTAGE EXPRESSION as the value.

The following code example shows how to use the user variable.

"jobParameters": [
  {
     "name": "UserVariable_Activity_5.sql_query",
     "value": "SELECT COL1, COL2 FROM TABLE1;"
  }
]

You can specify parameter set references with user variables.

"jobParameters": [
  {
    "name": "UserVariable_Activity_5.sql_query",
    "value": "#Parameter_Set_SQL_Statements.SrcQuery#"
  }
]

Analyze errors and possible solutions

After you create technical lineage for DataStage, you might encounter the following errors in the analysis results in the technical lineage Sources tab page.