Configure the Jobserver to Collibra Data Intelligence Platform communication

By default, Collibra Data Intelligence Platform sends jobs to a Jobserver. Alternatively, you can have the Jobserver poll Collibra Data Intelligence Platform for jobs.

This section describes how to configure the Jobserver to poll Collibra for jobs. You have to configure both the Data Governance Center (DGC) service and the Jobserver service.

Prerequisites

Steps

Configure the Jobserver service

Warning 

Only change these settings if you are experienced with JVM parameters. Changing parameters may cause serious performance issues.

To remove an individual JVM property, you must use the delete icon alongside the property; otherwise, the service interprets it as a blank line and fails to start correctly.

Restart the service after editing the JVM parameters.

Execute the following steps in the Collibra Console instance that manages your Jobserver:

  1. Open Collibra Console with a user profile that has the SUPER role.
    Collibra Console opens with the Infrastructure page.

    Tip The default address to access Collibra Console is <server hostname>:4402, but you may have set another port during the installation of Collibra Console. Keep in mind that the firewall of your operating system can block access to Collibra Console.

  2. Click the Jobserver service of a Collibra environment.
  3. Click Infrastructure Configuration.
  4. Click JVM configuration.
  5. Click Edit configuration.
  6. Add the following JVM settings:

    Setting

    Description

    reversehttp.gateway

    The setting that enables the Jobserver's gateway. To enable communication from the on-premises Jobserver to the Collibra Data Intelligence Platform environment, this value must be true.

    Example -Dreversehttp.gateway=true

    proxy.url

    The URL of the Collibra environment, followed by reversehttp-poll/<gateway-id>.

    This "gateway-id" must be identical to the one used in the Name parameter when you add the Jobserver to the DGC service.

    The value of this setting is case-sensitive.

    Example -Dproxy.url=https://<your-environment-url>/reversehttp-poll/Jobserver-1

    target.url

    The URL of the target system, either an on-premises Jobserver or a Tableau server.

    Example 
    • Jobserver: -Dtarget.url=http://localhost:4404
    • Tableau server: -Dtarget.url=https://tableau-sales.yourcompany.com

    http.proxy.host

    (optional)

    The hostname of the HTTP proxy server for outbound connections to your Collibra Data Intelligence Platform environment.

    This option is used to enable outbound traffic monitoring.

    Example -Dhttp.proxy.host=proxy.yourcompany.com

    http.proxy.port

    (optional)

    The port of the HTTP proxy server for outbound connections to your Collibra Data Intelligence Platform environment.

    This option is used to enable outbound traffic monitoring.

    Example -Dhttp.proxy.port=8080

    username

    The username of any Collibra user for basic authentication.

    Example -Dusername=john.fisher

    password

    The corresponding password of the Collibra user for basic authentication.

    Example -Dpassword=ChangeMe

    You can encrypt this password if necessary.

    Example -Dpassword=enc_2:t2rklBY6699aWV0...

    keystore.path

    The full path to the PKCS12 keystore. This keystore should contain the private key to sign the basic authentication header.

    Example -Dkeystore.path=/opt/collibra_data/spark-jobserver/security/jobserver-1-keystore.p12

    keystore.alias

    The alias of the private key in the keystore. Each alias must be unique in your configuration.

    If you used the name argument during the creation of the keystore, then use the value of this name argument.

    If only 1 keystore is created, the default alias is "1".

    Example 
    -Dkeystore.alias=1
    -Dkeystore.alias=MyJobserver

    keystore.password

    (optional)

    The password to access the keystore. If the keystore is not password-protected, don't add it to the JVM settings.

    You can encrypt this password if necessary.

    Example -Dkeystore.password=ChangeMe

    keystore.key.password

    (optional)

    The password to use the private key, only applicable if you secured the private key with a password. If the key is not password-protected, don't add it to the JVM settings.

    You can encrypt this password if necessary.

    Example -Dkeystore.key.password=ChangeMe

    polling.backoff

    The time in milliseconds between a polling failure and a next polling attempt.

    We recommend that you do not define this parameter. The default value of 5,000 milliseconds is then used.

    Example -Dpolling.backoff=10000

    max.connections.route

    The maximum number of HTTP connections per route.

    We recommend that you do not define this parameter. The default value of 20 is then used.

    Example -Dmax.connections.route=30

    max.connections.total

    The maximum number of all HTTP connections.

    We recommend that you do not define this parameter. The default value of 40 is then used.

    Example -Dmax.connections.total=60

    idle.connection.timeout

    The time in milliseconds that an idle connection is kept in the connection pool.

    We recommend that you do not define this parameter. The default value of 5,000 milliseconds is then used.

    Example -Didle.connection.timeout=3000

    connection.timeout

    The time in milliseconds that the reverse HTTP server waits for a response from your Collibra environment or from the system specified in target.url.

    If you don't set this parameter, the value is 60,000 milliseconds.

    Example -Dconnection.timeout=30000

    connection.soTimeout

    The time in milliseconds that the reverse HTTP server waits, at socket level, for a response from your Collibra environment or from the system specified in target.url.

    If you don't set this parameter, the value is 60,000 milliseconds.

    Example -Dconnection.soTimeout=30000

    polling.timeout

    The time in milliseconds that the reverse HTTP server waits for a poll request from your Collibra environment to be submitted to the target.url.

    If you don't set this parameter, the value is 300,000 milliseconds.

    Example -Dpolling.timeout=100000

    polling.period

    The time in milliseconds that the reverse HTTP server waits in between poll request sessions. In other words, after having received a poll request or no request from your Collibra environment, the reverse HTTP server waits a certain amount of milliseconds before contacting the Collibra environment again.

    If you don't set this parameter, the value is 100 milliseconds.

    Example -Dpolling.period=200

    health.check.enabled

    (optional)

    Enables the health check mechanism between the Collibra environment and the reverse HTTP server.

    If you don't set this parameter, the value is false.

    Example -Dhealth.check.enabled=true

    health.check.period

    (optional)

    The time in milliseconds that the reverse HTTP server waits between health checks of its connection with Collibra.

    If you don't set this parameter, the value is 5,000 milliseconds.

    Example -Dhealth.check.period=10000

    health.check.timeout

    (optional)

    The time in milliseconds that the reverse HTTP server waits for a health check response from Collibra.

    If you don't set this parameter, the value is 5,000 milliseconds.

    Example -Dhealth.check.timeout=10000

    Note You have to use separate Jobservers for the ingestion of S3 or JDBC data sources and for the ingestion of Tableau server data.

  7. Click the green Save all button.
  8. Click Security configuration.
  9. Click Edit configuration.
  10. Set the Authentication level to NONE.

    Note This means that there is a one-way outbound communication over TLS from the Jobserver to the Collibra environment. There is no authentication at all.

  11. Click the green Save all button.
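
The JVM settings from step 6 can be assembled and sanity-checked before you enter them in Collibra Console. The following Python sketch is only an illustration, not a Collibra tool: the function name build_jvm_settings and its parameters are hypothetical, and it assumes that every setting the table neither marks as optional nor gives a default for is required.

```python
from urllib.parse import urlparse

# Assumption of this sketch: settings that the table neither marks as
# optional nor gives a default value for are treated as required.
REQUIRED = (
    "reversehttp.gateway", "proxy.url", "target.url",
    "username", "password", "keystore.path", "keystore.alias",
)


def build_jvm_settings(environment_url, gateway_id, extra):
    """Return a list of -D settings for the Jobserver JVM configuration.

    environment_url, gateway_id and extra are hypothetical inputs;
    extra maps setting names from the table to their values.
    """
    settings = {
        "reversehttp.gateway": "true",
        # proxy.url is the environment URL followed by
        # reversehttp-poll/<gateway-id>; the value is case-sensitive.
        "proxy.url": f"{environment_url.rstrip('/')}/reversehttp-poll/{gateway_id}",
    }
    settings.update(extra)

    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")

    for name in ("proxy.url", "target.url"):
        parsed = urlparse(settings[name])
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name} is not a valid URL: {settings[name]}")

    # One entry per setting and never an empty entry, because a blank
    # line prevents the service from starting correctly.
    return [f"-D{name}={value}" for name, value in settings.items()]
```

Calling build_jvm_settings with your environment URL, the gateway ID Jobserver-1 and a dictionary that contains target.url, username, password, keystore.path and keystore.alias returns one -D line per setting, in the form shown in the Example entries of the table.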

Add the Jobserver to the DGC service

Execute the following steps in Collibra Console of your Collibra Data Intelligence Platform environment.

  1. Open the DGC service settings for editing:
  2. Go to the Jobserver section of the configuration.
  3. Enter the required information.
    Setting

    Description

    Name

    The name of the Jobserver as it will appear when you register a data source. You can choose the name freely, but we recommend that you only use alphanumeric characters and dashes, for example Jobserver-1.

    You will have to use this name as the ID of the gateway and in the address of this configuration.

    Protocol

    The protocol for this configuration has to be HTTP and not the recommended HTTPS. This is because of the Collibra internal architecture.

    Address

    The loopback address of the DGC service, followed by /reversehttp/<gateway-id>.

    The "gateway-id" must be identical to the one used in the Name parameter of this configuration.

    Do not use the scheme in the address.

    Example localhost:4400/reversehttp/Jobserver-1

    Trusted server CA certificate

    The certificate in PEM format that contains the public key of the Jobserver to validate the signature of the basic authentication header.

    In the example to create a keystore, this is the content of the file cert.pem.

    Example 
    -----BEGIN CERTIFICATE-----
    MIICqDCCAZACCQCcy3Oq51c5YzANBgkqhkiG9w0BAQsFADAWMRQwEgYDVQQDDAtq
    b2JzZXJ2ZXIt...
    -----END CERTIFICATE-----

    Client certificate

    This field is not used in this configuration.

    Client private key

    This field is not used in this configuration.

    Note This field always shows dots, even if it is empty.

    Table profiling data size

    The approximate maximum disk size of the data in MB that will be used to profile a table. The value cannot exceed 10 000.

    If you use a truststore, go to Generate keys, certificates and keystores.

  4. Click the green Save all button.
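
The Name and Address of this configuration and the proxy.url JVM setting of the Jobserver all have to refer to the same gateway ID. The following Python sketch illustrates that consistency check under the rules stated on this page; the function name gateway_ids_match is hypothetical and not part of any Collibra API.

```python
def gateway_ids_match(name, address, proxy_url):
    """Return True if Name, Address and proxy.url use the same gateway ID.

    The comparison is exact, because the proxy.url value is case-sensitive.
    """
    # The Address must not contain a scheme such as http://.
    if "://" in address:
        return False

    # Address ends with /reversehttp/<gateway-id>.
    address_parts = address.rsplit("/reversehttp/", 1)
    # proxy.url ends with /reversehttp-poll/<gateway-id>.
    poll_parts = proxy_url.rsplit("/reversehttp-poll/", 1)

    if len(address_parts) != 2 or len(poll_parts) != 2:
        return False
    return address_parts[1] == name and poll_parts[1] == name
```

For example, the name Jobserver-1 with the address localhost:4400/reversehttp/Jobserver-1 passes the check only if proxy.url also ends with reversehttp-poll/Jobserver-1, with the same capitalization.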

If all settings and communication paths are correctly configured, you will see a notice on the Jobserver:

INFO [I/O dispatcher 1] reversehttp.gateway.PollingController - proxy -> no requests polled (204)
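
To look for this notice automatically, you could scan the Jobserver log output for it. The following Python sketch derives its pattern from the sample line above only; the function name polling_confirmed is hypothetical, and the log layout in your installation may differ.

```python
import re

# Pattern taken from the sample notice above; other log layouts may differ.
POLL_OK = re.compile(
    r"reversehttp\.gateway\.PollingController - proxy -> "
    r"no requests polled \(204\)"
)


def polling_confirmed(log_lines):
    """Return True if any line contains the successful polling notice."""
    return any(POLL_OK.search(line) for line in log_lines)
```

You can pass any iterable of strings to polling_confirmed, for example an open log file.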

What's next?

When you have set up this communication, you may want to monitor the outbound traffic. You can do so by enabling a man-in-the-middle proxy.