Configure the Jobserver to Collibra Platform communication
By default, Collibra Platform sends jobs to a Jobserver, but you can also have the Jobserver poll Collibra Platform for jobs.
In this section, we describe how to configure the Jobserver to poll Collibra for jobs. You will have to configure both the Collibra Platform service and the Jobserver service.
Prerequisites
- You have Collibra Platform 2020.10 or newer.
- You have created a keystore in the PKCS#12 format on the node that hosts the Jobserver service.
Steps
Configure the Jobserver service
Only change these settings if you are experienced with JVM parameters. Changing parameters may cause serious performance issues.
To remove an individual JVM property, you must use the delete icon next to the property. Otherwise, the service interprets it as a blank line and fails to start correctly.
Restart the service after editing the JVM parameters.
Execute the following steps in the Collibra Console instance that manages your Jobserver:
- Open Collibra Console with a user profile that has the SUPER role.
Collibra Console opens with the Infrastructure page.
Tip The default address to access Collibra Console is <server hostname>:4402, but you may have set another port during the installation of Collibra Console. Keep in mind that a firewall of your operating system can block access to Collibra Console.
- Click the Jobserver service of a Collibra environment.
- Click Infrastructure Configuration.
- Click JVM configuration.
- Click Edit configuration.
- Add the following JVM settings:
Setting
Description
reversehttp.gateway The setting to enable the Jobserver's gateway. To enable the communication from the on-premises Jobserver to the Collibra Platform environment, this value must be true.
Example
-Dreversehttp.gateway=true
proxy.url The URL of the Collibra environment, followed by reversehttp-poll/<gateway-id>.
This "gateway-id" must be identical to the one used in the Name parameter when you add the Jobserver to the Collibra service.
The value of this setting is case-sensitive.
Example
-Dproxy.url=https://<your-environment-url>/reversehttp-poll/Jobserver-1
target.url The URL of the target system, either an on-premises Jobserver or a Tableau server.
Example
- Jobserver:
-Dtarget.url=http://localhost:4404
- Tableau server:
-Dtarget.url=https://tableau-sales.yourcompany.com
http.proxy.host
(optional)
The hostname of the HTTP proxy server for outbound connections to your Collibra Platform environment.
This option is used to enable outbound traffic monitoring.
Example
-Dhttp.proxy.host=proxy.yourcompany.com
http.proxy.port
(optional)
The port of the HTTP proxy server for outbound connections to your Collibra Platform environment.
This option is used to enable outbound traffic monitoring.
Example
-Dhttp.proxy.port=8080
username
The username of any Collibra user for basic authentication.
Example
-Dusername=john.fisher
password
The corresponding password of the Collibra user for basic authentication.
Example
-Dpassword=ChangeMe
You can encrypt this password if necessary.
Example
-Dpassword=enc_2:t2rklBY6699aWV0...
keystore.path
The full path to the PKCS#12 keystore. This keystore should contain the private key that is used to sign the basic authentication header.
Example
-Dkeystore.path=/opt/collibra_data/spark-jobserver/security/jobserver-1-keystore.p12
keystore.alias
The alias of the private key in the keystore. Each alias must be unique in your configuration.
If you used the name argument during the creation of the keystore, use the value of this name argument. If only 1 keystore is created, the default alias is "1".
Example
-Dkeystore.alias=1
-Dkeystore.alias=MyJobserver
keystore.password
(optional)
The password to access the keystore. If the keystore is not password-protected, don't add it to the JVM settings.
You can encrypt this password if necessary.
Example
-Dkeystore.password=ChangeMe
keystore.key.password
(optional)
The password to use the private key, only applicable if you secured the private key with a password. If the key is not password-protected, don't add it to the JVM settings.
You can encrypt this password if necessary.
Example
-Dkeystore.key.password=ChangeMe
polling.backoff
The time in milliseconds between a polling failure and the next polling attempt.
We recommend that you do not define this parameter; it then uses the default value of 5,000 milliseconds.
Example
-Dpolling.backoff=10000
max.connections.route The maximum number of HTTP connections per route.
We recommend that you do not define this parameter; it then uses the default value of 20.
Example
-Dmax.connections.route=30
max.connections.total The maximum number of all HTTP connections.
We recommend that you do not define this parameter; it then uses the default value of 40.
Example
-Dmax.connections.total=60
idle.connection.timeout The time in milliseconds that an idle connection is kept in the connection pool.
We recommend that you do not define this parameter; it then uses the default value of 5,000 milliseconds.
Example
-Didle.connection.timeout=3000
connection.timeout The time in milliseconds that the reverse HTTP server waits for a response from your Collibra environment or from the value in target.url.
If you don't set this parameter, the value is 60,000 milliseconds.
Example
-Dconnection.timeout=30000
connection.soTimeout The time in milliseconds that the reverse HTTP server waits for a response from your Collibra environment or from the value in target.url on socket level.
If you don't set this parameter, the value is 60,000 milliseconds.
Example
-Dconnection.soTimeout=30000
polling.timeout The time in milliseconds that the reverse HTTP server waits for a poll request from your Collibra environment to be submitted to the target.url.
If you don't set this parameter, the value is 300,000 milliseconds.
Example
-Dpolling.timeout=100000
polling.period The time in milliseconds that the reverse HTTP server waits in between poll request sessions. In other words, after having received a poll request or no request from your Collibra environment, the reverse HTTP server waits a certain amount of milliseconds before contacting the Collibra environment again.
If you don't set this parameter, the value is 100 milliseconds.
Example
-Dpolling.period=200
health.check.enabled (optional) Enables the health check mechanism between the Collibra environment and the reverse HTTP server.
If you don't set this parameter, the value is false.
Example
-Dhealth.check.enabled=true
health.check.period (optional) The time in milliseconds that the reverse HTTP server waits between health checks of its connection with Collibra.
If you don't set this parameter, the value is 5,000 milliseconds.
Example
-Dhealth.check.period=10000
health.check.timeout (optional) The time in milliseconds that the reverse HTTP server waits for a health check response from Collibra.
If you don't set this parameter, the value is 5,000 milliseconds.
Example
-Dhealth.check.timeout=10000
Note You have to use separate Jobservers for the ingestion of S3 or JDBC data sources and Tableau server data.
- Click the green Save all button.
- Click Security configuration.
- Click Edit configuration.
- Set the Authentication level to NONE.
Note This means that there is a one-way outbound communication over TLS from the Jobserver to the Collibra environment. There is no authentication at all.
- Click the green Save all button.
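The settings in the table above that are not marked optional can be sketched as a small Python helper that assembles the -D flags from a few inputs. This is only an illustration, not part of Collibra: the function name build_jvm_flags and the URL collibra.example.com are hypothetical, and the defaults for target.url and keystore.alias are taken from the examples in the table.

```python
from urllib.parse import urlsplit


def build_jvm_flags(environment_url, gateway_id, username, password,
                    keystore_path, keystore_alias="1",
                    target_url="http://localhost:4404"):
    """Return the -D flags for a polling Jobserver as a list of strings.

    Hypothetical helper: it only formats strings and does not contact
    Collibra Console or validate the keystore.
    """
    parts = urlsplit(environment_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("environment_url must be an absolute http(s) URL")

    # proxy.url is the environment URL followed by reversehttp-poll/<gateway-id>.
    base = environment_url.rstrip("/")
    settings = {
        "reversehttp.gateway": "true",
        "proxy.url": f"{base}/reversehttp-poll/{gateway_id}",
        "target.url": target_url,
        "username": username,
        "password": password,  # may also be an encrypted value
        "keystore.path": keystore_path,
        "keystore.alias": keystore_alias,
    }
    return [f"-D{key}={value}" for key, value in settings.items()]


if __name__ == "__main__":
    for flag in build_jvm_flags(
        "https://collibra.example.com",  # assumed environment URL
        "Jobserver-1",
        "john.fisher",
        "ChangeMe",
        "/opt/collibra_data/spark-jobserver/security/jobserver-1-keystore.p12",
    ):
        print(flag)
```

Each printed line corresponds to one JVM setting that you would add in the JVM configuration. The optional settings are left out on purpose, so that their default values apply.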
Add the Jobserver to the Collibra service
Execute the following steps in Collibra Console of your Collibra Platform environment.
- Open the DGC service settings for editing:
- Go to the Jobserver section of the configuration.
- Enter the required information.
Setting
Description
Name The name of the Jobserver as it will appear when you register a data source. You can choose the name freely, but we recommend using only alphanumeric characters and dashes, for example Jobserver-1.
You will have to use this name as the ID of the gateway and in the address of this configuration.
Protocol The protocol for this configuration has to be HTTP and not the recommended HTTPS because of the internal Collibra architecture.
Address The loopback address of the Collibra service, followed by /reversehttp/<gateway-id>.
The "gateway-id" must be identical to the one used in the Name parameter of this configuration.
Do not use the scheme in the address.
Example
localhost:4400/reversehttp/Jobserver-1
Trusted server CA certificate
The certificate in PEM format that contains the public key of the Jobserver to validate the signature of the basic authentication header.
In the example to create a keystore, this is the content of the file cert.pem.
Example
-----BEGIN CERTIFICATE-----
MIICqDCCAZACCQCcy3Oq51c5YzANBgkqhkiG9w0BAQsF
ADAWMRQwEgYDVQQDDAtq b2JzZXJ2ZXIt...
-----END CERTIFICATE-----
Client certificate
This field is not used in this configuration.
Client private key
This field is not used in this configuration.
Note This field always shows dots, even if it is empty.
Table profiling data size
The approximate maximum disk size of the data in MB that will be used to profile a table. The value cannot exceed 10 000.
If you use a truststore, go to Generate keys, certificates and keystores.
- Click the green Save all button.
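Because the gateway ID has to match in three places (the Name, the proxy.url JVM setting, and the address of this configuration), a quick consistency check can catch typos before you look at the logs. The following Python sketch is only an illustration: check_gateway_config is a hypothetical helper and the sample values are assumptions based on the examples in this section.

```python
from urllib.parse import urlsplit


def check_gateway_config(name, proxy_url, address):
    """Return a list of problems; an empty list means the values agree.

    Comparisons are case-sensitive, like the proxy.url setting itself.
    """
    problems = []

    # proxy.url must end in reversehttp-poll/<gateway-id>.
    poll_path = urlsplit(proxy_url).path
    if not poll_path.endswith(f"/reversehttp-poll/{name}"):
        problems.append("proxy.url does not end in reversehttp-poll/<Name>")

    # The address must not contain a scheme such as http://.
    if "://" in address:
        problems.append("the address must not include a scheme")

    # The address must end in /reversehttp/<gateway-id>.
    if not address.endswith(f"/reversehttp/{name}"):
        problems.append("the address does not end in /reversehttp/<Name>")

    return problems


if __name__ == "__main__":
    print(check_gateway_config(
        "Jobserver-1",
        "https://collibra.example.com/reversehttp-poll/Jobserver-1",
        "localhost:4400/reversehttp/Jobserver-1",
    ))
```

An empty list only means that the three strings are consistent with each other. It does not prove that the Jobserver can actually reach the Collibra environment.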
If all settings and communication paths are correctly configured, you will see a notice on the Jobserver:
INFO [I/O dispatcher 1] reversehttp.gateway.PollingController - proxy -> no requests polled (204)
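If you prefer to check for this notice with a script, a minimal Python sketch could scan log lines for the message shown above. The helper polling_is_healthy is hypothetical, and where the Jobserver writes its log depends on your installation, so the sketch takes the lines as input instead of assuming a file path.

```python
import re

# Matches the notice that the Jobserver logs after a successful empty poll.
POLL_OK = re.compile(
    r"PollingController - proxy -> no requests polled \(204\)"
)


def polling_is_healthy(log_lines):
    """Return True if at least one line contains the successful poll notice."""
    return any(POLL_OK.search(line) for line in log_lines)


if __name__ == "__main__":
    sample = [
        "INFO [I/O dispatcher 1] reversehttp.gateway.PollingController"
        " - proxy -> no requests polled (204)",
    ]
    print(polling_is_healthy(sample))
```

To use it on a real log, pass an open file object, because iterating over a file yields its lines.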
What's next?
When you have set up this communication, you may want to monitor the outbound traffic. You can do so by enabling a man-in-the-middle proxy.