Synchronize Microsoft Fabric

Synchronizing Microsoft Fabric is the process of integrating metadata from Fabric and making the data available in Collibra Platform.

You can synchronize manually or automate the process by adding a synchronization schedule.

Prerequisites

In your Collibra environment:

Steps

To synchronize manually:

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog homepage opens.
  2. Click Integrations.
    The Integrations page opens.
  3. Click the Integration Configuration tab.
  4. In the Connection name column, locate the Azure connection that you used when you added the Fabric synchronization capability and click the capability link in the Capabilities column.
    The Fabric synchronization capability configuration page opens.
  5. In the Synchronization Configuration section, click the Edit icon.
  6. Complete the following fields:
    Field | Action
    Updated: <timestamp> (Optional)

    Click Updated: <timestamp> next to Synchronization Configuration, where <timestamp> indicates the last time the data was loaded from Microsoft Fabric.
    The workspace names are loaded into the dropdown list of the Fabric workspace names field. This can take some time.

    Default community 

    In Default community, select a Collibra community to ingest the metadata. Subdomains per workspace will be automatically created in this community.

    Fabric domain

    In Fabric domain, select a Collibra domain to ingest the Microsoft Tenant and Fabric Capacity assets.

    Custom tenant name

    Specify a tenant name to replace the tenant name fetched from the Microsoft API, or leave this field blank to use the one fetched from the API.

    Custom capacity name

    Specify a capacity name to replace the capacity name fetched from the Microsoft API, or leave this field blank to use the ones fetched from the API. You can add the names of multiple Fabric capacities.

    To add a custom capacity name:

    1. Click Add Item.
    2. In Capacity ID, enter the capacity ID of the Microsoft Fabric capacity.
    3. In Custom name, enter the custom name that you want to use for the Microsoft Fabric capacity.
    Fabric workspace names

    Enter the names of specific Microsoft Fabric workspaces, or leave this field blank to ingest metadata from all workspaces available to your service principal.

    To specify the workspace, complete the following steps:

    1. Click Add Workspace.
    2. In Workspace, enter the name of the workspace you want to ingest metadata from.
    3. Click Save.
    Maximum files per lakehouse

    Specify the maximum number of files to be ingested per lakehouse.
    • To ingest all files, set the value to -1.
    • If the value is set to 0, no files are ingested.
    JDBC connections

    If you want to allow sampling, profiling, and classification of assets created via the Fabric integration, add the JDBC connection information.

    To do so, complete the following steps:

    1. Click Add Item.
    2. In Database full name, enter the SQL Server database name in the following format:
      microsofttenant>fabriccapacity>fabricworkspace>databasename
      Example 

      Collibra, Inc>colfabriccapacity1>ColEngWorkspace>integrations-test-sql-database-8fef4727-8ea0-42a0-899b-4769219c105d

    3. In JDBC connection, select the JDBC connection that you created for your SQL Server database.
    4. Click Save.
    Note Make sure to add all JDBC connections for the SQL Server databases that you want to integrate.
    Domain include mappings

    Optionally, in Domain include mappings, specify the workspaces, warehouses, lakehouses, schemas, tables, or other artifacts from Fabric that you want to integrate. Optionally, also specify the Collibra domains where they need to be added. When include mappings are configured, only matched assets are ingested and everything else is skipped. When no include mappings are configured, all workspaces are ingested into auto-created subdomains under the main domain.

    Note that when you include a deeper asset, Collibra creates its parent assets as skeleton assets in their default, auto-created subdomains so the asset tree stays intact. For example, including a single table also creates the schema, parent lakehouse or warehouse, and workspace as skeleton assets in their default subdomains, and ingests the table's columns into the target domain.

    To limit the scope of metadata ingestion to specific domains in Collibra, add a domain include mapping:

    1. Click Add Domain include mapping.
    2. In Path, add the path to the assets in Fabric for which you want to integrate metadata. A path can be as granular as the following hierarchy: workspace > artifact > schema > table.
    3. Optionally, in Domain, select the Collibra domain in which you want to integrate the metadata.
    Example 
    • Include an entire workspace and all its artifacts: Sales Workspace to Sales General domain.
    • Include a lakehouse and its tables and columns: Sales Workspace > Customer Lakehouse to Sales Customer Data domain.
    • Include a Fabric database in a separate domain: Sales Workspace > Operations DB to Sales Operations domain.
    • Include a specific schema and its tables and columns: Sales Workspace > Customer Lakehouse > dbo to Customer DBO domain.
    • Include a single table and its columns: Sales Workspace > Customer Lakehouse > dbo > Customers to Customer Reporting domain.
    Domain exclude mappings

    Optionally, in Domain exclude mappings, add one or more mappings to prevent specific Fabric workspaces or artifacts from being ingested.

    Note The exclude mapping has priority over the include mapping.

    To exclude specific metadata from being ingested into Collibra, add a domain exclude mapping:

    1. Click Add Domain exclude mappings.
    2. In the field, add the path to the assets in Fabric that you want to exclude. A path can be as granular as the following hierarchy: workspace > artifact > schema > table.
    Example 
    • Exclude an entire non-production workspace: Dev Sandbox.
    • Exclude a single staging lakehouse while still ingesting the rest of the workspace: Sales Workspace > Staging Lakehouse.
    • Exclude a specific schema in a warehouse: Analytics Workspace > Finance Warehouse > raw_staging.
    • Exclude a single table: Analytics Workspace > Finance Warehouse > public > test_table.
    • Include a lakehouse but exclude one of its tables: include Sales Workspace > Customer Lakehouse, exclude Sales Workspace > Customer Lakehouse > dbo > scratch_table. The lakehouse is ingested without the excluded table.
  7. Click Save.
  8. Click Synchronize.
    A notification indicates the synchronization has started.
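The include and exclude mapping rules described in step 6 can be sketched as a small model. This is only an illustration of the documented behavior (no include mappings means everything is in scope, include mappings restrict the scope, exclude mappings take priority), not Collibra's implementation. The function names `split_path`, `covers`, and `is_ingested` are hypothetical, and the creation of parent skeleton assets is not modeled.

```python
SEPARATOR = ">"


def split_path(path: str) -> list:
    """Split 'Workspace > Artifact > schema > table' into trimmed segments."""
    return [part.strip() for part in path.split(SEPARATOR) if part.strip()]


def covers(mapping: str, asset: str) -> bool:
    """Return True if the mapping path equals the asset path or is one of its ancestors."""
    mapping_parts = split_path(mapping)
    asset_parts = split_path(asset)
    return (
        len(mapping_parts) <= len(asset_parts)
        and asset_parts[: len(mapping_parts)] == mapping_parts
    )


def is_ingested(asset: str, includes: list, excludes: list) -> bool:
    """Decide whether an asset path is in scope under the given mappings."""
    # Exclude mappings have priority over include mappings.
    if any(covers(path, asset) for path in excludes):
        return False
    # With no include mappings configured, all workspaces are ingested.
    if not includes:
        return True
    # Otherwise only assets matched by an include mapping are ingested.
    return any(covers(path, asset) for path in includes)


# Paths taken from the examples in step 6.
includes = ["Sales Workspace > Customer Lakehouse"]
excludes = ["Sales Workspace > Customer Lakehouse > dbo > scratch_table"]

for asset in (
    "Sales Workspace > Customer Lakehouse > dbo > Customers",
    "Sales Workspace > Customer Lakehouse > dbo > scratch_table",
    "Sales Workspace > Operations DB",
):
    print(asset, "->", is_ingested(asset, includes, excludes))
```

In this model, the Customers table is in scope, scratch_table is dropped by the exclude mapping, and Operations DB is skipped because it matches no include mapping.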
To synchronize automatically by adding a synchronization schedule:

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog homepage opens.
  2. Click Integrations.
    The Integrations page opens.
  3. Click the Integration configuration tab.
  4. In the Connection name column, locate the Azure connection that you used when you added the Fabric synchronization capability and click the capability link in the Capabilities column.
    The synchronization page opens.
  5. In the Synchronization Configuration section, click the Edit icon.
  6. Complete the following fields:
    Field | Action
    Updated: <timestamp> (Optional)

    Click Updated: <timestamp> next to Synchronization Configuration, where <timestamp> indicates the last time the data was loaded from Microsoft Fabric.
    The workspace names are loaded into the dropdown list of the Fabric workspace names field. This can take some time.

    Default community 

    In Default community, select a Collibra community to ingest the metadata. Subdomains per workspace will be automatically created in this community.

    Fabric domain

    In Fabric domain, select a Collibra domain to ingest the Microsoft Tenant and Fabric Capacity assets.

    Custom tenant name

    Specify a tenant name to replace the tenant name fetched from the Microsoft API, or leave this field blank to use the one fetched from the API.

    Custom capacity name

    Specify a capacity name to replace the capacity name fetched from the Microsoft API, or leave this field blank to use the ones fetched from the API. You can add the names of multiple Fabric capacities.

    To add a custom capacity name:

    1. Click Add Item.
    2. In Capacity ID, enter the capacity ID of the Microsoft Fabric capacity.
    3. In Custom name, enter the custom name that you want to use for the Microsoft Fabric capacity.
    Fabric workspace names

    Enter the names of specific Microsoft Fabric workspaces, or leave this field blank to ingest metadata from all workspaces available to your service principal.

    To specify the workspace, complete the following steps:

    1. Click Add Workspace.
    2. In Workspace, enter the name of the workspace you want to ingest metadata from.
    3. Click Save.
    Maximum files per lakehouse

    Specify the maximum number of files to be ingested per lakehouse.
    • To ingest all files, set the value to -1.
    • If the value is set to 0, no files are ingested.
    JDBC connections

    If you want to allow sampling, profiling, and classification of assets created via the Fabric integration, add the JDBC connection information.

    To do so, complete the following steps:

    1. Click Add Item.
    2. In Database full name, enter the SQL Server database name in the following format:
      microsofttenant>fabriccapacity>fabricworkspace>databasename
      Example 

      Collibra, Inc>colfabriccapacity1>ColEngWorkspace>integrations-test-sql-database-8fef4727-8ea0-42a0-899b-4769219c105d

    3. In JDBC connection, select the JDBC connection that you created for your SQL Server database.
    4. Click Save.
    Note Make sure to add all JDBC connections for the SQL Server databases that you want to integrate.
    Domain include mappings

    Optionally, in Domain include mappings, specify the workspaces, warehouses, lakehouses, schemas, tables, or other artifacts from Fabric that you want to integrate. Optionally, also specify the Collibra domains where they need to be added. When include mappings are configured, only matched assets are ingested and everything else is skipped. When no include mappings are configured, all workspaces are ingested into auto-created subdomains under the main domain.

    Note that when you include a deeper asset, Collibra creates its parent assets as skeleton assets in their default, auto-created subdomains so the asset tree stays intact. For example, including a single table also creates the schema, parent lakehouse or warehouse, and workspace as skeleton assets in their default subdomains, and ingests the table's columns into the target domain.

    To limit the scope of metadata ingestion to specific domains in Collibra, add a domain include mapping:

    1. Click Add Domain include mapping.
    2. In Path, add the path to the assets in Fabric for which you want to integrate metadata. A path can be as granular as the following hierarchy: workspace > artifact > schema > table.
    3. Optionally, in Domain, select the Collibra domain in which you want to integrate the metadata.
    Example 
    • Include an entire workspace and all its artifacts: Sales Workspace to Sales General domain.
    • Include a lakehouse and its tables and columns: Sales Workspace > Customer Lakehouse to Sales Customer Data domain.
    • Include a Fabric database in a separate domain: Sales Workspace > Operations DB to Sales Operations domain.
    • Include a specific schema and its tables and columns: Sales Workspace > Customer Lakehouse > dbo to Customer DBO domain.
    • Include a single table and its columns: Sales Workspace > Customer Lakehouse > dbo > Customers to Customer Reporting domain.
    Domain exclude mappings

    Optionally, in Domain exclude mappings, add one or more mappings to prevent specific Fabric workspaces or artifacts from being ingested.

    Note The exclude mapping has priority over the include mapping.

    To exclude specific metadata from being ingested into Collibra, add a domain exclude mapping:

    1. Click Add Domain exclude mappings.
    2. In the field, add the path to the assets in Fabric that you want to exclude. A path can be as granular as the following hierarchy: workspace > artifact > schema > table.
    Example 
    • Exclude an entire non-production workspace: Dev Sandbox.
    • Exclude a single staging lakehouse while still ingesting the rest of the workspace: Sales Workspace > Staging Lakehouse.
    • Exclude a specific schema in a warehouse: Analytics Workspace > Finance Warehouse > raw_staging.
    • Exclude a single table: Analytics Workspace > Finance Warehouse > public > test_table.
    • Include a lakehouse but exclude one of its tables: include Sales Workspace > Customer Lakehouse, exclude Sales Workspace > Customer Lakehouse > dbo > scratch_table. The lakehouse is ingested without the excluded table.
  7. Click Save.
  8. Click the Add synchronization schedule icon.
  9. Enter the required information and click Save:
    Field | Description
    Repeat

    The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.

    Cron

    The Quartz Cron expression that determines when the synchronization takes place.

    This field is only visible if you select Cron expression in the Repeat field.

    Every

    The day on which you want to synchronize, for example, Sunday.

    This field is only visible if you select Weekly in the Repeat field.

    Every first

    The weekday on which you want to synchronize, for example, Tuesday. The synchronization runs on the first occurrence of that weekday in each month.

    This field is only visible if you select Monthly in the Repeat field.

    At

    The time at which you want to synchronize automatically, for example, 14:00.

    • You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45.
    • This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
    Time zone

    The time zone for the schedule.
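
For the Cron field, it can help to see how a Quartz cron expression is laid out. A Quartz expression has six space-separated fields (seconds, minutes, hours, day of month, month, day of week) plus an optional seventh field for the year, and Quartz generally expects `?` in exactly one of the two day fields. The following is a minimal sketch that labels the fields of a few illustrative expressions. The helper name `label_quartz_fields` and the sample expressions are assumptions for illustration, and this page does not state which Quartz features Collibra accepts.

```python
# Field order of a Quartz cron expression; the seventh field (year) is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year",
]


def label_quartz_fields(expression: str) -> dict:
    """Map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 fields, or 7 with the optional year")
    fields = dict(zip(QUARTZ_FIELDS, parts))
    # Quartz expects '?' in exactly one of the two day fields.
    if (fields["day_of_month"] == "?") == (fields["day_of_week"] == "?"):
        raise ValueError("use '?' in exactly one of day_of_month and day_of_week")
    return fields


# Illustrative expressions (hypothetical schedules, all on the hour).
EXAMPLES = {
    "0 0 2 * * ?": "every day at 02:00",
    "0 0 8 ? * MON": "every Monday at 08:00",
    "0 0 6 1 * ?": "on day 1 of every month at 06:00",
}

for expression, meaning in EXAMPLES.items():
    fields = label_quartz_fields(expression)
    print(f"{expression!r}: {meaning} (hours={fields['hours']}, minutes={fields['minutes']})")
```

Note that a five-field Unix-style expression such as `0 8 * * 1` is rejected by this check, because Quartz adds a leading seconds field.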

What's next

The synchronization job synchronizes the Microsoft Fabric metadata.
After the synchronization: