Create a crawler for Azure Data Lake Storage
By creating a crawler for Azure Data Lake Storage (ADLS), you can specify which directories you want to synchronize.
Before you begin
- You have registered an ADLS file system.
- You have connected the ADLS File System asset to the ADLS Edge capability.
Required permissions
- You have a resource role with the Configure external system resource permission, for example, Owner.
- You have a global role with the Catalog global permission, for example, Catalog Author.
- You have a global role with the View Edge connections and capabilities global permission, for example, Edge integration engineer.
Steps
- Open the ADLS File System asset.
- In the tab pane, click Configuration.
- In the Crawlers section, click Create crawler.
  The Create crawler dialog appears.
- Enter the required information.
  Field: Description
  - Domain: The domain in which the assets of the ADLS file system are to be created.
  - Name: The name you want to give to the crawler in Collibra.
  - Include path: The case-sensitive path to a directory in ADLS. All objects and subdirectories of this path are taken into account during the synchronization.
    Use the following structure to refer to the path: https://<storage account name>.blob.core.windows.net/<container name>/<blob name>
    Note: The include path is case-sensitive.
    Example:
    - https://myaccount.blob.core.windows.net/mycontainer/myblob
    - https://myaccount.blob.core.windows.net/$root/myblob refers to the root container. For information on working with root containers, go to the ADLS documentation.
  - Exclude patterns: A case-sensitive pattern that represents the objects that are included via the Include path, but that you want to exclude from the synchronization.
    When you define a pattern, you can use the following rules:
    - * matches zero or more characters.
    - ** matches zero or more directories in a path.
    - ? matches one character.
    Note:
    - The exclude patterns are case-sensitive.
    - The exclude patterns apply only to files, not folders.
    Example:
    - comm/*.jsp matches all .jsp files in the comm path.
    - comm/t?st.jsp matches comm/test.jsp but also comm/tast.jsp or comm/txst.jsp.
    - comm/**/test.jsp matches all test.jsp files in the comm path.
    - org/framework/**/*.jsp matches all .jsp files in the org/framework path.
    - org/**/servlet/test.jsp matches org/framework/servlet/test.jsp but also org/framework/testing/servlet/test.jsp and org/servlet/test.jsp.
  - Add pattern: Button to add additional exclude patterns.
  - Add path: Button to add an additional Include path.
- Click Create.
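To check an include path or a set of exclude patterns before you enter them in the dialog, the rules described for these fields can be approximated in a few lines of Python. This is a minimal sketch, not Collibra's implementation. The function names build_include_path, pattern_to_regex and is_excluded are hypothetical. The sketch assumes that * and ? do not cross a "/" separator, and that patterns are compared against the full file path string as written in the examples. Neither assumption is confirmed by this page.

```python
import re


def build_include_path(account: str, container: str, blob: str) -> str:
    """Assemble an include path in the documented URL structure."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an exclude pattern into a case-sensitive regular expression.

    Assumption: '*' and '?' stay within one path segment, while a
    standalone '**' segment spans zero or more directories.
    """
    parts = pattern.split("/")
    regex = ""
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        if part == "**":
            # Zero or more whole directories (or anything, if trailing).
            regex += ".*" if is_last else "(?:[^/]+/)*"
            continue
        for char in part:
            if char == "*":
                regex += "[^/]*"   # zero or more characters
            elif char == "?":
                regex += "[^/]"    # exactly one character
            else:
                regex += re.escape(char)
        if not is_last:
            regex += "/"
    return re.compile(regex)


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if the file path matches any exclude pattern."""
    return any(pattern_to_regex(p).fullmatch(path) for p in patterns)


if __name__ == "__main__":
    print(build_include_path("myaccount", "mycontainer", "myblob"))
    patterns = ["comm/t?st.jsp", "org/**/servlet/test.jsp"]
    for candidate in ["comm/tast.jsp", "org/servlet/test.jsp", "comm/Test.jsp"]:
        print(candidate, is_excluded(candidate, patterns))
```

Because matching is case-sensitive, comm/Test.jsp is not excluded by comm/t?st.jsp in this sketch. The crawler's actual matching may differ in edge cases, so treat the result as a quick sanity check only.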
What's next?
You can now synchronize the ADLS file system manually or define a synchronization schedule.