Dremio Cloner is a Python-based utility for Dremio Enterprise. It supports the following commands: get, put, cascade-acl, report-acl, report-reflections.
Dremio Cloner can be used for:
- Migrating entire Dremio environments, for example, from community edition to enterprise edition
- CI/CD processes
- Disaster Recovery scenarios
- Partial backup/restore
- Security Audit reporting
- Reflection reporting
Dremio Cloner is executed with the following command:
python dremio_cloner.py [config_file.json]
Dremio Migration Tool helps to migrate spaces and folders to new paths. It reads a Dremio Cloner export, modifies it, and writes the result into a new directory or file. It also rewrites and reformats the SQL queries, so SQL comments may be lost. The output file or directory can then be read by Dremio Cloner and written into a destination system.
The Migration Tool is executed with the following command:
python dremio_migration.py [config_migration_file.json]
Dremio Cloner requires Python 3 and some additional Python libraries. Please install:
$ pip install mo-sql-parsing requests
If you are using Dremio Migration Tool, you additionally need to install sqlparse:
$ pip install sqlparse
Older versions of Dremio Cloner used the Python package moz-sql-parser, which is now deprecated and has been replaced by mo-sql-parsing.
If you ran an older version of Dremio Cloner before, you need to uninstall the following packages before installing mo-sql-parsing:
$ pip list
...
mo-dots 4.22.21108
mo-future 3.147.20327
mo-imports 3.149.20327
mo-kwargs 4.22.21108
mo-logs 4.23.21108
moz-sql-parser 4.40.21126
$ pip uninstall -y moz-sql-parser mo-dots mo-future mo-imports mo-kwargs mo-logs
$ pip install mo-sql-parsing requests
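
Before running the uninstall step, it can help to check which of the legacy packages are actually present. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `find_legacy_packages` is hypothetical and not part of Dremio Cloner.

```python
from importlib import metadata

# Packages listed above as belonging to the deprecated moz-sql-parser install.
LEGACY_PACKAGES = [
    "moz-sql-parser", "mo-dots", "mo-future",
    "mo-imports", "mo-kwargs", "mo-logs",
]


def find_legacy_packages(names=LEGACY_PACKAGES):
    """Return the subset of `names` that is currently installed."""
    installed = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed, nothing to remove
        installed.append(name)
    return installed


if __name__ == "__main__":
    leftovers = find_legacy_packages()
    if leftovers:
        print("pip uninstall -y " + " ".join(leftovers))
    else:
        print("No legacy packages found.")
```

The script only prints a suggested command; it does not uninstall anything itself.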
Command "get" selectively saves definitions for objects such as Source, Space, Folder, PDS, VDS, ACLs, Reflections, Queues, Rules, Tags, Wikis, and Votes from a Dremio environment into a JSON file.
The command is configured with a JSON file with configuration attributes listed below. For detailed description of the configuration JSON attributes, see Reference section below in Appendix 1.
- "command":"get"
- "source": defines source Dremio Environment with
- "endpoint"
- "username"
- "password"
- "verify_ssl"
- "is_community_edition"
- "graph_api_support"
- "is_dremio_cloud"
- "dremio_cloud_org_id"
- "dremio_cloud_project_id"
- "target": defines an output filename or a directory with
- "filename"
- "directory"
- "overwrite"
- "options":
- logging options
- "logging.level"
- "logging.format"
- "logging.filename"
- "logging.verbose"
- miscellaneous options
- "max_errors"
- "http_timeout"
- scope of Space processing
- "space.process_mode"
- "folder.process_mode"
- "space.filter"
- "space.filter.names"
- "space.exclude.filter"
- "space.folder.filter"
- "space.folder.filter.paths"
- "space.folder.exclude.filter"
- scope of Source processing
- "source.process_mode"
- "source.filter"
- "source.filter.names"
- "source.filter.types"
- "source.exclude.filter"
- "source.folder.filter"
- "source.folder.filter.paths"
- "source.folder.exclude.filter"
- scope of PDS processing
- "pds.process_mode"
- "pds.filter"
- "pds.filter.names"
- "pds.exclude.filter"
- "pds.list.useapi"
- scope of VDS processing
- "vds.process_mode"
- "vds.filter"
- "vds.filter.names"
- "vds.exclude.filter"
- "vds.dependencies.process_mode"
- scope of Reflection processing
- "reflection.process_mode"
- "reflection.id_include_list"
- "reflection.only_for_matching_vds"
- scope of Workload Management processing
- "wlm.queue.process_mode"
- "wlm.rule.process_mode"
- scope of processing other objects
- "user.process_mode"
- "group.process_mode"
- "wiki.process_mode"
- "tag.process_mode"
- "home.process_mode"
- "vote.process_mode"
Please see a sample JSON configuration file in the config folder of this repository.
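
As an illustration only, the attributes listed above could be assembled and serialized as follows. The top-level "dremio_cloner" wrapper, the list of single-key option objects, and the use of strings for boolean and numeric values are assumptions modeled on the option snippets shown elsewhere in this document; the sample files in the config folder remain the authoritative reference. The endpoint, user name, and file name are placeholders.

```python
import json

# Hypothetical "get" configuration; every value is a placeholder.
# ASSUMPTION: the "dremio_cloner" wrapper and the option layout are not
# confirmed by this document; compare with the samples in the config folder.
get_config = {
    "dremio_cloner": [
        {"command": "get"},
        {"source": {
            "endpoint": "http://localhost:9047/",
            "username": "admin",
            # "password" is omitted so that it is requested at runtime.
            "verify_ssl": "True",
        }},
        {"target": {"filename": "dremio_export.json", "overwrite": "False"}},
        {"options": [
            {"logging.level": "INFO"},
            {"max_errors": "9999"},
            {"space.filter": "*"},
            {"space.exclude.filter": ""},
            {"source.filter": "*"},
            {"vds.filter": "*"},
            {"vds.dependencies.process_mode": "ignore"},
        ]},
    ]
}

config_text = json.dumps(get_config, indent=2)
print(config_text)
```

Generating the file from code rather than editing it by hand makes it easier to keep several environment-specific variants in sync, for example in a CI/CD pipeline.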
Command "put" selectively updates an existing Dremio Environment from a JSON file previously generated by "get" command.
Command "put" can also process ACL transformation. For example, it can transform ACLs to use LDAP_GROUP_PROD instead of LDAP_GROUP_DEV.
Command "put" can also process Source transformations. For example, it can transform paths and references in objects to use SOURCE_PROD instead of SOURCE_DEV. PLEASE NOTE: Use of the source transformation feature is against best practices; it is recommended that sources are named the same in all environments. In addition, for the source transformation to succeed as expected, you must ensure that no VDS, PDS, Column, or Folder in the system has a name that equals the original source name or contains it as a substring.
The command is configured with a JSON file with configuration attributes listed below. For detailed description of the configuration JSON attributes, see Reference section below in Appendix 1.
- "command":"put"
- "source": defines an input filename or directory with
- "filename"
- "directory"
- "target": defines target Dremio Environment with
- "endpoint"
- "username"
- "password"
- "verify_ssl"
- "is_community_edition"
- "is_dremio_cloud"
- "dremio_cloud_org_id"
- "dremio_cloud_project_id"
- "options":
- logging options
- "logging.level"
- "logging.format"
- "logging.filename"
- "logging.verbose"
- miscellaneous options
- "max_errors"
- "http_timeout"
- "source.retry_timedout"
- "dry_run"
- processing of User and Group objects missing in the target environment
- "space.ignore_missing_acl_user"
- "space.ignore_missing_acl_group"
- "folder.ignore_missing_acl_user"
- "folder.ignore_missing_acl_group"
- "source.ignore_missing_acl_user"
- "source.ignore_missing_acl_group"
- "pds.ignore_missing_acl_user"
- "pds.ignore_missing_acl_group"
- "vds.ignore_missing_acl_user"
- "vds.ignore_missing_acl_group"
- scope of Space processing
- "space.process_mode"
- "space.filter"
- "space.filter.names"
- "space.exclude.filter"
- "folder.process_mode"
- "space.folder.filter"
- "space.folder.filter.paths"
- "space.folder.exclude.filter"
- scope of Source processing
- "source.process_mode"
- "source.filter"
- "source.filter.names"
- "source.filter.types"
- "source.exclude.filter"
- "source.folder.filter"
- "source.folder.filter.paths"
- "source.folder.exclude.filter"
- scope of PDS processing
- "pds.process_mode"
- "pds.filter"
- "pds.filter.names"
- "pds.exclude.filter"
- "pds.list.useapi"
- scope of VDS processing
- "vds.process_mode"
- "vds.filter"
- "vds.filter.names"
- "vds.exclude.filter"
- "vds.max_hierarchy_depth"
- scope of Reflection processing
- "reflection.process_mode"
- "pds.reflection_refresh_mode"
- "reflection.id_include_list"
- scope of processing other objects
- "user.process_mode"
- "group.process_mode"
- "wiki.process_mode"
- "tag.process_mode"
- "home.process_mode"
- "vote.process_mode"
- ACL transformation processing
- "transformation"
- "acl"
- "file"
- source transformation processing
- "transformation"
- "source"
- "file"
Please see a sample JSON configuration file in the config folder of this repository.
Command "cascade-acl" selectively propagates ACLs in an object hierarchy.
The command is configured with a JSON file with configuration attributes listed below. For detailed description of the configuration JSON attributes, see Reference section below in Appendix 1.
- "command":"cascade-acl"
- "target": defines Dremio Environment to be processed with
- "endpoint"
- "username"
- "password"
- "verify_ssl"
- "options":
- logging options
- "logging.level"
- "logging.format"
- "logging.filename"
- "logging.verbose"
- miscellaneous options
- "max_errors"
- "http_timeout"
- "source.retry_timedout"
- "dry_run"
- scope of Space processing
- "space.filter"
- "space.exclude.filter"
- "space.cascade-acl-origin.override-object"
- "space.folder.filter"
- "space.folder.exclude.filter"
- "space.folder.cascade-acl-origin.filter"
- scope of Source processing
- "source.filter"
- "source.exclude.filter"
- "source.cascade-acl-origin.override-object"
- "source.folder.filter"
- "source.folder.exclude.filter"
- scope of PDS processing
- "pds.filter"
- "pds.exclude.filter"
- "pds.list.useapi"
- scope of VDS processing
- "vds.filter"
- "vds.exclude.filter"
Note: if none of space.cascade-acl-origin.override-object, space.folder.cascade-acl-origin.filter, and source.cascade-acl-origin.override-object is specified:
- each Space ACL will be propagated through its hierarchy and applied to Folders and VDSs as per filter configuration
- To cascade ACLs for all spaces, specify
{"space.filter": "*"}
- To omit cascading any space ACLs, specify
{"space.filter": ""}
- To cascade ACLs for a specific named space, specify
{"space.filter": "spacename"}
where spacename should be replaced with the actual name of the space
- each Source ACL will be propagated through its hierarchy and applied to PDSs as per filter configuration
- To cascade ACLs for all sources, specify
{"source.filter": "*"}
- To omit cascading any source ACLs, specify
{"source.filter": ""}
- To cascade ACLs for a specific named source, specify
{"source.filter": "sourcename"}
where sourcename should be replaced with the actual name of the source
Please see a sample JSON configuration file in the config folder of this repository.
Command "report-acl" produces a selective security report on all objects with ACL in a Dremio environment.
The command is configured with a JSON file with configuration attributes listed below. For detailed description of the configuration JSON attributes, see Reference section below in Appendix 1.
- "command":"report-acl"
- "source": defines Dremio Environment with
- "endpoint"
- "username"
- "password"
- "verify_ssl"
- "is_rbac_version"
- "target": defines an output filename with
- "filename"
- "options":
- logging options
- "logging.level"
- "logging.format"
- "logging.filename"
- "logging.verbose"
- miscellaneous options
- "max_errors"
- "http_timeout"
- "source.retry_timedout"
- report format
- "report.csv.delimiter"
- "report.csv.newline"
- scope of Space processing
- "space.filter"
- "space.exclude.filter"
- "space.folder.filter"
- "space.folder.exclude.filter"
- scope of Source processing
- "source.filter"
- "source.exclude.filter"
- "source.folder.filter"
- "source.folder.exclude.filter"
- scope of PDS processing
- "pds.filter"
- "pds.exclude.filter"
- "pds.list.useapi"
- scope of VDS processing
- "vds.filter"
- "vds.exclude.filter"
Please see a sample JSON configuration file in the config folder of this repository.
Command "report-reflections" produces a reflection report with reflection usage information and ranking on potentially duplicate reflections.
The command is configured with a JSON file with configuration attributes listed below. For detailed description of the configuration JSON attributes, see Reference section below in Appendix 1.
- "command":"report-reflections"
- "source": defines Dremio Environment with
- "endpoint"
- "username"
- "password"
- "verify_ssl"
- "target": defines an output filename with
- "filename"
- "options":
- logging options
- "logging.level"
- "logging.format"
- "logging.filename"
- "logging.verbose"
- miscellaneous options
- "max_errors"
- "http_timeout"
- "source.retry_timedout"
- report format
- "report.csv.delimiter"
- "report.csv.newline"
Note that this command does not provide any options for scope definition. Please see a sample JSON configuration file in the config folder of this repository.
Configuration Option | Description |
---|---|
endpoint | Defines Dremio API endpoint. For example, http://localhost:9047/. Mandatory attribute. |
username | Dremio user name. Must be an Admin. Mandatory attribute. |
password | Dremio user password. Optional field. If not provided, CLI will request password at runtime. |
verify_ssl | If set to False, Dremio Cloner will not validate SSL certificate of the Dremio Environment. Default is True. |
is_community_edition | Set to True if reading Dremio CE. Writing to Dremio CE is not supported. |
graph_api_support | Dremio Graph API is only available in EE starting version 4.0.0. Default value is False. |
is_rbac_version | Set to True if the version of Dremio EE supports the RBAC privileges model. Default value is False. |
is_dremio_cloud | Set to True if reading from or writing to Dremio Cloud. Default value is False. |
dremio_cloud_org_id | Dremio Cloud Organization ID to connect to. |
dremio_cloud_project_id | Dremio Cloud Project ID to connect to. |
Configuration Option | Description |
---|---|
filename | Defines a JSON filename to be used either as the source of information for the put command or as the target for saving data for the get command. The JSON file will contain all information on a Dremio environment. Either filename or directory must be defined. |
directory | Similar to filename above. However, a folder structure identical to the Dremio environment will be created, and the information on Dremio objects will be stored in separate files within this folder structure. This option allows for use cases with individual processing of Dremio objects by external tools, such as GitHub. |
overwrite | Allows overwriting an existing JSON file or directory. |
Configuration Option | Description |
---|---|
logging.level | Defines logging level: DEBUG, INFO, WARN, ERROR |
logging.format | Logging format. For example: "%(levelname)s:%(asctime)s:%(message)s" |
logging.filename | Filename for logging. If the file exists, log entries are appended to it. If this option is omitted, standard output will be used for logging. |
logging.verbose | Default False. If set to True, produces verbose logging, such as logging entire entity definitions. |
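
The logging options above map naturally onto Python's standard `logging` module. The sketch below shows one way such a mapping could look; the function `configure_logging` and the fallback level of INFO are assumptions for illustration, not Dremio Cloner's actual implementation.

```python
import logging
import sys


def configure_logging(options):
    """Configure the root logger from a dict of logging.* options."""
    # getattr maps names such as "WARN" or "DEBUG" to logging's constants.
    level = getattr(logging, options.get("logging.level", "INFO"))
    kwargs = {"level": level, "force": True}
    if "logging.format" in options:
        kwargs["format"] = options["logging.format"]
    if "logging.filename" in options:
        kwargs["filename"] = options["logging.filename"]
        kwargs["filemode"] = "a"  # append to an existing file
    else:
        kwargs["stream"] = sys.stdout  # no file given: log to standard output
    logging.basicConfig(**kwargs)
    return level


level = configure_logging({
    "logging.level": "WARN",
    "logging.format": "%(levelname)s:%(asctime)s:%(message)s",
})
logging.warning("logging configured")
```

Note that `logging.basicConfig` does not accept `filename` and `stream` together, which is why the two cases are handled separately.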
Configuration Option | Description |
---|---|
max_errors | Defines a number of errors at which processing will be terminated. |
http_timeout | Timeout for each API call. This parameter might become important in certain situations when Sources defined in Dremio are not available. |
dry_run | Defines a Dremio Cloner execution that will not update the target Dremio environment. In conjunction with logging.level set to WARN, this allows executing Dremio Cloner without an impact on the target environment and checking the log file for all activities that would have been submitted to the target Dremio Environment. The respective log entries include the dry_run keyword. |
vds.max_hierarchy_depth | Defines the maximum level of VDS hierarchy supported by Dremio Cloner. It is a guard rail with a default value of 100. |
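
The dry_run behavior described above follows a common pattern: every write goes through one function that either performs the change or only logs it. The following is a minimal sketch of that pattern; the names `submit` and `send` are hypothetical, and the log message format is an assumption.

```python
import logging


def submit(action, path, dry_run, send):
    """Call `send` unless dry_run is set; in that case only log the change."""
    if dry_run:
        # Logged at WARN level so it is visible with logging.level set to WARN.
        logging.warning("dry_run: would %s %s", action, path)
        return False
    send(action, path)
    return True


sent = []  # stands in for calls to the target environment
submit("update", "MySpace/MyVDS", dry_run=True,
       send=lambda a, p: sent.append((a, p)))
submit("update", "MySpace/MyVDS", dry_run=False,
       send=lambda a, p: sent.append((a, p)))
print(sent)
```

Only the second call reaches `send`; the first one leaves a log entry containing the dry_run keyword.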
Configuration Option | Description |
---|---|
space.filter | A filter that defines what Spaces will be included into processing. "*" will include all Spaces. Empty field will exclude all Spaces. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with space.exclude.filter. |
space.filter.names | If specified, a list filter that defines what Spaces will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"space.filter.names": []}, ) then the "get" or "put" command will include all spaces specified by space.filter, which is the default behavior. Works in logical AND with space.exclude.filter. Example: {"space.filter.names": ["MySpace1", "MySpace2", "MySpace3"]}, |
space.exclude.filter | A filter that defines what Spaces will be excluded from processing. "*" will exclude all Spaces. Empty field will include all Spaces. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with space.filter. |
space.folder.filter | A filter that defines what Space Folders will be included into processing. "*" will include all Folders. Empty field will exclude all Folders. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with space.folder.exclude.filter. |
space.folder.filter.paths | If specified, a list filter that defines what Space Folder paths will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"space.folder.filter.paths": []}, ) then the "get" or "put" command will include all space folders specified by space.folder.filter, which is the default behavior. Works in logical AND with space.folder.exclude.filter. Example: {"space.folder.filter.paths": ["folder1/folder2", "Staging"]}, |
space.folder.exclude.filter | A filter that defines what Space Folders will be excluded from processing. "*" will exclude all Folders. Empty field will include all Folders. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with space.folder.filter. |
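
The interplay of an include filter and an exclude filter described in the table above can be sketched as follows. This is an illustration of the documented semantics, not Dremio Cloner's own matching code: it uses `fnmatch.fnmatchcase` from the standard library, which also interprets `?` and `[...]`, something this document does not mention. The function name `passes_filters` is hypothetical.

```python
from fnmatch import fnmatchcase


def passes_filters(name, include, exclude):
    """True if `name` matches `include` and does not match `exclude`."""
    # An empty include filter selects nothing.
    included = bool(include) and fnmatchcase(name, include)
    # An empty exclude filter rejects nothing.
    excluded = bool(exclude) and fnmatchcase(name, exclude)
    return included and not excluded


for space in ["Sales", "Sales_tmp", "Finance"]:
    print(space, passes_filters(space, "Sales*", "*_tmp"))
```

With the include pattern `Sales*` and the exclude pattern `*_tmp`, only `Sales` is selected: `Sales_tmp` is included but then excluded, and `Finance` is never included.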
Configuration Option | Description |
---|---|
source.filter | A filter that defines what Sources will be included into processing. "*" will include all Sources. Empty field will exclude all Sources. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.exclude.filter. |
source.filter.names | If specified, a list filter that defines what Sources will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"source.filter.names": []}, ) then the "get" or "put" command will include all sources specified by source.filter, which is the default behavior. Works in logical AND with source.exclude.filter. Example: {"source.filter.names": ["MySource1", "MySource2", "MySource3"]}, |
source.filter.types | If specified, a list filter that defines what Source Types will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty then the "get" or "put" command will include all possible source types present based on the other source filters, which is the default behavior. Works in logical AND with the other source filters. Example: {"source.filter.types": ["S3", "POSTGRES", "NAS"]}, |
source.exclude.filter | A filter that defines what Sources will be excluded from processing. "*" will exclude all Sources. Empty field will include all Sources. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.filter. |
source.folder.filter | A filter that defines what Source Folders will be included into processing. "*" will include all Folders. Empty field will exclude all Folders. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.folder.exclude.filter. |
source.folder.filter.paths | If specified, a list filter that defines what Source Folder paths will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"source.folder.filter.paths": []}, ) then the "get" or "put" command will include all source folders specified by source.folder.filter, which is the default behavior. Works in logical AND with source.folder.exclude.filter. Example: {"source.folder.filter.paths": ["folder1/folder2", "default"]}, |
source.folder.exclude.filter | A filter that defines what Source Folders will be excluded from processing. "*" will exclude all Folders. Empty field will include all Folders. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with source.folder.filter. |
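
The list filters source.filter.names and source.filter.types share one rule: an empty or missing list places no restriction, and all filters must agree. A minimal sketch of that rule, with a hypothetical function name and without the pattern filters, could look like this:

```python
def source_selected(name, source_type, names=(), types=()):
    """Empty lists place no restriction; otherwise both lists must match."""
    name_ok = not names or name in names
    type_ok = not types or source_type in types
    return name_ok and type_ok


print(source_selected("MySource1", "S3"))
print(source_selected("MySource1", "S3", names=["MySource1"], types=["NAS"]))
```

In the second call the name matches, but the type S3 is not in the list, so the source is not selected.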
Configuration Option | Description |
---|---|
pds.filter | A filter that defines what PDSs will be included into processing. "*" will include all PDSs. Empty field will exclude all PDSs. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with pds.exclude.filter. |
pds.filter.names | If specified, a list filter that defines what PDSs will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"pds.filter.names": []}, ) then the "get" or "put" command will include all PDSs specified by pds.filter, which is the default behavior. Works in logical AND with pds.exclude.filter. Example: {"pds.filter.names": ["MyPDS1", "MyPDS2", "MyPDS3"]}, |
pds.exclude.filter | A filter that defines what PDSs will be excluded from processing. "*" will exclude all PDSs. Empty field will include all PDSs. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with pds.filter. |
pds.list.useapi | If set to True, forces the use of the API for collecting the list of PDSs. Default value is False, which means that INFORMATION_SCHEMA will be utilized instead of the API. False is the recommended value. |
Configuration Option | Description |
---|---|
vds.filter | A filter that defines what VDSs will be included into processing. "*" will include all VDSs. Empty field will exclude all VDSs. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with vds.exclude.filter. |
vds.filter.names | If specified, a list filter that defines what VDSs will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty (e.g. {"vds.filter.names": []}, ) then the "get" or "put" command will include all VDSs specified by vds.filter, which is the default behavior. Works in logical AND with vds.exclude.filter. Example: {"vds.filter.names": ["MyVDS1", "MyVDS2", "MyVDS3"]}, |
vds.exclude.filter | A filter that defines what VDSs will be excluded from processing. "*" will exclude all VDSs. Empty field will include all VDSs. Star may be used multiple times in the filter to define a pattern. Folders must be separated with backslash. Works in logical AND with vds.filter. |
Configuration Option | Description |
---|---|
reflection.id_include_list | If specified, a list filter that defines what reflection ids will be included into processing during "get" or "put" command execution. If this option is not specified or the list is empty then the "get" or "put" command will include all reflections, which is the default behavior. During "get" command execution this list refers to ids of reflections in the source Dremio environment, which are visible in sys.reflections. During "put" command execution this list refers to ids of reflections that were previously exported out of a source Dremio environment and are present in the source file(s) being fed into the "put" command. Example: {"reflection.id_include_list": ["dc86ab2e-8ebf-4d69-9302-911875a79e74", "ad3444df-7da5-4ea5-9624-b7705f351914"]} |
Configuration Option | Description |
---|---|
user.process_mode group.process_mode | Determines if users will be created in the target Dremio Environment if they are referenced in the source JSON file but not in the target environment. Applicable for "put" command only. However, user creation is not possible with the current Dremio API. This parameter can only have a single value skip. |
space.ignore_missing_acl_user space.ignore_missing_acl_group folder.ignore_missing_acl_user folder.ignore_missing_acl_group source.ignore_missing_acl_user source.ignore_missing_acl_group pds.ignore_missing_acl_user pds.ignore_missing_acl_group vds.ignore_missing_acl_user vds.ignore_missing_acl_group | These configuration parameters define if Dremio Cloner ignores a situation when a user or a group is defined in an ACL in the source JSON file but is not present in the target Dremio Environment. This situation is a potential security risk as an ACL may be created with no limitations in the target environment when all referenced users and groups cannot be found. Default value is False. |
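
To make the risk behind the ignore_missing_acl_* options concrete, the sketch below models an ACL as a plain list of member names. This data model and the function name `resolve_acl_members` are assumptions for illustration; Dremio Cloner's internal handling may differ.

```python
def resolve_acl_members(acl_members, target_members, ignore_missing):
    """Keep ACL members that exist in the target; fail or drop the rest."""
    kept = [m for m in acl_members if m in target_members]
    missing = [m for m in acl_members if m not in target_members]
    if missing and not ignore_missing:
        raise LookupError("not found in target: " + ", ".join(missing))
    return kept


target = {"alice", "GROUP_PROD"}
print(resolve_acl_members(["alice", "bob"], target, ignore_missing=True))
print(resolve_acl_members(["bob"], target, ignore_missing=True))
```

The second call returns an empty list: every referenced member was dropped, which is the situation the table describes as a potential security risk.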
Configuration Option | Description |
---|---|
space.process_mode folder.process_mode source.process_mode pds.process_mode vds.process_mode reflection.process_mode pds.reflection_refresh_mode wlm.queue.process_mode wlm.rule.process_mode wiki.process_mode tag.process_mode home.process_mode vote.process_mode | Defines whether Dremio Cloner will 1) insert new objects only or 2) update existing objects only or 3) do an upsert. These parameters can be set to: skip, create_only, update_only, create_overwrite, process. process is only applicable for "get" command. skip will prevent any changes to the target Dremio Environment for the specified object type. Note, pds.process_mode can only take skip and promote with promote updating PDS ACL as required. |
vds.dependencies.process_mode | Possible values: ignore, get. Default ignore. If set to get, Dremio Cloner will collect information on all dependencies throughout the object hierarchy (VDS and PDS) required for each VDS that satisfies VDS filter criteria. |
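
The process_mode values used by the "put" command can be read as a small decision table over whether an object already exists in the target. The sketch below encodes that reading; the function name and the returned labels are hypothetical, and the values process (for "get") and promote (for pds.process_mode) are deliberately not modeled.

```python
def decide_action(mode, exists_in_target):
    """Return "create", "update", or "none" for a given process_mode."""
    if mode == "skip":
        return "none"
    if mode == "create_only":
        return "none" if exists_in_target else "create"
    if mode == "update_only":
        return "update" if exists_in_target else "none"
    if mode == "create_overwrite":  # upsert
        return "update" if exists_in_target else "create"
    raise ValueError("unsupported process_mode: " + mode)


for mode in ["skip", "create_only", "update_only", "create_overwrite"]:
    print(mode, decide_action(mode, True), decide_action(mode, False))
```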
Configuration Option | Description |
---|---|
space.cascade-acl-origin.override-object | If specified, overrides the default behavior for the Space hierarchy: the ACL of the object specified in this parameter will be used throughout the hierarchies of all Spaces instead of the respective Spaces' ACLs. A valid example is: {"space.filter": "spacetest"}, {"space.cascade-acl-origin.override-object": "spacetest/spacetest_folder"}, which is interpreted as: read the ACLs from the object called spacetest/spacetest_folder and apply those ACLs to each object under the space called spacetest. |
source.cascade-acl-origin.override-object | If specified, overrides the default behavior for the Source hierarchy: the ACL of the object specified in this parameter will be used throughout the hierarchies of all Sources instead of the respective Sources' ACLs. |
space.folder.cascade-acl-origin.filter | If specified, overrides the default behavior for the Space hierarchy: the ACLs of the Folders selected by this filter will be used throughout their Folder hierarchies instead of the respective Space's ACL. A valid example is: {"space.filter": "spacetest"}, {"space.cascade-acl-origin.override-object": "spacetest/spacetest_folder"}, {"space.folder.cascade-acl-origin.filter": "another_folder"}, which can be interpreted as: all objects under spacetest will get the ACLs that are defined in spacetest/spacetest_folder, EXCEPT for those in spacetest/another_folder. All objects beneath another_folder (whose full path is spacetest/another_folder in this example) will have their ACLs set to the ACLs on another_folder. |
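
The precedence between the space ACL, the override object, and the folder origin filter can be sketched as a lookup that returns the path of the object whose ACL applies. This is a simplified reading of the table above: the folder filter is modeled as a set of plain folder names rather than a pattern, and the function name `acl_origin` is hypothetical.

```python
def acl_origin(path, override_object=None, origin_folders=()):
    """Return the path of the object whose ACL is cascaded to `path`."""
    parts = path.split("/")
    # A matching folder on the way down wins over everything else.
    for depth in range(1, len(parts)):
        if parts[depth] in origin_folders:
            return "/".join(parts[:depth + 1])
    # Otherwise the override object applies, if one is configured.
    if override_object:
        return override_object
    # Default: the ACL of the space itself.
    return parts[0]


override = "spacetest/spacetest_folder"
folders = {"another_folder"}
print(acl_origin("spacetest/another_folder/sub/vds1", override, folders))
print(acl_origin("spacetest/reports/vds2", override, folders))
print(acl_origin("spacetest/reports/vds2"))
```

The three calls correspond to the three cases in the table: an object below another_folder, an object elsewhere in the space with an override configured, and the default behavior with no override.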
Configuration Option | Description |
---|---|
transformation | If specified, allows for transformation during "put" command execution. Supported transformations are ACL and Source transformation. Transformation rules are specified in a separate JSON file, and the file is referenced in the main configuration file. For example: {"transformation": {"acl": {"file": "acl_transformation.json"}}} for ACL transformations and {"transformation": {"source": {"file": "source_transformation.json"}}} for Source transformations. |
Configuration Option | Description |
---|---|
report.csv.delimiter | A field delimiter used to generate a report. |
report.csv.newline | A new line delimiter used to generate a report. |
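
The two report options correspond directly to the `delimiter` and `lineterminator` parameters of Python's standard `csv` module. The sketch below shows their effect on a small report; the column names and rows are made up for illustration, and `write_report` is a hypothetical helper rather than part of Dremio Cloner.

```python
import csv
import io


def write_report(rows, delimiter=",", newline="\n"):
    """Render rows as delimited text using the given separators."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator=newline)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["object_path", "member", "permission"],  # hypothetical columns
    ["MySpace/MyVDS", "alice", "READ"],
]
report = write_report(rows, delimiter="\t", newline="\r\n")
print(repr(report))
```

A tab delimiter is a reasonable choice when object paths or names may themselves contain commas.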