Please see the set of transform project conventions for details on general project conventions, transform configuration, testing and IDE set up.
The License Select transform checks if the license
of input data is in approved/denied list. It is implemented as per the set of transform project conventions the following runtimes are available:
This filter scans the license column of an input dataset and appends the license_status
column to the dataset.
The type of the license column can be either string or list of strings. For strings, the license name is checked against the list of approved licenses. For list of strings, each license name in the list is checked against the list of approved licenses, and all must be approved.
If the license is approved, the license_status column contains True; otherwise False.
The set of dictionary keys holding license_select configuration for values are as follows:
The transform can be configured with the following key/value pairs from the configuration dictionary.
# Sample params dictionary passed to the transform
{
"license_select_params" : {
"license_column_name": "license",
"deny_licenses": False,
"licenses": [ 'MIT', 'Apache'],
"allow_no_license": False,
}
}
license_column_name - The name of the column with licenses.
deny_licenses - A boolean value, True for denied licesnes, False for approved licenses.
licenses - A list of licenses used as approve/deny list.
allow_no_license - A boolean value, used to retain the values with no license in the column license_column_name
The following command line arguments are available in addition to the options provided by the python launcher.
--lc_license_column_name
- set the name of the column holds license to process
--lc_allow_no_license
- allow entries with no associated license (default: false)
--lc_licenses_file
- S3 or local path to allowed/denied licenses JSON file
--lc_deny_licenses
- allow all licences except those in licenses_file (default: false)
-
The optional
lc_license_column_name
parameter is used to specify the column name in the input dataset that contains the license information. The default column name is license. -
The optional
lc_allow_no_license
option allows any records without a license to be accepted by the filter. If this option is not set, records without a license are rejected. -
The required
lc_licenses_file
options allows a list of licenses to be specified. An S3 or local file path should be supplied (including bucket name, for example: bucket-name/path/to/licenses.json) with the file contents being a JSON list of strings. For example:[ 'Apache-2.0', 'MIT' ]
-
The optional
lc_deny_licenses
flag is used whenlc_licenses_file
specifies the licenses that will be rejected, with all other licenses being accepted. These parameters do not affect handling of records with no license information, which is dictated by the allow_no_license option.
To run the samples, use the following make targets
run-cli-sample
run-local-python-sample
These targets will activate the virtual environment and set up any configuration needed. Use the -n option of make to see the detail of what is done to run the sample.
For example,
make run-cli-sample
... Then
ls output To see results of the transform.
To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.