mapper-dnb

Overview

The dnb_mapper.py Python script converts Dun & Bradstreet (DNB) files to JSON files ready to load into Senzing. It supports the following formats:

Companies and their principals (CMPCVF) JSON format
Global contacts (GCA) tab-delimited CSV format
Ultimate beneficial owners (UBO) tab-delimited CSV format

Normally these are provided by DNB on request and placed on an FTP server for you to download.

Warning: the dnb_formats.json file contains the exact structure of these files. You may need to send these formats to DNB so they know exactly how to create them!
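Before mapping a tab-delimited GCA or UBO file, it can help to look at its header row and compare it by eye with the structure described in dnb_formats.json. The following is a minimal sketch, assuming the first row of the file holds the column names; the column names in the sample are hypothetical placeholders, not real DNB fields.

```python
import csv
import io


def read_header(stream):
    """Return the column names from the first row of a tab-delimited stream."""
    reader = csv.reader(stream, delimiter="\t")
    # next() with a default avoids StopIteration on an empty file.
    return next(reader, [])


# Hypothetical sample standing in for the first lines of a GCA or UBO file.
sample = io.StringIO("COL_A\tCOL_B\tCOL_C\nv1\tv2\tv3\n")
header = read_header(sample)
print(header)
```

For a real file, open it with open(path, newline="", encoding=...) and pass the file object to read_header(); the right encoding depends on how DNB produced the file.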

Loading DNB data into Senzing requires additional features and configurations. These are contained in the dnb_config_updates.json file.

Usage:

python3 dnb_mapper.py --help
usage: dnb_mapper.py [-h] [-f DNB_FORMAT] [-i INPUT_SPEC] [-o OUTPUT_PATH]
                     [-l LOG_FILE]

optional arguments:
  -h, --help            show this help message and exit
  -f DNB_FORMAT, --dnb_format DNB_FORMAT
                        choose CMPCVF, UBO, or GCA
  -i INPUT_SPEC, --input_spec INPUT_SPEC
                        the name of one or more DNB files to map (place in
                        quotes if you use wild cards)
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        output directory or file name for mapped json records
  -l LOG_FILE, --log_file LOG_FILE
                        optional statistics filename (json format).

Applying the updates in dnb_config_updates.json steps you through adding the data sources, entity types, features, attributes and other settings needed to load DNB data into Senzing. After each command you will see a status message saying "success" or "already exists". If you apply the updates a second time, every command reports "already exists", which is OK.

Running the mapper

First, download the DNB files you want to load from the DNB FTP site. Because the data is so large, each format is normally split across multiple files.

Second, run the mapper. Example usage:

python3 dnb_mapper.py -f CMPCVF -i "./input/CMPCVF*.txt" -o ./output -l cmpcvf_stats.json

python3 dnb_mapper.py -f GCA -i "./input/GCA*.txt" -o ./output -l gca_stats.json

python3 dnb_mapper.py -f UBO -i "./input/UBO*.txt" -o ./output -l ubo_stats.json

By default, each output file has the same name and location as its input file, with a .json extension added.
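The wildcard handling and default output naming can be sketched as follows. This is an illustration only, not the code in dnb_mapper.py: the function names are hypothetical, and it assumes the .json extension is appended to the full input file name. How the -o option changes the result is not covered here.

```python
import glob


def expand_input_spec(input_spec):
    """Expand a quoted wildcard spec into a sorted list of matching files."""
    # Quoting the spec on the command line keeps the shell from expanding it,
    # so the script sees the pattern itself and can expand it here.
    return sorted(glob.glob(input_spec))


def default_output_name(input_file):
    """Assumed default: same location and name as the input, plus .json."""
    return input_file + ".json"


print(default_output_name("./input/GCA_1.txt"))
```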

It is critical that the -f file format match the input files exactly!

Loading into Senzing

If you use the G2Loader program to load your data, it's best to list the mapped JSON files you want to load in a project file. There is an example of one in your Senzing installation here: /opt/senzing/g2/python/demo/sample/project.csv. Then, from the /opt/senzing/g2/python directory ...

python3 G2Loader.py -p <name of project file>

If you use the API directly, then you just need to perform an addRecord() for each line of each mapped file.
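The per-line loop can be sketched like this. It is a minimal sketch under stated assumptions: each line of a mapped file is one JSON record, each record carries DATA_SOURCE and RECORD_ID keys, and add_record is a callable you supply that wraps the Senzing addRecord() call for your SDK version. The "DNB" data source code and the stand-in callback below are hypothetical and exist only so the example runs without Senzing installed.

```python
import io
import json


def load_mapped_file(lines, add_record):
    """Call add_record once per non-blank JSON line and return the count."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: every mapped record has DATA_SOURCE and RECORD_ID keys.
        add_record(record["DATA_SOURCE"], record["RECORD_ID"], line)
        count += 1
    return count


# Stand-in for the engine call; replace with a wrapper around addRecord().
loaded = []


def fake_add_record(data_source, record_id, json_data):
    loaded.append((data_source, record_id))


# Hypothetical mapped records, including a blank line that is skipped.
sample = io.StringIO(
    '{"DATA_SOURCE": "DNB", "RECORD_ID": "1"}\n'
    "\n"
    '{"DATA_SOURCE": "DNB", "RECORD_ID": "2"}\n'
)
total = load_mapped_file(sample, fake_add_record)
print(total, loaded)
```

Passing the engine call in as a parameter keeps the loop independent of any particular Senzing SDK version and makes it easy to test against a sample file first.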
