recordLinkage

DSC 180B Project: Probabilistic Record Linkage

The machine learning pipeline can be broken down into three steps:

graph construction
node2vec embedding
classifier
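
As a rough illustration of the first step and the walk-sampling part of the second, here is a minimal sketch. It is not the project's implementation: the record layout, the `build_graph` and `random_walk` helpers, and the choice to link each record to one node per field value are all assumptions made for the example. The walks are uniform, which corresponds to node2vec with p = q = 1; in a full pipeline they would be fed to a skip-gram model to produce embeddings, and the embeddings to a classifier.

```python
import random
from collections import defaultdict


def build_graph(records):
    """Link each record node to one node per (field, value) pair it contains.

    Hypothetical layout: records that share a value end up two hops apart.
    """
    adj = defaultdict(set)
    for rec_id, fields in records.items():
        for field, value in fields.items():
            attr = f"{field}={value}"
            adj[rec_id].add(attr)
            adj[attr].add(rec_id)
    return adj


def random_walk(adj, start, length, rng):
    """Uniform random walk of up to `length` nodes (node2vec with p = q = 1)."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(adj[walk[-1]])  # sorted for reproducibility
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


# Toy records, invented for this example only.
records = {
    "r1": {"name": "ann lee", "phone": "555-0100"},
    "r2": {"name": "ann lee", "email": "ann@example.com"},
    "r3": {"name": "bob roy", "phone": "555-0199"},
}

adj = build_graph(records)
rng = random.Random(0)
walks = [random_walk(adj, rec_id, 5, rng) for rec_id in records]
```

Records r1 and r2 share a name node, so walks starting at either one can reach the other, which is the kind of signal an embedding can pick up.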

To run the program, see the config folder for the parameters that can be changed, then use the python run.py command to train, test, and evaluate the model. Refer to the run.py file for the appropriate command-line inputs.

Visualizations and testing can be found in the notebooks directory.

The paper associated with this project can be found here

Example Code

To generate the artificial dataset, use the command below:

python3 gen-data

To perform the graph construction, use the command:

python3 create-small-graphs

Lastly, to test any changes to the pipeline on the example test, use the command:

python3 test-project
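
For readers who have not seen this style of entry point, the sketch below shows one common way a script can map target names like the ones above to functions. It is purely illustrative: the `TARGETS` table, the `main` function, and the placeholder step functions are hypothetical and are not taken from run.py, which remains the reference for the actual command-line inputs.

```python
def gen_data():
    # Placeholder: a real step would write the artificial dataset.
    return "gen-data: done"


def create_small_graphs():
    # Placeholder: a real step would build graphs from the dataset.
    return "create-small-graphs: done"


def test_project():
    # Placeholder: a real step would run the pipeline on the test data.
    return "test-project: done"


# Hypothetical mapping from target name to the function that handles it.
TARGETS = {
    "gen-data": gen_data,
    "create-small-graphs": create_small_graphs,
    "test-project": test_project,
}


def main(targets):
    """Run each requested target in order and collect its status message."""
    results = []
    for name in targets:
        if name not in TARGETS:
            raise ValueError(f"unknown target: {name}")
        results.append(TARGETS[name]())
    return results


messages = main(["gen-data", "create-small-graphs"])
```

Keeping the mapping in a dictionary makes it easy to add a target without touching the dispatch logic.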

Configuration

Refer to ./config/datasetGenConfig.json to change the hyperparameters associated with the artificial dataset generation.
Refer to ./config/test_config.json and ./config/train_config.json to change the hyperparameters associated with generating the graphs and training the classifier.
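
A minimal sketch of reading one of these JSON files and overriding a value for a quick experiment follows. The `load_config` helper and the keys `num_records` and `seed` are assumptions for illustration only; the real keys are whatever the files in ./config define. The example writes a temporary file so that it runs without the repository checked out.

```python
import json
import tempfile
from pathlib import Path


def load_config(path, overrides=None):
    """Read a JSON config file and apply optional overrides on top of it."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    config.update(overrides or {})
    return config


# Demo with a temporary stand-in file; the keys here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "datasetGenConfig.json"
    demo_path.write_text(json.dumps({"num_records": 1000, "seed": 42}))
    config = load_config(demo_path, overrides={"num_records": 50})
```

Applying overrides in code leaves the file on disk unchanged, which helps when comparing runs against the committed defaults.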

(add information about the configurations)
