An implementation of NOMAD with UPC++

This repository presents an implementation of Nonlocking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion (NOMAD) in C++ with UPC++. The primary ideas are extracted from this paper

Yun, Hyokun, Hsiang-Fu Yu, Cho-Jui Hsieh, S. V. N. Vishwanathan and Inderjit S. Dhillon. “NOMAD: Nonlocking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion.” Proc. VLDB Endow. 7 (2014): 975-986. https://arxiv.org/abs/1312.0193

UPC++ Installation

UPC++ is a parallel programming library for developing C++ applications with the Partitioned Global Address Space (PGAS) model. UPC++ has three main objectives:

Provide an object-oriented PGAS programming model in the context of the popular C++ language
Expose useful asynchronous parallel programming idioms unavailable in traditional SPMD models, such as remote function invocation and continuation-based operation completion, to support complex scientific applications
Offer an easy on-ramp to PGAS programming through interoperability with other existing parallel programming systems (e.g., MPI, OpenMP, CUDA)

You can setup UPC++ as following the instruction at here

Sparse Matrix Generation

You can generate a random sparse matrix of integers with an assumption that there is at least one non-zero value in each column and each row:

$ g++ -o gen_sparse_mat data/generate_sparse_matrix.cpp
$ ./gen_sparse_mat [NROWS] [NCOLS]

Example: The below command will generate a sparse matrix of integers with 100 rows and 700 columns

$ g++ -o gen_sparse_mat data/generate_sparse_matrix.cpp
$ ./gen_sparse_mat 100 700

NOMAD Execution

You can optionally modify the source code and build the source with UPC++ as simple commands as follow:

$ upcxx -O -o NOMAD-UPC main.cpp worker.cpp

To run this solution, you must specify the number of processes NUM_PROC, the input file for sparse matrix INPUT_FILE and the number of epochs you need to run NUM_EPOCHS

$ upcxx-run -n [NUM_PROC] NOMAD-UPC [INPUT_FILE] [NUM_EPOCHS]

For example: If you want to execute this implementation with 5 processes, the input matrix is store in matrix.txt, and the epoch of running is 5000, then the command should be:

$ upcxx-run -n 5 NOMAD-UPC matrix.txt 5000

The result will be stored in an output text file named: out_[INPUT_FILE]

NOMAD with MovieLens-100K

MovieLens 100K movie ratings. Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. I added an evaluation for Movielen-100K dataset. Training NOMAD with MovieLens on training set X (for X in [1, 2, 3, 4, 5, 'a', 'b']) is performed with following command:

$ upcxx-run -n 5 NOMAD-UPC movielen-100k-data/sparse_u[X].base [NUM_EPOCHS]

To evaluate the RMSE of training set of set X, we execute a command:

$ ./evaluation movielen-100k-data/out_sparse_u[X].base movielen-100k-data/sparse_u[X].base

To evaluate the RMSE of testing set of set X, we execute a command:

$ ./evaluation movielen-100k-data/out_sparse_u[X].base movielen-100k-data/sparse_u[X].test

Notice

There are some slight differences in this implementation as compared to the original idea in the paper:

I change the update function (9) and (10) into
Instead of transfer a pair of , I store all matrix in the global memory and I only transfer the index of corresponding rows of
I also implemented the mechanism of dynamic load balancing which was mentioned in the paper

Todos

Plug-in mmap file reading in C++ for big file reading
Visualize the procedure of resources transfer and allocation

License

Free for you, Easy to use

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

An implementation of NOMAD with UPC++

UPC++ Installation

Sparse Matrix Generation

NOMAD Execution

NOMAD with MovieLens-100K

Notice

Todos

License

About

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
data		data
movielen-100k-data		movielen-100k-data
random_sparse_matrix		random_sparse_matrix
NOMAD-UPC		NOMAD-UPC
README.md		README.md
eval.cpp		eval.cpp
evaluation		evaluation
gen_sparse_mat		gen_sparse_mat
main.cpp		main.cpp
worker.cpp		worker.cpp
worker.h		worker.h

hungnphan/UPC-NOMAD

Folders and files

Latest commit

History

Repository files navigation

An implementation of NOMAD with UPC++

UPC++ Installation

Sparse Matrix Generation

NOMAD Execution

NOMAD with MovieLens-100K

Notice

Todos

License

About

Resources

Stars

Watchers

Forks

Languages