Matrix Completion for prediction

This repository contains the additional material for the article "Matrix Completion of World Trade" by F. Nutarelli, G. Gnecco and M. Riccaboni, which you can find here.

The first Section of this repository is a step-by-step guide to Matrix Completion for readers who are new to machine learning; the second Section provides further details on the main algorithm used for the prediction tasks at the different stages, and on its implementation in MATLAB.

Matrix Completion: a theoretical summary

This code provides an easy way to run robust matrix completion. Given a subset of observed entries of a matrix $X \in \mathbb{R}^{m \times n}$, Matrix Completion (MC) works by finding a suitable low-rank approximation (say, with rank $\rho$) of $X$, by assuming the following model:

$$X = AB^\top + E,$$

where $A \in \mathbb{R}^{m \times \rho}$, $B \in \mathbb{R}^{n \times \rho}$, whereas $E$ is a matrix of modeling errors. The rank-$\rho$ approximating matrix is found by solving a suitable optimization problem, following (Mazumder et al., 2010):

$$\min_{Z \in \mathbb{R}^{m \times n}} \ \frac{1}{2} \sum_{(i,j) \in \Omega^{tr}} \left(X_{ij} - Z_{ij}\right)^2 + \lambda \|Z\|_* \qquad (1)$$

where $\Omega^{tr}$ is a training subset of pairs of indices corresponding to positions of known entries of the partially observed matrix $X$, $Z$ is the completed matrix (to be optimized), $\lambda > 0$ is a regularization constant (chosen by a suitable validation method), and $\|Z\|_*$ is the nuclear norm of the matrix $Z$, i.e., the sum of all its singular values.
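To make the objective in Eq. (1) concrete, here is a minimal NumPy sketch (Python rather than the repository's MATLAB, purely for illustration; the function name `mc_objective` is hypothetical) that evaluates the squared error on the observed entries plus the nuclear-norm penalty:

```python
import numpy as np

def mc_objective(X, Z, mask, lam):
    """Eq. (1): half the squared error on observed entries plus
    lam times the nuclear norm (sum of singular values) of Z."""
    resid = (X - Z)[mask]                                # residuals on Omega^tr only
    nuclear = np.linalg.svd(Z, compute_uv=False).sum()   # sum of singular values of Z
    return 0.5 * np.sum(resid ** 2) + lam * nuclear
```

For instance, with `Z` equal to `X` on a fully observed matrix, only the penalty term survives.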

The algorithm

Eq. (1) can be re-written as

$$\min_{Z \in \mathbb{R}^{m \times n}} \ \frac{1}{2} \left\| P_{\Omega^{tr}}(X) - P_{\Omega^{tr}}(Z) \right\|_F^2 + \lambda \|Z\|_* \qquad (2)$$

where, for a matrix $W$, $\left(P_{\Omega^{tr}}(W)\right)_{ij} = W_{ij}$ if $(i,j) \in \Omega^{tr}$, otherwise it is equal to 0. Here, $P_{\Omega^{tr}}(W)$ represents the projection of $W$ onto the set of positions of observed entries of the matrix $X$, and $\|\cdot\|_F$ denotes the Frobenius norm (i.e., the square root of the summation of squares of all the entries).
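The projection operator $P_{\Omega^{tr}}$ simply zeroes out the unobserved entries; a one-line NumPy sketch (illustrative only, with a boolean mask standing in for $\Omega^{tr}$):

```python
import numpy as np

def P_omega(X, mask):
    """Projection onto the observed positions: keep entries where
    mask is True, set all other entries to 0."""
    return np.where(mask, X, 0.0)
```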

The MC optimization problem (2) can be solved by applying the Algorithm below, named Soft Impute in Mazumder et al. (2010) (compared to the original version, here we have included a maximal number of iterations $N$, which can be helpful to reduce the computational effort when one has to run the algorithm multiple times, e.g., for several choices of the training set $\Omega^{tr}$ and of the regularization constant $\lambda$):

Algorithm (Soft Impute with a maximal number of iterations $N$):

1. Initialize $Z^{old} = 0$ and the iteration counter $k = 1$.
2. Compute $Z^{new} \leftarrow S_\lambda\left(P_{\Omega^{tr}}(X) + P_{\Omega^{tr}}^{\perp}(Z^{old})\right)$.
3. If $\|Z^{new} - Z^{old}\|_F^2 \, / \, \|Z^{old}\|_F^2 < \varepsilon$ (a given tolerance) or $k = N$, return $\hat{Z}_\lambda = Z^{new}$ and stop.
4. Otherwise, set $Z^{old} \leftarrow Z^{new}$, increase $k$ by 1, and go to step 2.

In the above Algorithm, for a matrix $W$, $P_{\Omega^{tr}}^{\perp}(W)$ represents the projection of $W$ onto the complement of $\Omega^{tr}$, whereas

$$S_\lambda(W) = U D_\lambda V^\top,$$

being

$$W = U D V^\top$$

(with $D = \mathrm{diag}[d_1, \ldots, d_r]$) the singular value decomposition of $W$, and

$$D_\lambda = \mathrm{diag}\left[(d_1 - \lambda)_+, \ldots, (d_r - \lambda)_+\right],$$

with $(t)_+ = \max(t, 0)$.
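The soft-thresholding operator $S_\lambda$ can be sketched in a few lines of NumPy (illustrative Python, not the repository's svt.m implementation, which is considerably more efficient):

```python
import numpy as np

def soft_threshold_svd(W, lam):
    """S_lambda(W): take the SVD of W, shrink each singular value by lam
    (flooring at zero), and recompose the matrix."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    d_shrunk = np.maximum(d - lam, 0.0)   # (d_i - lambda)_+
    return U @ np.diag(d_shrunk) @ Vt
```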

It is worth mentioning that a particularly efficient implementation of the operator $S_\lambda$ is possible (by means of the MATLAB function svt.m; see Li and Zhou, 2017).
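Putting the pieces together, the full Soft Impute iteration can be sketched as follows (an illustrative NumPy translation, not the repository's matrix_completion_nuclear.m; the function and argument names are hypothetical):

```python
import numpy as np

def soft_impute(X, mask, lam, max_iter=100, tol=1e-9):
    """Iterate Z <- S_lambda(P_Omega(X) + P_Omega_perp(Z)) until the
    relative change in Z falls below tol or max_iter is reached."""
    Z = np.zeros_like(X)
    for _ in range(max_iter):
        # Observed positions come from the data, unobserved ones from
        # the current estimate Z
        filled = np.where(mask, X, Z)
        # Soft-threshold the singular values of the filled matrix
        U, d, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = U @ np.diag(np.maximum(d - lam, 0.0)) @ Vt
        denom = max(np.linalg.norm(Z), 1e-30)  # guard against division by zero
        if np.linalg.norm(Z_new - Z) / denom < tol:
            Z = Z_new
            break
        Z = Z_new
    return Z
```

On a fully observed matrix the iteration converges in one step to the soft-thresholded SVD of the data itself.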

Usage Examples:

In the following mock example we create a Toeplitz square matrix, A, and try to predict its entries through the matrix completion algorithm. The algorithm consists of a main function, matrix_completion_nuclear, and a complementary function, svt, by Li and Zhou (2017). svt.m should be in the same directory as matrix_completion_nuclear.m.

clear 
clc

%make sure you run this within the directory where svt.m is
cd /Users/federiconutarelli/Desktop/github_me/

A = toeplitz(randi([0 9],6,1));  % random symmetric Toeplitz matrix
B = 1 - isnan(A);                % observation mask (all ones here, since A has no NaN entries)

N = 10;  % setting the number of iterations for the matrix completion code
M = 15;  % choosing the range of exponents for lambda: lambda = 2^h, h = -M:0.5:M
lambda_tol_vector = 2.^((-M:0.5:M)');  % grid of candidate regularization constants

for k = 1:numel(lambda_tol_vector)
    lambda_tol = lambda_tol_vector(k);
    tol = 1e-9;
    fprintf('Completion using nuclear norm regularization... \n');
    [CompletedMat,objective,flag] = matrix_completion_nuclear(A.*B,B,N,lambda_tol,tol);
    if flag==1
        CompletedMat=zeros(size(A));
    end
    
    CompletedMatrix{k}=CompletedMat;
end

The results of the mock example are stored in CompletedMatrix{k}. In particular, a completed matrix is provided as output for each value of the regularization parameter. The "optimal" predicted matrix is the one associated with the "optimal" parameter. A simple elbow method can be adopted to select the latter.
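One simple way to automate the elbow choice, assuming the objective values for the candidate regularization parameters have been collected in a vector, is the "maximum distance to the chord" heuristic (an illustrative Python sketch; the repository itself leaves this choice to the user):

```python
import numpy as np

def elbow_index(values):
    """Index of the point farthest from the straight line joining the
    first and last points of the curve -- a common elbow heuristic."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    x = np.arange(n, dtype=float)
    # Unit direction of the chord from (0, v[0]) to (n-1, v[-1])
    p1 = np.array([0.0, v[0]])
    d = np.array([n - 1.0, v[-1]]) - p1
    d /= np.linalg.norm(d)
    # Perpendicular distance of each curve point from the chord
    pts = np.stack([x, v], axis=1) - p1
    proj = np.outer(pts @ d, d)
    dist = np.linalg.norm(pts - proj, axis=1)
    return int(np.argmax(dist))
```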

A further example is proposed in esempio.m, which deals with a simplified case in which the matrix to be predicted is symmetric. The user is asked to specify the number of simulations he/she wants the algorithm to perform. Please make sure that you run the code in the same folder where matrix_completion_nuclear.m is stored.

Citation

If you use this MC algorithm in your research, please cite us as follows:

F. Nutarelli, G. Gnecco. MC algorithm: A MATLAB algorithm for easy Matrix Completion.

BibTex

@misc{mc_algorithm,
  author={Nutarelli, F. and Gnecco, G.},
  title={{MC algorithm}: A {MATLAB} algorithm for easy {Matrix Completion}},
  howpublished={https://github.com/feedericonutarelli},
  year={2021}
}

References

Li, C., & Zhou, H. (2017). svt: Singular value thresholding in MATLAB. Journal of Statistical Software, 81(2). DOI: 10.18637/jss.v081.c02.

Mazumder, R., Hastie, T., & Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11, pp. 2287–2322.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1), pp. 267-288.
