This repository contains the additional material for the article "Matrix Completion of World Trade" by F. Nutarelli, G. Gnecco and M. Riccaboni, that you can find here.
In the first section of this repository we provide a step-by-step guide for readers who are new to machine learning, to help them understand Matrix Completion; in the second section we provide further details on the main algorithms used for the prediction tasks at different stages, and on their implementation in MATLAB.
This code provides an easy way to run robust matrix completion. Given a subset $\Omega$ of observed entries of a matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$, Matrix Completion (MC) works by finding a suitable low-rank approximation (say, with rank $r$) of $\mathbf{M}$, by assuming the following model:

$$\mathbf{M} = \mathbf{A}\mathbf{B}^\top + \mathbf{E},$$

where $\mathbf{A} \in \mathbb{R}^{m \times r}$, $\mathbf{B} \in \mathbb{R}^{n \times r}$, whereas $\mathbf{E} \in \mathbb{R}^{m \times n}$ is a matrix of modeling errors. The rank-$r$ approximating matrix is found by solving a suitable optimization problem, following (Mazumder et al., 2010):

$$\min_{\mathbf{Z} \in \mathbb{R}^{m \times n}} \; \frac{1}{2} \sum_{(i,j) \in \Omega^{\mathrm{tr}}} \left( M_{ij} - Z_{ij} \right)^2 + \lambda \, \|\mathbf{Z}\|_* \qquad (1)$$

where $\Omega^{\mathrm{tr}} \subseteq \Omega$ is a training subset of pairs of indices $(i,j)$ corresponding to positions of known entries of the partially observed matrix $\mathbf{M}$, $\mathbf{Z}$ is the completed matrix (to be optimized), $\lambda \geq 0$ is a regularization constant (chosen by a suitable validation method), and $\|\mathbf{Z}\|_*$ is the nuclear norm of the matrix $\mathbf{Z}$, i.e., the sum of all its singular values.
Eq. (1) can be re-written as

$$\min_{\mathbf{Z} \in \mathbb{R}^{m \times n}} \; \frac{1}{2} \left\| P_{\Omega^{\mathrm{tr}}}(\mathbf{M}) - P_{\Omega^{\mathrm{tr}}}(\mathbf{Z}) \right\|_F^2 + \lambda \, \|\mathbf{Z}\|_* \qquad (2)$$

where, for a matrix $\mathbf{W}$, $\left[ P_{\Omega}(\mathbf{W}) \right]_{ij} = W_{ij}$ if $(i,j) \in \Omega$, otherwise it is equal to $0$. Here, $P_{\Omega^{\mathrm{tr}}}(\mathbf{M})$ represents the projection of $\mathbf{M}$ onto the set $\Omega^{\mathrm{tr}}$ of positions of observed entries of the matrix $\mathbf{M}$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix (i.e., the square root of the summation of squares of all its entries).
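In MATLAB, the projection operator amounts to elementwise masking by a 0/1 matrix marking the observed positions. A toy illustration (not part of the repository code; the matrices here are arbitrary):

```matlab
W = magic(4);               % an arbitrary matrix
Mask = rand(4) > 0.5;       % 1 on "observed" positions, 0 elsewhere
P_Omega_W = W .* Mask;      % projection onto the observed entries
P_OmegaPerp_W = W .* (1 - Mask); % projection onto the complement
% By construction, P_Omega_W + P_OmegaPerp_W recovers W exactly.
```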
The MC optimization problem (2) can be solved by applying the algorithm below, named Soft Impute in Mazumder et al. (2010) (compared to the original version, here we have included a maximal number of iterations $N^{\mathrm{it}}$, which can be helpful to reduce the computational effort when one has to run the algorithm multiple times, e.g., for several choices of the training set $\Omega^{\mathrm{tr}}$ and of the regularization constant $\lambda$):
In the above algorithm, $P_{\Omega^{\mathrm{tr}}}^{\perp}(\mathbf{W})$ for a matrix $\mathbf{W}$ represents the projection of $\mathbf{W}$ onto the complement of $\Omega^{\mathrm{tr}}$, whereas

$$S_\lambda(\mathbf{W}) = \mathbf{U} \mathbf{D}_\lambda \mathbf{V}^\top,$$

being

$$\mathbf{W} = \mathbf{U} \mathbf{D} \mathbf{V}^\top$$

(with $\mathbf{D} = \mathrm{diag}(d_1, \dots, d_r)$) the singular value decomposition of $\mathbf{W}$, and

$$\mathbf{D}_\lambda = \mathrm{diag}\left( (d_1 - \lambda)_+, \dots, (d_r - \lambda)_+ \right), \quad \text{where } (x)_+ = \max(x, 0).$$

It is worth mentioning that a particularly efficient implementation of the operator $S_\lambda$ is possible (by means of the MATLAB function svt.m, see Li and Zhou (2017)).
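For reference, the Soft Impute iteration can be sketched in MATLAB as follows. This is a minimal illustration of the scheme in Mazumder et al. (2010) with a maximal number of iterations, not the repository's `matrix_completion_nuclear.m` implementation, whose details (e.g., stopping rule and use of svt.m) may differ:

```matlab
function Z = soft_impute_sketch(M, Mask, lambda, Nit, tol)
% Minimal Soft Impute sketch (Mazumder et al., 2010).
% M:    partially observed matrix (unobserved entries arbitrary)
% Mask: 1 on observed entries, 0 elsewhere
Z = zeros(size(M));                       % initialization Z = 0
for it = 1:Nit                            % at most Nit iterations
    % Fill unobserved entries with the current estimate:
    % W = P_Omega(M) + P_Omega^perp(Z)
    W = Mask .* M + (1 - Mask) .* Z;
    % Soft-threshold the singular values: Znew = S_lambda(W)
    [U, D, V] = svd(W, 'econ');
    d = max(diag(D) - lambda, 0);
    Znew = U * diag(d) * V';
    % Stop when the relative change is small
    if norm(Znew - Z, 'fro')^2 <= tol * max(norm(Z, 'fro')^2, 1)
        Z = Znew;
        break
    end
    Z = Znew;
end
end
```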
In the following mock example we will create a Toeplitz square matrix, `A`, and try to predict its entries through the matrix completion algorithm.
The algorithm consists of a main function, `matrix_completion_nuclear`, and a complementary function, `svt`, by Li and Zhou (2017). `svt.m` should be in the same directory as `matrix_completion_nuclear.m`.
```matlab
clear
clc
% Make sure you run this within the directory where svt.m is
cd /Users/federiconutarelli/Desktop/github_me/

A = toeplitz(randi([0 9],6,1)); % Toeplitz matrix to be predicted
B = 1 - isnan(A);               % mask of observed (non-NaN) entries
N = 10;   % setting the number of iterations for the matrix completion code
M = 15;   % choosing the range of lambda

% Grid of regularization constants: lambda = 2^h, h = -M, -M+0.5, ..., M
lambda_tol_vector = 2.^(-M:0.5:M)';

CompletedMatrix = cell(numel(lambda_tol_vector),1);
for k = 1:numel(lambda_tol_vector)
    lambda_tol = lambda_tol_vector(k);
    tol = 1e-9;
    fprintf('Completion using nuclear norm regularization... \n');
    [CompletedMat,objective,flag] = matrix_completion_nuclear(A.*B,B,N,lambda_tol,tol);
    if flag == 1
        CompletedMat = zeros(size(A));
    end
    CompletedMatrix{k} = CompletedMat;
end
```
The results of the mock example are stored in `CompletedMatrix{k}`. In particular, one completed matrix is provided as output for each value of the regularization parameter. The "optimal" predicted matrix is the one associated with the "optimal" parameter; a simple elbow method can be adopted to select the latter.
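One possible way to locate the elbow is to plot the objective returned by `matrix_completion_nuclear` against the grid of regularization constants from the mock example above. The sketch below assumes the variables `A`, `B`, `N`, `tol` and `lambda_tol_vector` from that example, and that the final entry of `objective` is the value of the regularized objective (an assumption about the function's output format):

```matlab
% Hypothetical sketch: record the objective for each lambda, then
% inspect the curve visually to pick the elbow.
objectives = zeros(numel(lambda_tol_vector),1);
for k = 1:numel(lambda_tol_vector)
    [~,objective,~] = matrix_completion_nuclear(A.*B,B,N,lambda_tol_vector(k),tol);
    objectives(k) = objective(end);   % final objective value (assumption)
end
semilogx(lambda_tol_vector, objectives, '-o');
xlabel('\lambda'); ylabel('objective'); % pick the lambda at the elbow
```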
A further example is proposed in `esempio.m`. It considers a simplified case in which the matrix to be predicted is symmetric. The user is requested to specify the number of simulations he/she wants the algorithm to perform. Please be sure to run the code in the same folder where `matrix_completion_nuclear.m` is stored.
If you use this MC algorithm in your research, please cite us as follows:
F. Nutarelli, G. Gnecco. MC algorithm: A MATLAB algorithm for easy Matrix Completion.
BibTeX:

```bibtex
@misc{econml,
  author={F. Nutarelli and G. Gnecco},
  title={{MC algorithm}: {A matlab algorithm for easy Matrix Completion}},
  howpublished={https://github.com/feedericonutarelli},
  year={2021}
}
```
Li, C., & Zhou, H. (2017). SVT: Singular value thresholding in MATLAB. Journal of Statistical Software, 81(2). DOI: 10.18637/jss.v081.c02.
Mazumder, R., Hastie, T., & Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11, pp. 2287–2322.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1), pp. 267–288.