Matrix completion

Matrix completion is the task of filling in the missing entries of a partially observed matrix. Without any restrictions on the number of degrees of freedom in the completed matrix this problem is underdetermined since the hidden entries could be assigned arbitrary values. Thus matrix completion often seeks to find the lowest rank matrix or, if the rank of the completed matrix is known, a matrix of rank r that matches the known entries.
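
The low-rank formulation described above is commonly written as the following optimization problem. The symbols are notation introduced here for illustration: M is the partially observed matrix, Ω is the set of observed (row, column) positions, and X is the completed matrix.

```latex
\min_{X} \; \operatorname{rank}(X)
\quad \text{subject to} \quad
X_{ij} = M_{ij} \quad \text{for all } (i, j) \in \Omega
```

When the rank r is known in advance, the objective is dropped and the constraint rank(X) = r is added instead.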

In the feature querying setting in alipy, some strategies need to estimate the missing entries, and matrix completion is the most commonly used method for doing so. To implement these strategies, we first need to implement the matrix completion methods they use.

However, these algorithms are also useful as standalone tool functions. We therefore expose them in alipy as by-products. The following sections introduce their usage.

AFASMC (KDD 2018)

The AFASMC matrix completion method uses the supervised information of instances with missing features. Specifically, it trains a model on the completed feature matrix X and the labels y, minimizing the reconstruction error on the observed entries and the supervised loss on the training data simultaneously.
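
As a rough illustration only, a joint objective of this kind could be written as below. The notation is an assumption made here and is not taken from alipy or guaranteed to match the original paper: X̂ is the completed matrix, Ω the set of observed positions, w the model parameters, ℓ a supervised loss, and λ a trade-off weight. Any regularization terms are omitted.

```latex
\min_{\hat{X},\, w} \;
\sum_{(i,j) \in \Omega} \bigl( \hat{X}_{ij} - X_{ij} \bigr)^2
\; + \; \lambda \sum_{i} \ell\bigl( y_i, \, f_w(\hat{x}_i) \bigr)
```

The first term is the reconstruction error on the observed entries. The second is the supervised loss of a model f_w evaluated on the completed rows x̂_i.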

To use this method, you need to pass the whole feature matrix and the labels, as well as the indexes of the observed entries. The index is a MultiLabelIndexCollection object. Note that we accept 3 types of raw data to construct this object; please see the instructions for more details.

Here is an example:

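from alipy.query_strategy.query_features import AFASMC_mc

# Full feature matrix (one row per instance).
X = [
    [1.3, 1.3, 2.6, 0.0],
    [1.4, 1.1, 3.0, 0.1],
    [2.6, 2.3, 3.0, 0.2],
    [2.7, 2.1, 1.0, 1.0],
]
# One label per instance.
y = [1, 1, 0, 0]
# (row, column) positions of the observed entries.
observed_ind = [(0, 0), (0, 1), (0, 2), (1, 1),
                (1, 3), (2, 0), (2, 1), (3, 0), (3, 3)]
# Fill in the entries that are not listed in observed_ind.
X_filled = AFASMC_mc(X=X, y=y, omega=observed_ind)
print(X_filled)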

And we can get the following output:

[[1.3        1.3        2.6        0.31178744]
 [0.8024306  1.1        0.49263161 0.1       ]
 [2.6        2.3        1.01954275 0.55496335]
 [2.7        1.50305516 0.82099905 1.        ]]

You can also provide an element mask matrix instead of the index list. The output is the same as above.

from alipy.query_strategy.query_features import AFASMC_mask_mc

# 1 marks an observed entry, 0 marks a missing one.
mask = [
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
X_filled = AFASMC_mask_mc(X=X, y=y, mask=mask)
print(X_filled)
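
The mask above marks the same positions as the observed index pairs in the first example. A minimal sketch of that conversion with NumPy is shown here; `indexes_to_mask` is a hypothetical helper written for illustration and is not part of alipy.

```python
import numpy as np

# Observed (row, column) pairs from the AFASMC example.
observed_ind = [(0, 0), (0, 1), (0, 2), (1, 1),
                (1, 3), (2, 0), (2, 1), (3, 0), (3, 3)]


def indexes_to_mask(pairs, shape):
    """Hypothetical helper (not an alipy function): 0/1 mask from index pairs."""
    mask = np.zeros(shape, dtype=int)
    rows, cols = zip(*pairs)
    # Set every listed (row, column) position to 1.
    mask[list(rows), list(cols)] = 1
    return mask


mask = indexes_to_mask(observed_ind, shape=(4, 4))
print(mask.tolist())
# → [[1, 1, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1]]
```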

SVD matrix completion

This method performs matrix completion by iterative low-rank SVD decomposition. It should be similar to SVDimpute from "Missing value estimation methods for DNA microarrays" by Troyanskaya et al.

Note that this implementation is based on "fancyimpute", whose GitHub address is: https://github.com/iskandr/fancyimpute .

Here is an example usage:

from alipy.query_strategy.query_features import IterativeSVD_mc
svd_mc = IterativeSVD_mc(rank=3)
X_filled = svd_mc.impute(X=X, observed_mask=mask)
print(X_filled)

And we can get the following output:

[[1.3        1.3        2.6        0.17285503]
 [0.85051037 1.1        2.81357294 0.1       ]
 [2.6        2.3        3.17383057 0.31505765]
 [2.7        0.50224522 0.97285633 1.        ]]
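
To make the idea of iterative low-rank SVD completion more concrete, here is a minimal NumPy sketch. It is not the alipy or fancyimpute implementation: the function name `iterative_svd_sketch`, the zero initialization, the stopping rule, and the default parameters are all assumptions chosen for illustration, so its output will generally differ from the numbers shown above.

```python
import numpy as np


def iterative_svd_sketch(X, observed_mask, rank, max_iters=200, tol=1e-5):
    """Illustrative sketch only (not alipy's IterativeSVD_mc)."""
    X = np.asarray(X, dtype=float)
    observed = np.asarray(observed_mask, dtype=bool)
    # Assumption: start with zeros in the missing positions.
    X_filled = np.where(observed, X, 0.0)
    for _ in range(max_iters):
        # Rank-truncated SVD reconstruction of the current estimate.
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Keep observed entries fixed; update only the missing ones.
        X_new = np.where(observed, X, low_rank)
        change = np.linalg.norm(X_new - X_filled)
        scale = max(np.linalg.norm(X_filled), 1e-12)
        X_filled = X_new
        if change <= tol * scale:
            break
    return X_filled


X = [
    [1.3, 1.3, 2.6, 0.0],
    [1.4, 1.1, 3.0, 0.1],
    [2.6, 2.3, 3.0, 0.2],
    [2.7, 2.1, 1.0, 1.0],
]
mask = [
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
X_sketch = iterative_svd_sketch(X, mask, rank=2)
print(X_sketch)
```

Each pass replaces only the missing entries with the values of the current low-rank approximation, so the observed entries are always returned unchanged.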

Copyright © 2018, alipy developers (BSD 3 License).