alipy.query_strategy.query_features.QueryFeatureStability

This class implements the committee-stability method for active matrix completion proposed in "Active Matrix Completion" (ICDM'13) [1]. The method builds a committee by running SVD matrix completion with different rank values. The uncertainty of each missing entry is computed as the variance of the values that the committee members predict for that entry.
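The committee idea described above can be illustrated with a minimal NumPy sketch. This is not ALiPy's implementation: `svd_complete` and `committee_variance` are hypothetical helper names, and the iterative truncated-SVD imputation is only a simple stand-in for the SVD matrix completion used by the method.

```python
import numpy as np

def svd_complete(X, mask, rank, n_iter=50):
    """Fill unknown entries by iterative truncated-SVD imputation.

    Assumption: this is a simple stand-in completion routine, not the
    solver used by ALiPy or by the paper.
    """
    filled = np.where(mask, X, 0.0)  # start with zeros at unknown entries
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed values fixed, update only the unknown entries.
        filled = np.where(mask, X, low_rank)
    return filled

def committee_variance(X, mask, ranks):
    """Variance of the committee predictions for every entry."""
    preds = np.stack([svd_complete(X, mask, r) for r in ranks])
    var = preds.var(axis=0)
    var[mask] = 0.0  # observed entries are not query candidates
    return var

rng = np.random.default_rng(0)
X_full = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 6))  # low-rank toy data
mask = rng.random(X_full.shape) > 0.3                       # True = observed
scores = committee_variance(X_full, mask, ranks=[1, 2, 3])

# The entry with the largest disagreement would be queried next.
row, col = np.unravel_index(np.argmax(scores), scores.shape)
```

Each rank in `ranks` plays the role of one committee member, which is why every rank has to stay below `min(X.shape)`.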

References

----------

[1] Shayok Chakraborty, Jiayu Zhou, Vineeth Balasubramanian, Sethuraman Panchanathan, Ian Davidson, and Jieping Ye. 2013. Active matrix completion. In IEEE International Conference on Data Mining. 81-90.

Methods

init

__init__(self, X, y, train_idx=None)


Parameters:

X: 2D array, optional (default=None): Feature matrix of the whole dataset. It is stored by reference, so it does not use additional memory.
y: array-like, optional (default=None): Label matrix of the whole dataset. It is stored by reference, so it does not use additional memory.
train_idx: array-like: The indexes of the training data.
committee_rank: list, optional (default=None): The rank values used to construct the committee.
Note that each rank should be lower than min(X.shape).

select

select(self, observed_entries, unkonwn_entries, **kwargs)

Select a subset of the unknown entries and return the selected instance and feature indexes.


Parameters:

observed_entries: {list, np.ndarray, MultiLabelIndexCollection}: The indexes of the observed (labeled) entries. It should be a 1d array of indexes (column major, starting from 0),
a MultiLabelIndexCollection, or a list of 2-element tuples in which
the 1st element is the index of the instance and the 2nd element is the index of the feature.
unkonwn_entries: {list, np.ndarray, MultiLabelIndexCollection}: The indexes of the unknown (unlabeled) entries. It should be a 1d array of indexes (column major, starting from 0),
a MultiLabelIndexCollection, or a list of 2-element tuples in which
the 1st element is the index of the instance and the 2nd element is the index of the feature.

Returns:

selected_feature: list: The selected features, as a list of tuples.
Note that the indexes refer to the WHOLE dataset, NOT THE TRAINING SET.
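The two index formats accepted by select (flat column-major indexes and (instance, feature) tuples) can be converted into each other with NumPy. The sketch below assumes that "column major" means Fortran order over a matrix of shape (n_instances, n_features); `flat_to_pairs` and `pairs_to_flat` are hypothetical helper names, not part of ALiPy.

```python
import numpy as np

def flat_to_pairs(flat_idx, shape):
    """Convert column-major flat indexes to (instance, feature) tuples."""
    rows, cols = np.unravel_index(np.asarray(flat_idx), shape, order="F")
    return list(zip(rows.tolist(), cols.tolist()))

def pairs_to_flat(pairs, shape):
    """Convert (instance, feature) tuples to column-major flat indexes."""
    rows, cols = zip(*pairs)
    return np.ravel_multi_index((rows, cols), shape, order="F").tolist()

shape = (4, 3)  # assumed toy size: 4 instances, 3 features
pairs = flat_to_pairs([0, 5, 11], shape)
flat = pairs_to_flat(pairs, shape)
```

With 4 instances, flat index 5 falls in the second column (feature 1) at row 1, so it maps to the tuple (1, 1).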

select_by_mask

select_by_mask(self, observed_mask, **kwargs)

Select a subset of the unknown entries, identified by a mask matrix, and return the selected instance and feature indexes.

Parameters:

observed_mask: {list, np.ndarray}: The mask matrix of the training set. The matrix should have the shape [n_train_idx, n_features].
It must contain only 1 and 0, where 1 means the corresponding element is known; any other element is treated as unknown.

Returns:

selected_feature: list: The selected features, as a list of tuples.
Note that the indexes refer to the given mask, NOT the whole dataset.
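A mask of the expected form can be built from a list of observed (instance, feature) tuples. This is a minimal sketch under stated assumptions: `build_observed_mask` is a hypothetical helper, and the row indexes are assumed to be positions within the training set, not within the whole dataset.

```python
import numpy as np

def build_observed_mask(n_rows, n_features, observed_pairs):
    """Return a 0/1 matrix with 1 at every observed (row, feature) pair."""
    mask = np.zeros((n_rows, n_features), dtype=int)
    for i, j in observed_pairs:
        mask[i, j] = 1
    return mask

# Toy training set: 3 instances, 4 features, 4 observed entries.
mask = build_observed_mask(3, 4, [(0, 0), (0, 2), (1, 1), (2, 3)])

# Entries that are still unknown, as (row, feature) tuples relative to the mask.
unknown = [tuple(p) for p in np.argwhere(mask == 0).tolist()]
```

Because the returned indexes refer to the mask, a row index would still need to be mapped through train_idx to recover its position in the whole dataset.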