alipy.query_strategy.query_features.QueryFeatureAFASMC

This class implements the KDD'18 method Active Feature Acquisition with Supervised Matrix Completion (AFASMC). It first completes the feature matrix using supervised information, then selects the missing feature with the highest variance across the results of previous completions.
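The selection rule described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper `pick_highest_variance_entry`, the toy completions, and the use of population variance are all assumptions made for illustration. The matrix completion step itself is not shown here.

```python
from statistics import pvariance


def pick_highest_variance_entry(completions, observed_mask):
    """Hypothetical helper: return the unobserved (row, col) entry whose
    completed value varies most across the given completions."""
    n_rows = len(observed_mask)
    n_cols = len(observed_mask[0])
    best_entry, best_var = None, -1.0
    for i in range(n_rows):
        for j in range(n_cols):
            if observed_mask[i][j] == 1:
                continue  # already known, nothing to acquire
            # Values predicted for this entry by each earlier completion.
            values = [matrix[i][j] for matrix in completions]
            var = pvariance(values)
            if var > best_var:
                best_entry, best_var = (i, j), var
    return best_entry, best_var


# Toy data (assumed): three successive completions of a 2x3 feature matrix.
completions = [
    [[1.0, 0.2, 3.0], [0.5, 2.0, 0.9]],
    [[1.0, 0.8, 3.0], [0.5, 2.0, 1.0]],
    [[1.0, 0.5, 3.0], [0.5, 2.0, 1.1]],
]
# 1 marks a known entry, 0 marks a missing one.
observed_mask = [[1, 0, 1], [1, 1, 0]]

entry, variance = pick_highest_variance_entry(completions, observed_mask)
```

In this toy case the entry at row 0, column 1 is chosen, because its completed value changes the most between completions.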

References

----------

[1] Active feature acquisition with supervised matrix completion. Sheng-Jun Huang, Miao Xu, Ming-Kun Xie, Masashi Sugiyama, Gang Niu and Songcan Chen. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'18), 2018.

Methods

__init__

__init__(self, X, y, train_idx=None)
Parameters:
X: 2D array
Feature matrix of the whole dataset. It is held by reference, so no additional memory is used.
y: array-like
Label matrix of the whole dataset. It is held by reference, so no additional memory is used.
train_idx: array-like, optional (default=None)
The indexes of the training data.

select

select(self, observed_entries, unkonwn_entries, **kwargs)

Select a subset of the unknown entries and return the selected (instance, feature) pairs.

Parameters:
observed_entries: {list, np.ndarray, MultiLabelIndexCollection}
The indexes of the observed (known) entries. It should be a 1d array of indexes (column major, starting from 0),
a MultiLabelIndexCollection, or a list of 2-element tuples, in which
the 1st element is the index of the instance and the 2nd element is the index of the feature.
unkonwn_entries: {list, np.ndarray, MultiLabelIndexCollection}
The indexes of the unknown (missing) entries. It should be a 1d array of indexes (column major, starting from 0),
a MultiLabelIndexCollection, or a list of 2-element tuples, in which
the 1st element is the index of the instance and the 2nd element is the index of the feature.
Returns:
selected_feature: list
The selected features, as a list of tuples.
Note that the indexes refer to the WHOLE dataset, NOT THE TRAINING SET.
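The relationship between the two accepted index forms can be sketched as follows. This assumes that "column major" means Fortran-style ordering, where the flat index runs down all instances of feature 0 before moving to feature 1; the helper names `flat_to_entry` and `entry_to_flat` are hypothetical and are not part of alipy.

```python
def flat_to_entry(flat_index, n_samples):
    """Convert a column-major flat index to an (instance, feature) tuple.

    Assumes Fortran-style ordering over a matrix with n_samples rows.
    """
    feature, instance = divmod(flat_index, n_samples)
    return instance, feature


def entry_to_flat(instance, feature, n_samples):
    """Inverse of flat_to_entry under the same ordering assumption."""
    return feature * n_samples + instance


# Toy matrix shape (assumed): 4 instances, 3 features.
n_samples, n_features = 4, 3

flat_indexes = [0, 5, 11]
entries = [flat_to_entry(k, n_samples) for k in flat_indexes]
round_trip = [entry_to_flat(i, j, n_samples) for i, j in entries]
```

Under this assumption, flat index 5 in a 4-row matrix maps to instance 1, feature 1, and flat index 11 maps to instance 3, feature 2.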

select_by_mask

select_by_mask(self, observed_mask, **kwargs)

Select a subset of the unknown entries given a mask matrix and return the selected (instance, feature) pairs.

Parameters:
observed_mask: {list, np.ndarray}
The mask matrix of the training set. The matrix should have the shape [n_train_idx, n_features].
It must contain only 1 and 0, where 1 means the corresponding element is known;
any other element is treated as unknown.
Returns:
selected_feature: list
The selected features, as a list of tuples.
Note that the indexes refer to the given mask, NOT the whole dataset.
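Because the returned indexes are relative to the mask, a caller may need to map them back to the whole dataset. The sketch below shows one way to build a mask and do that mapping. It assumes that row k of the mask corresponds to `train_idx[k]`; the helpers `build_observed_mask` and `to_whole_dataset_index` and the sample values are hypothetical and are not part of alipy.

```python
def build_observed_mask(n_train, n_features, observed_entries):
    """Build a 0/1 mask of shape [n_train, n_features].

    observed_entries holds (row, col) tuples relative to the training set.
    """
    mask = [[0] * n_features for _ in range(n_train)]
    for row, col in observed_entries:
        mask[row][col] = 1  # 1 marks a known element
    return mask


def to_whole_dataset_index(selected, train_idx):
    """Map mask-relative (row, col) tuples to whole-dataset indexes.

    Assumes mask row k corresponds to instance train_idx[k].
    """
    return [(train_idx[row], col) for row, col in selected]


# Toy setup (assumed): three training instances drawn from a larger dataset.
train_idx = [7, 2, 9]
mask = build_observed_mask(3, 2, [(0, 0), (1, 1), (2, 0)])

# Stand-in for a value returned by select_by_mask; not a real query result.
selected = [(1, 0)]
whole_dataset_selected = to_whole_dataset_index(selected, train_idx)
```

Here the mask-relative entry (1, 0) refers to instance 2 of the whole dataset, since `train_idx[1]` is 2.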

Copyright © 2018, alipy developers (BSD 3 License).