alipy.query_strategy.query_features.QueryFeatureAFASMC
This class implements the KDD'18 method Active Feature Acquisition with
Supervised Matrix Completion (AFASMC) [1]. It first completes the feature
matrix using supervised information, then selects the missing feature
with the highest variance, based on the results of previous completions.
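The selection rule described above can be illustrated with a minimal sketch. It assumes the imputed values from several earlier completion rounds are already available, and it does not perform the supervised matrix completion itself. The names `completions`, `observed`, and `pick_highest_variance` are hypothetical and not part of ALiPy.

```python
from statistics import pvariance


def pick_highest_variance(completions, observed):
    """Return the (instance, feature) pair of the unobserved entry whose
    imputed value varies most across earlier completion rounds.

    completions: list of completed matrices (lists of rows), one per round.
    observed: set of (instance, feature) pairs that are already known.
    """
    n_rows = len(completions[0])
    n_cols = len(completions[0][0])
    best_entry, best_var = None, -1.0
    for i in range(n_rows):
        for j in range(n_cols):
            if (i, j) in observed:
                continue  # only missing entries are candidates
            # Population variance of this entry's imputed values over all rounds.
            var = pvariance([m[i][j] for m in completions])
            if var > best_var:
                best_entry, best_var = (i, j), var
    return best_entry


# Three hypothetical completion rounds of a 2 x 2 matrix (illustrative values).
completions = [
    [[1.0, 0.2], [0.5, 3.0]],
    [[1.0, 0.9], [0.6, 3.0]],
    [[1.0, 0.1], [0.5, 3.0]],
]
observed = {(0, 0), (1, 1)}
selected = pick_highest_variance(completions, observed)
print(selected)  # (0, 1)
```

Entry (0, 1) is chosen because its imputed value changes most between rounds, which is the intuition behind querying high-variance entries.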
References
----------
[1] Active feature acquisition with supervised matrix completion.
Sheng-Jun Huang, Miao Xu, Ming-Kun Xie, Masashi Sugiyama, Gang Niu and Songcan Chen
In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'18), 2018.
Methods
init
__init__(self, X, y, train_idx=None)
Parameters:
X: 2D array
    Feature matrix of the whole dataset. It is stored by reference, so no additional memory is used.
y: array-like
    Label matrix of the whole dataset. It is stored by reference, so no additional memory is used.
train_idx: array-like, optional (default=None)
    Indexes of the training data.
select
select(self, observed_entries, unkonwn_entries, **kwargs)
Select a subset of the unknown entries and return the selected instance and feature.
Parameters:
observed_entries: {list, np.ndarray, MultiLabelIndexCollection}
    Indexes of the observed entries. This should be a 1D array of indexes (column-major, starting from 0),
    a MultiLabelIndexCollection, or a list of 2-element tuples in which
    the 1st element is the instance index and the 2nd element is the feature index.
unkonwn_entries: {list, np.ndarray, MultiLabelIndexCollection}
    Indexes of the unknown entries. This should be a 1D array of indexes (column-major, starting from 0),
    a MultiLabelIndexCollection, or a list of 2-element tuples in which
    the 1st element is the instance index and the 2nd element is the feature index.
Returns:
selected_feature: list
    The selected features, as a list of tuples.
    Note that the indexes refer to the WHOLE dataset, NOT the training set.
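Because `select` accepts entries either as column-major flat indexes or as (instance, feature) tuples, it can help to see how the two forms relate. The sketch below assumes the flat index runs down each column of an `n_samples` x `n_features` matrix first, which is the usual meaning of column-major. The helper names are hypothetical and not part of ALiPy.

```python
def flat_to_pair(flat_index, n_samples):
    """Convert a 0-based column-major flat index to (instance, feature)."""
    feature, instance = divmod(flat_index, n_samples)
    return instance, feature


def pair_to_flat(instance, feature, n_samples):
    """Convert (instance, feature) to a 0-based column-major flat index."""
    return feature * n_samples + instance


n_samples, n_features = 3, 2  # a hypothetical 3 x 2 feature matrix
pairs = [flat_to_pair(k, n_samples) for k in range(n_samples * n_features)]
print(pairs)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

With NumPy, `np.unravel_index(k, (n_samples, n_features), order='F')` performs the same flat-to-pair conversion.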
select_by_mask
select_by_mask(self, observed_mask, **kwargs)
Select a subset of the unknown entries, as identified by the given mask matrix,
and return the selected instance and feature.
Parameters:
observed_mask: {list, np.ndarray}
    The mask matrix of the training set. The matrix should have the shape [n_train_idx, n_features].
    It must contain only 1s and 0s: a 1 means the corresponding element is known;
    otherwise, the element is treated as unknown.
Returns:
selected_feature: list
    The selected features, as a list of tuples.
    Note that the indexes refer to the given mask, NOT the whole dataset.
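An observed mask of the kind `select_by_mask` expects can be derived from a training matrix with missing values. The sketch below assumes missing features are stored as NaN, which this page does not specify. `build_observed_mask` is a hypothetical helper, not part of ALiPy.

```python
import math


def build_observed_mask(X_train):
    """Return a 0/1 mask with the same shape as X_train:
    1 where the feature value is known, 0 where it is missing (NaN)."""
    return [[0 if math.isnan(v) else 1 for v in row] for row in X_train]


nan = float("nan")
# A hypothetical training matrix with 2 instances and 3 features.
X_train = [
    [0.3, nan, 1.2],
    [nan, 0.7, 0.4],
]
observed_mask = build_observed_mask(X_train)
print(observed_mask)  # [[1, 0, 1], [0, 1, 1]]
```

The resulting mask has shape [n_train_idx, n_features], so any tuples returned for it index into the training matrix rather than the whole dataset.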