alipy.query_strategy.query_labels.QueryInstanceLAL

The key idea of LAL is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state.

The regressor is trained on 2D datasets and can score unseen data from real datasets. The method yields strategies that work well on real data from a wide range of domains.
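
The selection idea can be sketched in a few lines. This is only an illustration, not the alipy implementation: `select_by_predicted_gain` and `predict_gain` are hypothetical names, and `predict_gain` stands in for the trained regressor that predicts the expected error reduction of one candidate.

```python
import heapq


def select_by_predicted_gain(candidates, predict_gain, batch_size=1):
    """Return the batch_size candidates with the largest predicted gain.

    predict_gain is a hypothetical callable standing in for the trained
    regressor: it maps one candidate to a predicted error reduction.
    """
    return heapq.nlargest(batch_size, candidates, key=predict_gain)


# Toy stand-in scores; in LAL these would come from the regressor.
toy_gain = {10: 0.02, 11: 0.15, 12: 0.07}
chosen = select_by_predicted_gain([10, 11, 12], toy_gain.get, batch_size=2)
print(chosen)
```

With the toy scores above, the two candidates with the highest predicted gain are chosen, in descending order of score.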

In alipy, LAL trains the regressor on pre-extracted data provided by the authors. If no accepted data file is found, it downloads one automatically. You can also download 'LAL-iterativetree-simulatedunbalanced-big.npz' and 'LAL-randomtree-simulatedunbalanced-big.npz' manually from https://github.com/ksenia-konyushkova/LAL and pass the directory containing them as data_path.

The implementation follows https://github.com/ksenia-konyushkova/LAL/ directly.

References

----------

[1] Ksenia Konyushkova, Raphael Sznitman, and Pascal Fua. 2017. Learning Active Learning from Data. In The 31st Conference on Neural Information Processing Systems (NIPS 2017), 4228-4238.

Methods

init

__init__(self, X, y, mode='LAL_iterative', data_path='.', cls_est=50, train_slt=True, **kwargs)
Parameters:
X: 2D array
Feature matrix of the whole dataset. It is stored by reference, so no additional memory is used.
y: array-like
Label matrix of the whole dataset. It is stored by reference, so no additional memory is used.
mode: str, optional (default='LAL_iterative')
The mode of data sampling. Must be either 'LAL_iterative' or 'LAL_independent'.
data_path: str, optional (default='.')
Path to store the data file for training.
The path should be a dir, and the file name should be
'LAL-iterativetree-simulatedunbalanced-big.npz' or 'LAL-randomtree-simulatedunbalanced-big.npz'.
If no accepted files are detected, it will download the pre-extracted data file to the given path.
cls_est: int, optional (default=50)
The number of estimators in the random forest that computes
the features for the selector.
train_slt: bool, optional (default=True)
Whether to train the selector during initialization.
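
As a rough illustration of how mode and data_path relate to the two accepted file names, the sketch below resolves the expected file and checks whether a download would be needed. The helper names are hypothetical, and the pairing of 'LAL_independent' with the 'randomtree' file is an assumption based on the names listed above, not something taken from the alipy source.

```python
import os

# Assumed mapping from `mode` to the accepted data file names.
DATA_FILES = {
    "LAL_iterative": "LAL-iterativetree-simulatedunbalanced-big.npz",
    "LAL_independent": "LAL-randomtree-simulatedunbalanced-big.npz",
}


def expected_data_file(mode, data_path="."):
    """Return the path where the data file for `mode` is expected (hypothetical helper)."""
    if mode not in DATA_FILES:
        raise ValueError("mode must be 'LAL_iterative' or 'LAL_independent'")
    return os.path.join(data_path, DATA_FILES[mode])


def needs_download(mode, data_path="."):
    """Return True if no accepted file for `mode` exists under data_path."""
    return not os.path.isfile(expected_data_file(mode, data_path))
```

Calling `needs_download('LAL_iterative', data_path)` before constructing the strategy is one way to find out in advance whether initialization is likely to trigger a download.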

download_data

download_data(self)

Download the pre-extracted data used to train the regressor that scores unlabeled samples.

train_selector_from_file

train_selector_from_file(self, file_path=None, reg_est=2000, reg_depth=40, feat=6)

Train a random forest as the instance selector. Note that large values for the forest parameters can make training very slow, depending on your machine.

Parameters:
file_path: str, optional (default=None)
The path to the specific data file.
reg_est: int, optional (default=2000)
The number of estimators of the forest.
reg_depth: int, optional (default=40)
The maximum depth of the trees in the forest.
feat: int, optional (default=6)
The number of features each tree considers when looking for the best split.

select

select(self, label_index, unlabel_index, batch_size=1, **kwargs)

Select indexes from the unlabel_index for querying.

Parameters:
label_index: {list, np.ndarray, IndexCollection}
The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
The indexes of unlabeled samples.
batch_size: int, optional (default=1)
Selection batch size.
Returns:
selected_idx: list
The selected indexes, which are a subset of unlabel_index.
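
Putting the methods together, a minimal usage sketch might look like the following. It uses only the signatures documented on this page. The wrapper function `query_with_lal` is hypothetical, the reduced forest parameters are illustrative values chosen to shorten training, and the sketch assumes the data file is already present under data_path or gets downloaded as described above.

```python
def query_with_lal(X, y, label_index, unlabel_index, data_path=".", batch_size=1):
    """Build a LAL strategy, train a smaller selector, and query one batch.

    Hypothetical wrapper around the documented QueryInstanceLAL methods.
    """
    # Imported lazily so this sketch can be loaded without alipy installed.
    from alipy.query_strategy.query_labels import QueryInstanceLAL

    # train_slt=False defers training so the forest size can be set below.
    strategy = QueryInstanceLAL(
        X, y, mode="LAL_iterative", data_path=data_path,
        cls_est=50, train_slt=False,
    )
    # Smaller than the defaults (2000, 40) to reduce training time;
    # these values are illustrative, not recommendations.
    strategy.train_selector_from_file(reg_est=200, reg_depth=20, feat=6)
    # Returns a list of indexes drawn from unlabel_index.
    return strategy.select(label_index, unlabel_index, batch_size=batch_size)
```

Leaving train_slt at its default of True trains the selector with the default forest parameters during initialization, in which case the explicit call to train_selector_from_file is not needed.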

Copyright © 2018, alipy developers (BSD 3 License).