query_strategy.query_labels.
QueryInstanceUncertainty
QueryInstanceUncertainty(X=None, y=None, measure='entropy')
Uncertainty query strategy.
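The implemented uncertainty measures include:
1. margin sampling
2. least confident
3. entropy
These measures require the probabilistic output of the model.
A fourth measure, 'distance_to_boundary', uses the model's decision_function instead (see the measure parameter).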
There are 3 ways to select instances from the data set:
1. Use select with your own model if you are using an sklearn model.
2. Use select with the default logistic regression model by passing None
as the model parameter.
3. Use select_by_prediction_mat and provide the probabilistic prediction
matrix of your own model, shape [n_samples, n_classes].
Methods
__init__
__init__(self, X=None, y=None, measure='entropy')
Parameters:
X: 2D array, optional (default=None)
    Feature matrix of the whole dataset. It is kept as a reference, so no additional memory is used.
y: array-like, optional (default=None)
    Label matrix of the whole dataset. It is kept as a reference, so no additional memory is used.
measure: str, optional (default='entropy')
    Measure used to calculate uncertainty. Should be one of
    ['least_confident', 'margin', 'entropy', 'distance_to_boundary'].
    'least_confident': x* = argmax 1 - P(y_hat|x), where y_hat = argmax P(yi|x).
    'margin': x* = argmin P(y_hat1|x) - P(y_hat2|x), where y_hat1 and y_hat2 are the first and second
    most probable class labels under the model, respectively.
    'entropy': x* = argmax -sum(P(yi|x) log P(yi|x)).
    'distance_to_boundary': only available in binary classification, x* = argmin |f(x)|.
    The model should have a 'decision_function' method that returns a 1D array.
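The scoring rules for 'least_confident', 'margin' and 'distance_to_boundary' can be illustrated with a small plain-Python sketch. This is not the library's implementation: the helper names (least_confident_scores, margin_scores, boundary_distances) and the sample numbers are made up for illustration, and the margin rule is assumed to pick the smallest gap between the two most probable classes.

```python
def least_confident_scores(proba):
    """1 - P(y_hat|x) for each row; a larger score means more uncertain."""
    return [1.0 - max(row) for row in proba]


def margin_scores(proba):
    """P(y_hat1|x) - P(y_hat2|x) for each row; a smaller gap means more uncertain."""
    scores = []
    for row in proba:
        top1, top2 = sorted(row, reverse=True)[:2]
        scores.append(top1 - top2)
    return scores


def boundary_distances(decision_values):
    """|f(x)| for each instance (binary case); a smaller value means more uncertain."""
    return [abs(v) for v in decision_values]


# Hypothetical class probabilities for three unlabeled instances.
proba = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
]
lc = least_confident_scores(proba)
mg = margin_scores(proba)
lc_pick = max(range(len(lc)), key=lc.__getitem__)  # argmax of 1 - P(y_hat|x)
mg_pick = min(range(len(mg)), key=mg.__getitem__)  # argmin of the margin

# Hypothetical decision_function outputs for a binary classifier.
decision_values = [-2.3, 0.1, 1.7]
bd = boundary_distances(decision_values)
bd_pick = min(range(len(bd)), key=bd.__getitem__)  # argmin of |f(x)|
```

On this sample, least confident picks the second instance while margin picks the third, which shows that the measures can rank the same predictions differently.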
select
select(self, label_index, unlabel_index, model=None, batch_size=1)
Select indexes from the unlabel_index for querying.
Parameters:
label_index: {list, np.ndarray, IndexCollection}
    The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
    The indexes of unlabeled samples.
model: object, optional (default=None)
    Current classification model. It should have a 'predict_proba' method for probabilistic output.
    If not provided, sklearn's LogisticRegression with default parameters is used.
batch_size: int, optional (default=1)
    Selection batch size.
Returns:
selected_idx: list
    The selected indexes, which are a subset of unlabel_index.
select_by_prediction_mat
select_by_prediction_mat(self, unlabel_index, predict, batch_size=1)
Select indexes from the unlabel_index for querying.
Parameters:
unlabel_index: {list, np.ndarray, IndexCollection}
    The indexes of unlabeled samples. They should be in one-to-one
    correspondence with the rows of the prediction matrix.
predict: 2D array, shape [n_samples, n_classes] or [n_samples]
    The probabilistic prediction matrix for the unlabeled set.
batch_size: int, optional (default=1)
    Selection batch size.
Returns:
selected_idx: list
    The selected indexes, which are a subset of unlabel_index.
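The one-to-one correspondence between unlabel_index and the rows of the prediction matrix can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the library's code: select_top_uncertain is a hypothetical helper, and it uses the least confident score for brevity.

```python
def select_top_uncertain(unlabel_index, predict, batch_size=1):
    """Return the batch_size dataset indexes whose rows are most uncertain.

    Row i of `predict` is assumed to belong to unlabel_index[i].
    """
    unlabel_index = list(unlabel_index)
    if len(unlabel_index) != len(predict):
        raise ValueError("unlabel_index and predict must have the same length")
    # Least confident score: 1 - highest class probability in the row.
    scores = [1.0 - max(row) for row in predict]
    # Row positions sorted from most to least uncertain.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Map row positions back to dataset indexes.
    return [unlabel_index[i] for i in order[:batch_size]]


# Hypothetical dataset indexes and matching prediction rows.
unlabel_index = [12, 27, 31, 45]
predict = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
selected = select_top_uncertain(unlabel_index, predict, batch_size=2)
```

The returned values are dataset indexes taken from unlabel_index, not row positions in the matrix, which is why the two inputs must be aligned.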
calc_entropy
@classmethod
calc_entropy(cls, predict_proba)
Calculate the entropy for each instance.
Parameters:
predict_proba: array-like, shape [n_samples, n_classes]
    Probability prediction for each instance.
Returns:
entropy: list
    Entropy for each instance, one value per row of predict_proba.
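A per-instance entropy computation can be sketched in plain Python as below. The logarithm base and the treatment of zero probabilities are not stated above, so this sketch assumes the natural logarithm and treats 0 * log(0) as 0; entropy_per_instance is a hypothetical name, not the library's method.

```python
import math


def entropy_per_instance(predict_proba):
    """Return -sum(p * log(p)) for each row, skipping zero probabilities."""
    result = []
    for row in predict_proba:
        result.append(-sum(p * math.log(p) for p in row if p > 0))
    return result


# Hypothetical predictions: a certain row, a uniform row, and a two-way tie.
predict_proba = [
    [1.0, 0.0, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.5, 0.5, 0.0],
]
entropies = entropy_per_instance(predict_proba)
```

Under these assumptions a certain prediction has entropy 0, and a uniform prediction over k classes reaches the maximum, log(k).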