alipy.query_strategy.multi_label.QueryNoisyOraclesIEthresh

IEthresh selects a batch of oracles to label the selected instance. Each oracle is scored by how closely its labels agree with the majority-vote labels.

At each iteration, the oracles whose scores exceed a threshold are selected as the batch, so an oracle with a higher score is more likely to be selected.
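The scoring idea above can be illustrated with a small sketch. It assumes a binary reward (1 if an oracle's label matches the majority vote, 0 otherwise); the helper names and oracle names are hypothetical and this is not alipy's internal code.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement_rewards(oracle_labels):
    """Map each oracle name to 1 if its label matches the majority vote, else 0."""
    consensus = majority_vote(list(oracle_labels.values()))
    return {name: int(label == consensus) for name, label in oracle_labels.items()}


# Hypothetical answers from three oracles for one instance.
answers = {"oracle_a": 1, "oracle_b": 1, "oracle_c": 0}
rewards = agreement_rewards(answers)
print(rewards)  # {'oracle_a': 1, 'oracle_b': 1, 'oracle_c': 0}
```

Accumulating such rewards over iterations gives each oracle a history from which a score can be estimated.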

References

----------

[1] Donmez P, Carbonell J G, Schneider J. Efficiently learning the accuracy of labeling sources for selective sampling. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2009.

Methods

init

__init__(self, X, y, oracles, initial_labeled_indexes, **kwargs)

Parameters:

X: 2D array, optional (default=None): Feature matrix of the whole dataset. It is passed by reference, so no additional memory is used.
y: array-like, optional (default=None): Label matrix of the whole dataset. It is passed by reference, so no additional memory is used.
oracles: {list, alipy.oracle.Oracles}: An alipy.oracle.Oracles object that contains all the available oracles, or a list of oracles. Each oracle should be an alipy.oracle.Oracle object.
initial_labeled_indexes: {list, np.ndarray, IndexCollection}: The indexes of the initially labeled samples. Used for initializing the score of each oracle.
epsilon: float, optional (default=0.1): Factor that determines how many oracles are selected:
S_t = {a | UI(a) >= epsilon * max UI(a)}
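The threshold rule for S_t can be sketched as follows. This is a minimal illustration, assuming the UI(a) values have already been computed and are non-negative; the function name `select_oracles`, the dictionary layout, and the numbers are hypothetical, not part of alipy.

```python
def select_oracles(upper_intervals, epsilon=0.1):
    """Return the oracle names a with UI(a) >= epsilon * max UI(a)."""
    threshold = epsilon * max(upper_intervals.values())
    return [name for name, ui in upper_intervals.items() if ui >= threshold]


# Made-up UI values for three oracles.
ui = {"oracle_a": 0.9, "oracle_b": 0.5, "oracle_c": 0.05}

print(select_oracles(ui, epsilon=0.1))  # threshold is about 0.09
print(select_oracles(ui, epsilon=0.8))  # threshold is about 0.72
```

A small epsilon keeps most oracles in the batch, while an epsilon close to 1 keeps only those scoring near the best oracle.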

select

select(self, label_index, unlabel_index, model=None, **kwargs)

Query from oracles. Returns the index of the selected instance and the selected oracle.

Parameters:

label_index: {list, np.ndarray, IndexCollection}: The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}: The indexes of unlabeled samples.
model: object, optional (default=None): Current classification model. It should provide a 'predict_proba' method for probabilistic output. If not provided, sklearn's LogisticRegression with default parameters is used.
n_neighbors: int, optional (default=10): Number of neighbors of the selected instance used to evaluate the oracles.

Returns:

selected_instance: int: The index of the selected instance.
selected_oracle: int or str: The selected oracle. If the oracles were given as a list, the oracle's index is returned. If an Oracles object was given, the oracle's name is returned.

select_by_prediction_mat

select_by_prediction_mat(self, label_index, unlabel_index, predict, **kwargs)

Query from oracles. Returns the index of the selected instance and the selected oracle.

Parameters:

label_index: {list, np.ndarray, IndexCollection}: The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}: The indexes of unlabeled samples.
predict: 2D array, shape [n_samples, n_classes]: The probabilistic prediction matrix for the unlabeled set.
n_neighbors: int, optional (default=10): Number of neighbors of the selected instance used to evaluate the oracles.

Returns:

selected_instance: int: The index of the selected instance.
selected_oracle: int or str: The selected oracle. If the oracles were given as a list, the oracle's index is returned. If an Oracles object was given, the oracle's name is returned.
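To illustrate how a prediction matrix can drive instance selection, here is a sketch under two assumptions that this page does not confirm: that row i of `predict` corresponds to `unlabel_index[i]`, and that the instance is chosen by least confidence (smallest top class probability). The helper `least_confident_row` is hypothetical and not an alipy function.

```python
def least_confident_row(predict):
    """Return the row position whose highest class probability is smallest."""
    top_probs = [max(row) for row in predict]
    return min(range(len(predict)), key=top_probs.__getitem__)


# Hypothetical unlabeled indexes and their predicted class probabilities.
unlabel_index = [12, 47, 80]
predict = [
    [0.90, 0.10],
    [0.55, 0.45],  # least confident row
    [0.70, 0.30],
]

position = least_confident_row(predict)
selected_instance = unlabel_index[position]
print(selected_instance)  # 47
```

Passing a precomputed matrix this way avoids refitting a model inside the query strategy when predictions are already available.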

select_by_given_instance

select_by_given_instance(self, selected_instance)

Select the oracles to query for a given instance index.

Parameters:

selected_instance: int: The index of the selected sample. Should be a member of the unlabeled set.

Returns:

selected_oracles: list: The selected oracles for querying.