QueryInstanceQBC(X=None, y=None, method='query_by_bagging', disagreement='vote_entropy')
The Query-By-Committee (QBC) algorithm.
QBC minimizes the version space, which is the set of hypotheses that are consistent with the current labeled training data.
This class implements the query-by-bagging method, which uses sklearn's bagging to construct the committee, so your model should be a sklearn estimator. If it is not, you can fall back to the default logistic regression model by passing None as the model.
There are three ways to select instances from the data set:
1. Use select with your own sklearn model.
2. Use the default logistic regression model by passing None to the model parameter of select.
3. Use select_by_prediction_mat and provide the prediction matrix of each committee member. Each prediction matrix should have the shape [n_samples, n_classes] for probabilistic output or [n_samples] for class-label output.
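The sketch below illustrates how a committee for option 3 can be built by bagging, using plain NumPy and a toy nearest-centroid classifier rather than the alipy API or sklearn (all names here are hypothetical): each committee member is trained on a bootstrap resample of the labeled data, and its probabilistic predictions on the unlabeled pool form one [n_samples, n_classes] matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y, n_classes):
    """Toy stand-in classifier: the per-class mean of the training points
    (alipy itself expects a sklearn model; this is only for illustration)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances -> [n_samples, n_classes]."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    z = np.exp(-d)
    return z / z.sum(axis=1, keepdims=True)

# a small labeled pool and an unlabeled pool
X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.array([[0.1, 0.0], [0.5, 0.5], [1.0, 0.9]])

# bagging: each committee member trains on a bootstrap resample
committee_preds = []
for _ in range(5):
    idx = rng.integers(0, len(X_lab), size=len(X_lab))
    while len(np.unique(y_lab[idx])) < 2:        # keep both classes present
        idx = rng.integers(0, len(X_lab), size=len(X_lab))
    model = fit_centroids(X_lab[idx], y_lab[idx], n_classes=2)
    committee_preds.append(predict_proba(model, X_unlab))

# committee_preds now has the list-of-[n_samples, n_classes] shape
# that select_by_prediction_mat expects for probabilistic output
print([p.shape for p in committee_preds])
```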
References
----------
[1] H.S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the ACM Workshop on Computational Learning Theory, pages 287-294, 1992.
[2] N. Abe and H. Mamitsuka. Query learning strategies using boosting and bagging. In Proceedings of the International Conference on Machine Learning (ICML), pages 1-9. Morgan Kaufmann, 1998.
__init__(self, X=None, y=None, method='query_by_bagging', disagreement='vote_entropy')
select(self, label_index, unlabel_index, model=None, batch_size=1, n_jobs=None)
Select indexes from the unlabel_index for querying.
select_by_prediction_mat(self, unlabel_index, predict, batch_size=1)
Select indexes from the unlabel_index for querying.
calc_vote_entropy(cls, predict_matrices)
Calculate the vote entropy for measuring the level of disagreement in QBC.
[1] I. Dagan and S. Engelson. Committee-based sampling for training probabilistic classifiers. In Proceedings of the International Conference on Machine Learning (ICML), pages 150–157. Morgan Kaufmann, 1995.
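Vote entropy scores a sample by how evenly the committee's hard votes spread over the classes: with C members and V(x, y) votes for class y, it is -sum_y (V(x, y)/C) * log(V(x, y)/C). A minimal NumPy sketch of that computation (the function name and input convention are assumptions, not the alipy signature):

```python
import numpy as np

def vote_entropy(predict_matrices):
    """Vote entropy per sample: -sum_y (V/C) * log(V/C), where V is the
    number of committee members voting for class y. Assumes a list of
    [n_samples, n_classes] probability matrices, one per member."""
    votes = np.stack([p.argmax(axis=1) for p in predict_matrices])
    n_classes = predict_matrices[0].shape[1]
    ent = np.zeros(votes.shape[1])
    for c in range(n_classes):
        frac = (votes == c).mean(axis=0)     # V(x, y) / C
        nz = frac > 0                        # skip classes with no votes
        ent[nz] -= frac[nz] * np.log(frac[nz])
    return ent

committee = [
    np.array([[0.9, 0.1], [0.4, 0.6]]),
    np.array([[0.8, 0.2], [0.7, 0.3]]),
]
# sample 0 is unanimous (entropy 0); sample 1 splits the votes (entropy log 2)
print(vote_entropy(committee))
```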
calc_avg_KL_divergence(cls, predict_matrices)
Calculate the average Kullback-Leibler (KL) divergence for measuring the level of disagreement in QBC.
[1] A. McCallum and K. Nigam. Employing EM in pool-based active learning for text classification. In Proceedings of the International Conference on Machine Learning (ICML), pages 359–367. Morgan Kaufmann, 1998.
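Here the disagreement of a sample is the mean, over committee members, of the KL divergence between each member's class distribution and the consensus (committee-average) distribution. A sketch of that computation in plain NumPy (the function name and the epsilon guard are assumptions, not the alipy implementation):

```python
import numpy as np

def avg_kl_divergence(predict_matrices):
    """Average KL disagreement per sample: mean over members c of
    KL(P_c(y|x) || P_consensus(y|x)), with the consensus taken as the
    committee-average distribution. Assumes probabilistic inputs of
    shape [n_samples, n_classes]."""
    P = np.stack(predict_matrices)      # [n_committee, n_samples, n_classes]
    consensus = P.mean(axis=0)          # [n_samples, n_classes]
    eps = 1e-12                         # guard against log(0)
    kl = (P * np.log((P + eps) / (consensus + eps))).sum(axis=2)
    return kl.mean(axis=0)              # one score per sample

committee = [
    np.array([[0.9, 0.1], [0.5, 0.5]]),
    np.array([[0.9, 0.1], [0.5, 0.5]]),
]
# identical members agree with the consensus, so disagreement is zero
print(avg_kl_divergence(committee))
```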
Copyright © 2018, alipy developers (BSD 3 License).