alipy.query_strategy.query_labels.QueryInstanceBMDR

Discriminative and Representative Queries for Batch Mode Active Learning (BMDR) queries a batch of informative and representative examples by minimizing the empirical risk minimization (ERM) risk bound of active learning.

This method solves a quadratic programming (QP) problem several times per query, which is time-consuming on relatively large datasets (e.g., more than a few thousand unlabeled examples).

Note that the solving speed also depends on the kernel function. In our tests, the Gaussian ('rbf') kernel took more time to solve. The QP is solved through cvxpy. The model used for instance selection is a linear regression model in kernel form.

References
----------

[1] Wang, Z., and Ye, J. 2013. Querying discriminative and representative samples for batch mode active learning. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 158-166.

Methods

init

__init__(self, X, y, beta=1000, gamma=0.1, rho=1, **kwargs)
Parameters:
X: 2D array
Feature matrix of the whole dataset. It is stored by reference, so no additional memory is used.
y: array-like
Label matrix of the whole dataset. It is stored by reference, so no additional memory is used.
beta: float, optional (default=1000)
The MMD parameter.
gamma: float, optional (default=0.1)
The l2-norm regularizer parameter.
rho: float, optional (default=1)
The parameter used in ADMM.
kernel : {'linear', 'poly', 'rbf', callable}, optional (default='rbf')
Specifies the kernel type to be used in the algorithm.
It must be one of 'linear', 'poly', 'rbf', or a callable.
If a callable is given it is used to pre-compute the kernel matrix
from data matrices; that matrix should be an array of shape
``(n_samples, n_samples)``.
degree : int, optional (default=3)
Degree of the polynomial kernel function ('poly').
Ignored by all other kernels.
gamma_ker : float, optional (default=1.)
Kernel coefficient for 'rbf' and 'poly'.
coef0 : float, optional (default=1.)
Independent term in kernel function.
It is only significant in 'poly'.
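
The kernel parameters above can be illustrated with a minimal standard-library sketch. It assumes the conventional kernel definitions (linear: `<a, b>`; poly: `(gamma_ker * <a, b> + coef0) ** degree`; rbf: `exp(-gamma_ker * ||a - b||^2)`); whether alipy computes them in exactly this way is an assumption. The helper names `gram_matrix` and `mmd_squared` are hypothetical and not part of alipy. `mmd_squared` shows the usual (biased) squared maximum mean discrepancy between two groups of samples, the kind of quantity that `beta` presumably weights.

```python
import math


def linear_kernel(a, b):
    """Plain inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(a, b, degree=3, gamma_ker=1.0, coef0=1.0):
    """Polynomial kernel: (gamma_ker * <a, b> + coef0) ** degree."""
    return (gamma_ker * linear_kernel(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma_ker=1.0):
    """Gaussian kernel: exp(-gamma_ker * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma_ker * sq_dist)


def gram_matrix(X, kernel, **params):
    """Hypothetical helper: K[i][j] = kernel(X[i], X[j]), shape (n_samples, n_samples)."""
    return [[kernel(xi, xj, **params) for xj in X] for xi in X]


def mmd_squared(K, idx_a, idx_b):
    """Hypothetical helper: biased squared MMD between two index sets of K."""
    def mean_block(rows, cols):
        total = sum(K[i][j] for i in rows for j in cols)
        return total / (len(rows) * len(cols))
    return (mean_block(idx_a, idx_a)
            - 2 * mean_block(idx_a, idx_b)
            + mean_block(idx_b, idx_b))


# Three toy samples with two features each.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K_rbf = gram_matrix(X, rbf_kernel, gamma_ker=1.0)
K_poly = gram_matrix(X, poly_kernel, degree=3, gamma_ker=1.0, coef0=1.0)
```

Note that `degree` and `coef0` only enter the polynomial kernel, which matches the "ignored by all other kernels" remarks above.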

select

select(self, label_index, unlabel_index, batch_size=5, qp_solver='ECOS', **kwargs)

Select indexes from the unlabel_index for querying.

Parameters:
label_index: {list, np.ndarray, IndexCollection}
The indexes of labeled samples.
unlabel_index: {list, np.ndarray, IndexCollection}
The indexes of unlabeled samples.
batch_size: int, optional (default=5)
Selection batch size.
qp_solver: str, optional (default='ECOS')
The cvxpy solver used to solve the QP. Must be one of
['ECOS', 'OSQP'].
ECOS: https://www.embotech.com/ECOS
OSQP: https://osqp.org/
Returns:
selected_idx: list
The selected indexes, which are a subset of unlabel_index.
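
The contract of `select` (it takes the labeled and unlabeled indexes and returns a subset of `unlabel_index`) can be sketched as one round of an active learning loop. This is only an illustration: `query_round` and `FirstKStrategy` are hypothetical names that are not part of alipy, and the stand-in strategy simply picks the first `batch_size` indexes so that the sketch runs without alipy or cvxpy installed. Plain lists are used for the index pools.

```python
def query_round(strategy, label_index, unlabel_index, batch_size=5, qp_solver='ECOS'):
    """Hypothetical helper: call select() once and move the chosen indexes
    from the unlabeled pool to the labeled pool."""
    selected = list(strategy.select(label_index, unlabel_index,
                                    batch_size=batch_size, qp_solver=qp_solver))
    chosen = set(selected)
    # select() is documented to return a subset of unlabel_index.
    if not chosen.issubset(unlabel_index):
        raise ValueError("select() returned indexes outside unlabel_index")
    new_labeled = list(label_index) + selected
    new_unlabeled = [i for i in unlabel_index if i not in chosen]
    return selected, new_labeled, new_unlabeled


class FirstKStrategy:
    """Hypothetical stand-in, NOT part of alipy: it mimics the documented
    select() signature but just returns the first batch_size indexes."""

    def select(self, label_index, unlabel_index, batch_size=5,
               qp_solver='ECOS', **kwargs):
        return list(unlabel_index)[:batch_size]


selected, labeled, unlabeled = query_round(
    FirstKStrategy(), label_index=[0, 1], unlabel_index=[2, 3, 4, 5, 6, 7, 8],
    batch_size=5)
```

With alipy and cvxpy installed, an instance built as `QueryInstanceBMDR(X, y)` could be passed as `strategy` in place of the stand-in, since it exposes the same `select` signature documented above.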

Copyright © 2018, alipy developers (BSD 3 License).