query_strategy.cost_sensitive.QueryCostSensitiveHALC

query_strategy.cost_sensitive.QueryCostSensitiveHALC(X=None, y=None, weights=None, label_tree=None)

HALC exploits the label hierarchy to make cost-effective queries: it selects a batch of instance-label pairs that carry the most information at the least cost. Pairs are chosen according to the Informativeness for Hierarchical Labels.

The Informativeness for Hierarchical Labels is defined as

Infor(x, y) = I(y == 1) * U_anc + I(y == -1) * U_dec + U_{x,y}

where x is the sample, y is the label, and I(.) is the indicator function.
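As a minimal sketch of the definition above, the score can be evaluated with NumPy once the three uncertainty terms are available. Reading U_anc, U_dec and U_{x,y} as precomputed uncertainty terms for the ancestors, the descendants and the pair itself is an assumption made here from the subscripts; the function name and the sample values are hypothetical and are not part of the library.

```python
import numpy as np

def informativeness(y, u_anc, u_dec, u_xy):
    """Infor(x, y) = I(y == 1) * U_anc + I(y == -1) * U_dec + U_xy.

    All arguments are arrays of the same length, one entry per
    instance-label pair. The U terms are assumed to be precomputed.
    """
    y = np.asarray(y)
    u_anc = np.asarray(u_anc, dtype=float)
    u_dec = np.asarray(u_dec, dtype=float)
    u_xy = np.asarray(u_xy, dtype=float)
    # Boolean masks act as the indicator functions I(y == 1) and I(y == -1).
    return (y == 1) * u_anc + (y == -1) * u_dec + u_xy

# Hypothetical values for three instance-label pairs.
y = np.array([1, -1, 1])
u_anc = np.array([0.5, 0.25, 0.0])
u_dec = np.array([0.25, 0.75, 0.5])
u_xy = np.array([0.5, 0.25, 0.25])

scores = informativeness(y, u_anc, u_dec, u_xy)
```

Only one of the two hierarchy terms contributes for any given pair, because the label is either 1 or -1.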

References

[1] Yan Y, Huang S J. Cost-Effective Active Learning for Hierarchical Multi-Label Classification. IJCAI, 2018: 2962-2968.

Methods

query_strategy.cost_sensitive.QueryCostSensitiveHALC.__init__

query_strategy.cost_sensitive.QueryCostSensitiveHALC.__init__(self, X=None, y=None, weights=None, label_tree=None)

Parameters:

X: 2D array, optional (default=None): Feature matrix of the whole dataset. It is stored by reference, so no additional memory is used.
shape [n_samples, n_features]
y: 2D array, optional (default=None): Label matrix of the whole dataset. It is stored by reference, so no additional memory is used.
shape [n_samples, n_classes]
weights: np.array, optional (default=None), shape [1, n_classes] or [n_classes]: The weight of each class. If not provided, all weights are set to 1.
label_tree: 2D array: The hierarchical relationships among the labels.
If node_i is the parent of node_j, then label_tree(i,j)=1.
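The label_tree convention described above can be illustrated with a small sketch. It assumes label_tree is a square [n_classes, n_classes] matrix; the four-label hierarchy, the edges list and the ancestors helper are hypothetical and only serve to show the parent-to-child encoding.

```python
import numpy as np

n_classes = 4
# Hypothetical hierarchy given as (parent, child) pairs: 0 -> 1, 0 -> 2, 2 -> 3.
edges = [(0, 1), (0, 2), (2, 3)]

# label_tree[i, j] = 1 means node i is the parent of node j.
label_tree = np.zeros((n_classes, n_classes), dtype=int)
for parent, child in edges:
    label_tree[parent, child] = 1

# Equivalent of leaving weights unset: every class gets weight 1.
weights = np.ones(n_classes)

def ancestors(label_tree, j):
    """Return the sorted ancestors of label j by following parent links upward."""
    found = []
    parents = list(np.flatnonzero(label_tree[:, j]))
    while parents:
        p = int(parents.pop())
        if p not in found:
            found.append(p)
            parents.extend(np.flatnonzero(label_tree[:, p]))
    return sorted(found)
```

For this hierarchy, ancestors(label_tree, 3) walks from label 3 to its parent 2 and then to the root 0.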

query_strategy.cost_sensitive.QueryCostSensitiveHALC.select

query_strategy.cost_sensitive.QueryCostSensitiveHALC.select(label_index, unlabel_index, oracle, cost, budget, models=None, base_model=None)

Selects a batch of instance-label pairs with the most information and the least cost.
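To make the trade-off between information and cost concrete, here is a generic greedy sketch of budget-constrained selection. It is not taken from the library and may differ from the procedure HALC actually uses; the function greedy_select, the information scores and the costs are all hypothetical. It also shows why a budget with unit costs behaves like a batch size.

```python
import numpy as np

def greedy_select(pairs, info, cost, budget):
    """Pick pairs in descending info/cost order while the budget allows it."""
    info = np.asarray(info, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # Stable sort keeps the original order among equal ratios.
    order = np.argsort(-info / cost, kind="stable")
    selected, spent = [], 0.0
    for k in order:
        if spent + cost[k] <= budget:
            selected.append(pairs[k])
            spent += cost[k]
    return selected

# Hypothetical (instance, label) candidates with made-up scores.
pairs = [(0, 0), (0, 1), (1, 0), (2, 1)]
info = [0.9, 0.5, 0.8, 0.6]

# With unit costs, a budget of 2 simply selects the 2 best pairs.
unit = greedy_select(pairs, info, cost=[1, 1, 1, 1], budget=2)

# With unequal costs, cheaper pairs can displace more informative ones.
mixed = greedy_select(pairs, info, cost=[3, 1, 2, 1], budget=2)
```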

Parameters:

label_index: {list, np.ndarray, MultiLabelIndexCollection}: The indexes of labeled samples. It should be a 1d array of indexes (column major, starting from 0),
a MultiLabelIndexCollection, or a list of 2-element tuples in which
the 1st element is the index of the instance and the 2nd element is the index of the label.
unlabel_index: {list, np.ndarray, MultiLabelIndexCollection}: The indexes of unlabeled samples. It should be a 1d array of indexes (column major, starting from 0),
a MultiLabelIndexCollection, or a list of 2-element tuples in which
the 1st element is the index of the instance and the 2nd element is the index of the label.
oracle: Oracle, (default=None): The oracle, which indicates the cost of each label.
In active learning, the role of the oracle is to label the given query. It can also give the cost of
each corresponding label, so it must contain at least the label and cost information, e.g.
Oracle(labels=labels, cost=cost)
cost: np.array, (default=None), shape [1, n_classes] or [n_classes]: The cost of querying each class. If not provided, all costs are set to 1.
budget: int, optional (default=40): The total cost budget for the selection. If the cost of each label is 1, the budget degenerates into the batch size.
models: object, optional (default=None): The current classification model, which should have a 'predict_proba' method for probabilistic output.
If not provided, the model is built from base_model.
base_model: object, optional (default=None): The classification model for each label, used when models is not provided to build a
classification model for the multi-label task. If not provided, an SVM with default parameters as implemented
by sklearn is used.

Returns:

selected_ins_lab_pair: list: A list of tuples that contains the indexes of selected instance-label pairs.
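The index formats accepted by select, and the list of tuples it returns, can be related with a short NumPy sketch. It assumes the 1d indexes follow the standard column-major (Fortran-order) layout over a [n_samples, n_classes] label matrix; the matrix size and the index values are hypothetical.

```python
import numpy as np

n_samples, n_classes = 3, 2

# Hypothetical 1d column-major indexes into a 3 x 2 label matrix.
flat = np.array([0, 2, 4])

# Convert to (instance index, label index) tuples.
rows, cols = np.unravel_index(flat, (n_samples, n_classes), order="F")
pairs = [(int(i), int(j)) for i, j in zip(rows, cols)]

# Convert the tuples back to 1d column-major indexes.
back = np.ravel_multi_index((rows, cols), (n_samples, n_classes), order="F")
```

In column-major order the indexes run down the first label column before moving on to the second, so index 4 refers to instance 1 and label 1 in this example.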