alipy.experiment.
AlExperiment
AlExperiment is a class that encapsulates the various tools
and implements the main loop of active learning.
AlExperiment is used when the query type is 'AllLabels'.
It supports only the most commonly used scenario: querying the label of an instance.
To run the whole experiment with this one class alone,
we have to impose some restrictions to ensure
the robustness of the code:
1. Your model object should conform to the scikit-learn API.
2. If a custom query strategy is given, it should implement
the BaseQueryStrategy API. Additional parameters should be static.
3. The data split should be given if you are comparing multiple methods.
You may also generate a new split with split_AL().
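A typical run can be sketched as follows, using only the methods documented on this page. This is a sketch under assumptions, not a reference implementation: the iris dataset and the query budget of 50 are illustrative choices that do not come from the text above, and method names may differ between alipy versions. The imports are placed inside the function so that it can be defined even where alipy is not installed.

```python
def run_al_experiment():
    """Run one AlExperiment end to end on a toy dataset (illustrative sketch)."""
    # Imports live inside the function: defining it does not require alipy.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from alipy.experiment import AlExperiment

    # Assumption: iris is used only as a small example dataset.
    X, y = load_iris(return_X_y=True)

    al = AlExperiment(
        X, y,
        model=LogisticRegression(),
        performance_metric='accuracy_score',
        stopping_criteria='num_of_queries',
        stopping_value=50,  # assumption: an arbitrary query budget
        batch_size=1,
    )
    # Generate the random splits, then configure and run the experiment.
    al.split_AL(test_ratio=0.3, initial_label_rate=0.05, split_count=10)
    al.set_query_strategy(strategy="QueryInstanceUncertainty")
    al.set_performance_metric('accuracy_score')
    al.start_query(multi_thread=True)
    al.get_experiment_result()
    return al
```

Calling run_al_experiment() requires alipy, scikit-learn and a plotting backend to be available.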
Methods
alipy.experiment.al_experiment.
__init__()
alipy.experiment.al_experiment.__init__(self, X, y, model=LogisticRegression(), performance_metric='accuracy_score',
stopping_criteria=None, stopping_value=None, batch_size=1, **kwargs)
Parameters:
- X, y: array
  The data matrix X and its labels y.
- model: object
  A model object that conforms to the scikit-learn API.
- performance_metric: str, optional (default='accuracy_score')
  The performance metric.
- stopping_criteria: str, optional (default=None)
  The stopping criterion; must be one of [None, 'num_of_queries', 'cost_limit', 'percent_of_unlabel', 'time_limit'].
  None: stop when no unlabeled samples are available.
  'num_of_queries': stop when the preset number of queries is reached.
  'cost_limit': stop when the cost reaches the limit.
  'percent_of_unlabel': stop when the specified percentage of the unlabeled data pool is labeled.
  'time_limit': stop when the CPU time reaches the limit.
- stopping_value: {int, float}, optional (default=None)
  The value of the corresponding stopping criterion.
- batch_size: int, optional (default=1)
  The batch size of active learning.
- train_idx: array-like, optional (default=None)
  Indexes of the training set, with shape [n_split_count, n_training_indexes].
- test_idx: array-like, optional (default=None)
  Indexes of the testing set, with shape [n_split_count, n_testing_indexes].
- label_idx: array-like, optional (default=None)
  Indexes of the labeled set, with shape [n_split_count, n_labeling_indexes].
- unlabel_idx: array-like, optional (default=None)
  Indexes of the unlabeled set, with shape [n_split_count, n_unlabeling_indexes].
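To make the five stopping criteria concrete, the check performed at each iteration can be sketched as below. The helper should_stop and its keyword arguments are hypothetical names introduced for illustration; this is not alipy's implementation, and it assumes that 'percent_of_unlabel' is given as a fraction between 0 and 1.

```python
def should_stop(criterion, value, *, n_unlabeled, n_queries=0, cost=0.0,
                labeled_fraction=0.0, cpu_seconds=0.0):
    """Return True when the chosen stopping criterion is met (hypothetical helper)."""
    # Whatever the criterion, the loop cannot continue with an empty pool.
    if n_unlabeled == 0:
        return True
    if criterion is None:
        return False
    if criterion == 'num_of_queries':
        return n_queries >= value
    if criterion == 'cost_limit':
        return cost >= value
    if criterion == 'percent_of_unlabel':
        # Assumption: the value is a fraction of the original unlabeled pool.
        return labeled_fraction >= value
    if criterion == 'time_limit':
        return cpu_seconds >= value
    raise ValueError(f"unknown stopping criterion: {criterion!r}")
```

For example, should_stop('num_of_queries', 50, n_unlabeled=5, n_queries=50) returns True, while the same call with n_queries=49 returns False.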
alipy.experiment.al_experiment.
set_query_strategy
alipy.experiment.al_experiment.set_query_strategy(self, strategy="QueryInstanceUncertainty", **kwargs)
Set the query strategy of the experiment.
Parameters:
- strategy: str, optional (default="QueryInstanceUncertainty")
  The query strategy.
  Give a str to use a pre-defined query strategy.
- kwargs: dict, optional
  The arguments used by the query strategy.
  If no kwargs are given, the pre-defined strategy is initialized in the default way
  (see the defaults of the pre-defined query strategies in alipy.query_strategy).
  Note that each parameter should be static.
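The default strategy name points to uncertainty sampling. As a rough illustration of that family of strategies, the least-confident variant can be sketched in plain Python. The function least_confident and the probability values are made up for this example; it is not alipy's code.

```python
def least_confident(probabilities, unlabeled_idx, batch_size=1):
    """Pick the unlabeled instances whose top class probability is lowest.

    probabilities: dict mapping an instance index to its class probabilities.
    """
    # The model's confidence on an instance is the probability of its
    # most likely class; the least confident instances are queried first.
    ranked = sorted(unlabeled_idx, key=lambda i: max(probabilities[i]))
    return ranked[:batch_size]

# Made-up predicted probabilities for three unlabeled instances.
probs = {
    3: [0.90, 0.05, 0.05],
    7: [0.40, 0.35, 0.25],
    9: [0.60, 0.30, 0.10],
}
selected = least_confident(probs, unlabeled_idx=[3, 7, 9], batch_size=2)
print(selected)  # [7, 9]
```

Here batch_size plays the same role as the batch_size argument of the experiment: it is the number of instances selected per query.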
alipy.experiment.al_experiment.
set_performance_metric
alipy.experiment.al_experiment.set_performance_metric(self, performance_metric='accuracy_score', **kwargs)
Set the performance metric of the experiment.
Parameters:
- performance_metric: str, optional (default='accuracy_score')
  The performance metric function.
  Give a str to use a pre-defined performance metric.
- kwargs: dict, optional
  The arguments used by the performance metric.
  If no kwargs are given, the pre-defined metric is initialized in the default way
  (see the defaults of the pre-defined performance metrics in alipy/metric/performance).
  Note that each parameter should be static.
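The idea of selecting a metric by name and fixing its keyword arguments once (the "static" parameters) can be sketched as follows. The registry METRICS, the helper resolve_metric and the pure-Python accuracy_score stand-in are assumptions made for this example; alipy's own lookup may work differently.

```python
from functools import partial

def accuracy_score(y_true, y_pred, normalize=True):
    """Fraction (or count, if normalize=False) of correct predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if normalize else correct

# Hypothetical registry that maps a metric name to its function.
METRICS = {'accuracy_score': accuracy_score}

def resolve_metric(name, **kwargs):
    """Look up a pre-defined metric by name and bind its static kwargs."""
    if name not in METRICS:
        raise KeyError(f"unknown performance metric: {name!r}")
    return partial(METRICS[name], **kwargs)

metric = resolve_metric('accuracy_score')
count_metric = resolve_metric('accuracy_score', normalize=False)
print(metric([0, 1, 1, 0], [0, 1, 0, 0]))        # 0.75
print(count_metric([0, 1, 1, 0], [0, 1, 0, 0]))  # 3
```

Because the keyword arguments are bound once, every evaluation during the experiment uses the same settings, which is what keeps results comparable across iterations.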
alipy.experiment.al_experiment.
set_data_split
alipy.experiment.al_experiment.set_data_split(self, train_idx, test_idx, label_idx, unlabel_idx)
Set the data split indexes from the arrays supplied by the user.
Parameters:
- train_idx: array-like
  Indexes of the training set, with shape [n_split_count, n_training_indexes].
- test_idx: array-like
  Indexes of the testing set, with shape [n_split_count, n_testing_indexes].
- label_idx: array-like
  Indexes of the labeled set, with shape [n_split_count, n_labeling_indexes].
- unlabel_idx: array-like
  Indexes of the unlabeled set, with shape [n_split_count, n_unlabeling_indexes].
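The expected layout, one row of indexes per split, can be illustrated with a small consistency check. The helper check_split is hypothetical, and it assumes that the labeled and unlabeled sets together make up the training set of each split; that assumption is not stated explicitly above.

```python
def check_split(train_idx, test_idx, label_idx, unlabel_idx):
    """Validate one split per row; raise ValueError on inconsistent indexes."""
    n_splits = len(train_idx)
    if not (len(test_idx) == len(label_idx) == len(unlabel_idx) == n_splits):
        raise ValueError("all four arguments must describe the same number of splits")
    for k in range(n_splits):
        train, test = set(train_idx[k]), set(test_idx[k])
        lab, unlab = set(label_idx[k]), set(unlabel_idx[k])
        if train & test:
            raise ValueError(f"split {k}: training and testing sets overlap")
        if lab & unlab:
            raise ValueError(f"split {k}: labeled and unlabeled sets overlap")
        # Assumption: labeled + unlabeled indexes partition the training set.
        if lab | unlab != train:
            raise ValueError(f"split {k}: labeled + unlabeled must equal the training set")
    return n_splits

# Two splits of a six-instance dataset; each row belongs to one split.
train_idx = [[0, 1, 2, 3], [2, 3, 4, 5]]
test_idx = [[4, 5], [0, 1]]
label_idx = [[0], [2]]
unlabel_idx = [[1, 2, 3], [3, 4, 5]]
print(check_split(train_idx, test_idx, label_idx, unlabel_idx))  # 2
```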
alipy.experiment.al_experiment.
split_AL
alipy.experiment.al_experiment.split_AL(self, test_ratio=0.3, initial_label_rate=0.05,
split_count=10, all_class=True)
Split the dataset for an active learning experiment.
Parameters:
- test_ratio: float, optional (default=0.3)
  Ratio of the test set.
- initial_label_rate: float, optional (default=0.05)
  Ratio of the initially labeled set, or of the existing features (missing rate = 1 - initial_label_rate).
  E.g. the initially labeled set contains about initial_label_rate*(1-test_ratio)*n_samples instances.
- split_count: int, optional (default=10)
  Number of times the data is randomly split.
- all_class: bool, optional (default=True)
  Whether each split will contain at least one instance of each class.
  If False, a totally random split will be performed.
Returns:
- train_idx: list
  Indexes of the training set, with shape [n_split_count, n_training_indexes].
- test_idx: list
  Indexes of the testing set, with shape [n_split_count, n_testing_indexes].
- label_idx: list
  Indexes of the labeled set, with shape [n_split_count, n_labeling_indexes].
- unlabel_idx: list
  Indexes of the unlabeled set, with shape [n_split_count, n_unlabeling_indexes].
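As a worked example of the size formula, the approximate sizes of the four sets for one split can be computed as below. The helper split_sizes is hypothetical, and both the rounding and the minimum of one labeled instance are assumptions; the exact sizes produced by split_AL may differ slightly, especially with all_class=True.

```python
def split_sizes(n_samples, test_ratio=0.3, initial_label_rate=0.05):
    """Approximate set sizes for one split (rounding is an assumption)."""
    n_test = round(test_ratio * n_samples)
    n_train = n_samples - n_test
    # Assumption: keep at least one labeled instance so a model can be trained.
    n_label = max(1, round(initial_label_rate * n_train))
    n_unlabel = n_train - n_label
    return {'train': n_train, 'test': n_test, 'label': n_label, 'unlabel': n_unlabel}

sizes = split_sizes(1000)
print(sizes)  # {'train': 700, 'test': 300, 'label': 35, 'unlabel': 665}
```

With the default arguments and 1000 samples, 0.05 * (1 - 0.3) * 1000 = 35 instances start out labeled and the remaining 665 training instances form the unlabeled pool.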
alipy.experiment.al_experiment.
start_query
alipy.experiment.al_experiment.start_query(self, multi_thread=True, **kwargs)
Start the active learning main loop.
When a pre-defined query strategy is used, it runs in multi-thread mode by default.
Parameters:
- multi_thread: bool, optional (default=True)
  Whether to run in multi-thread mode.
  If True, the experiment runs in multiple threads;
  otherwise, it is executed sequentially.
- kwargs: dict, optional
  The parameters used to initialize aceThreading or stateio.
  If no kwargs are given, the default initialization is used.
  If multi_thread is True, the kwargs are used to initialize aceThreading
  (a class that implements multi-threading in active learning for experiments
  with multiple random splits);
  see the specific parameter settings in the __init__() of alipy/utils/multi_thread.py.
  Otherwise, the kwargs are used to initialize stateio
  (a class that stores states);
  see the specific parameter settings in the __init__() of alipy/experiment/state_io.py.
  Note that each parameter should be static.
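The difference between the two modes can be sketched with the standard library alone. The functions run_split and run_all are hypothetical stand-ins, and the thread pool is used here in place of aceThreading, whose interface is not described on this page.

```python
from concurrent.futures import ThreadPoolExecutor

def run_split(split_id):
    """Stand-in for the active learning loop of one split (hypothetical)."""
    return split_id, f"split {split_id} finished"

def run_all(n_splits, multi_thread=True, max_workers=4):
    """Run every split, either in a thread pool or one after another."""
    if multi_thread:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in input order, whatever the completion order.
            return dict(pool.map(run_split, range(n_splits)))
    return dict(run_split(k) for k in range(n_splits))

threaded = run_all(3, multi_thread=True)
sequential = run_all(3, multi_thread=False)
print(threaded == sequential)  # True
```

Since every split is independent of the others, both modes yield the same results; only the wall-clock time differs.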
alipy.experiment.al_experiment.
get_experiment_result
alipy.experiment.al_experiment.get_experiment_result(self, title=None)
Print the experiment result and draw a line chart.
Parameters:
- title: str, optional (default=None)
  The title of the line chart.