alipy.experiment.AlExperiment

AlExperiment is a class that encapsulates various tools and implements the main loop of active learning. It is used when the query type is 'AllLabels', and it supports only the most commonly used scenario: querying the label of an instance.
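For orientation, the kind of loop the class wraps can be sketched in plain Python. This is a minimal illustration of pool-based active learning on 1-D toy data, not alipy's implementation: the nearest-centroid model, the margin-based selection rule, and every function name here are assumptions made for the example.

```python
def fit_centroids(xs, ys):
    """Return the mean of each class; a stand-in for model.fit()."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return means


def predict(means, x):
    """Predict the class whose mean is nearest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


def accuracy(means, xs, ys):
    """Fraction of points whose predicted class matches the true class."""
    hits = sum(predict(means, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def most_uncertain(means, xs, unlabeled):
    """Pick the unlabeled index with the smallest gap between its two nearest class means."""
    def margin(i):
        dists = sorted(abs(xs[i] - m) for m in means.values())
        return dists[1] - dists[0] if len(dists) > 1 else 0.0
    return min(unlabeled, key=margin)


def active_learning_loop(xs, ys, label_idx, unlabel_idx, test_idx, num_queries):
    """Query one instance per iteration, retrain, and record test accuracy."""
    labeled, unlabeled = list(label_idx), list(unlabel_idx)
    means = fit_centroids([xs[i] for i in labeled], [ys[i] for i in labeled])
    curve = []
    for _ in range(num_queries):
        if not unlabeled:
            break  # nothing left to query
        query = most_uncertain(means, xs, unlabeled)  # select an instance
        unlabeled.remove(query)                       # the oracle reveals ys[query]
        labeled.append(query)
        means = fit_centroids([xs[i] for i in labeled], [ys[i] for i in labeled])
        curve.append(accuracy(means, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return labeled, unlabeled, curve


xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
labeled, unlabeled, curve = active_learning_loop(
    xs, ys, label_idx=[0, 9], unlabel_idx=[2, 3, 5, 6], test_idx=[1, 4, 7, 8], num_queries=3
)
```

Each pass selects one instance, moves it from the unlabeled pool to the labeled set, retrains, and appends one point to the learning curve.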

To run the experiment with only one class, we have to impose some restrictions to ensure the robustness of the code:

1. Your model object should conform to the scikit-learn API.

2. If a custom query strategy is given, it should implement the BaseQueryStrategy API. Additional parameters should be static.

3. The data split should be given if you are comparing multiple methods. You may also generate a new split with split_AL().
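As an illustration of the first restriction, the sketch below shows a toy object with scikit-learn-style methods. It is written in plain Python with lists rather than arrays, the class name MajorityClassifier is hypothetical, and which methods AlExperiment actually calls on the model is an assumption here (fit and predict, plus predict_proba for uncertainty-based strategies).

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical model exposing scikit-learn-style fit/predict/predict_proba."""

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.priors_ = [counts[c] / len(y) for c in self.classes_]
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X):
        # Always predict the most frequent training class.
        best = self.classes_[self.priors_.index(max(self.priors_))]
        return [best for _ in X]

    def predict_proba(self, X):
        # One row per instance, one column per class in self.classes_.
        return [list(self.priors_) for _ in X]


model = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "b"])
predictions = model.predict([[5], [6]])
probabilities = model.predict_proba([[5], [6]])
```

In practice any scikit-learn estimator, such as the default LogisticRegression(), already satisfies this interface.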

Methods

alipy.experiment.al_experiment.__init__()

alipy.experiment.al_experiment.__init__(self, X, y, model=LogisticRegression(), performance_metric='accuracy_score',
                                    stopping_criteria=None, stopping_value=None, batch_size=1, **kwargs)

Parameters:

X, y: array: The data matrix and its labels.
model: object: A model object that conforms to the scikit-learn API.
performance_metric: str, optional (default='accuracy_score'): The performance metric.
stopping_criteria: str, optional (default=None): The stopping criterion; must be one of [None, 'num_of_queries', 'cost_limit', 'percent_of_unlabel', 'time_limit'].
None: stop when no unlabeled samples are available.
'num_of_queries': stop when the preset number of queries is reached.
'cost_limit': stop when the cost reaches the limit.
'percent_of_unlabel': stop when the specified percentage of the unlabeled data pool has been labeled.
'time_limit': stop when the CPU time reaches the limit.
stopping_value: {int, float}, optional (default=None): The value of the corresponding stopping criterion.
batch_size: int, optional (default=1): The batch size of active learning.
train_idx: array-like, optional (default=None): Indexes of the training set, with shape [n_split_count, n_training_indexes].
test_idx: array-like, optional (default=None): Indexes of the testing set, with shape [n_split_count, n_testing_indexes].
label_idx: array-like, optional (default=None): Indexes of the labeled set, with shape [n_split_count, n_labeling_indexes].
unlabel_idx: array-like, optional (default=None): Indexes of the unlabeled set, with shape [n_split_count, n_unlabeling_indexes].
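The interplay between stopping_criteria and stopping_value can be sketched as a single check evaluated once per query round. This is an illustrative reading of the descriptions above, not alipy's code: the function should_stop and the state dictionary with its keys are hypothetical, and treating the 'percent_of_unlabel' value as a fraction between 0 and 1 is an assumption.

```python
def should_stop(stopping_criteria, stopping_value, state):
    """Return True when the chosen criterion is met.

    `state` is a hypothetical dict of loop counters, not an alipy object.
    """
    if state["n_unlabeled"] == 0:
        return True  # nothing left to query, whatever the criterion
    if stopping_criteria is None:
        return False  # None: run until the unlabeled pool is empty
    if stopping_criteria == "num_of_queries":
        return state["n_queries"] >= stopping_value
    if stopping_criteria == "cost_limit":
        return state["cost"] >= stopping_value
    if stopping_criteria == "percent_of_unlabel":
        # Assumption: stopping_value is a fraction of the initial unlabeled pool.
        labeled_fraction = state["n_newly_labeled"] / state["initial_pool_size"]
        return labeled_fraction >= stopping_value
    if stopping_criteria == "time_limit":
        return state["cpu_seconds"] >= stopping_value
    raise ValueError("unknown stopping criterion: %r" % (stopping_criteria,))


state = {
    "n_unlabeled": 80,
    "n_queries": 20,
    "cost": 20.0,
    "n_newly_labeled": 20,
    "initial_pool_size": 100,
    "cpu_seconds": 1.5,
}
stop_by_count = should_stop("num_of_queries", 20, state)
stop_by_percent = should_stop("percent_of_unlabel", 0.5, state)
```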

alipy.experiment.al_experiment.set_query_strategy

alipy.experiment.al_experiment.set_query_strategy(self, strategy="QueryInstanceUncertainty", **kwargs)

Set the query strategy of the experiment.

Parameters:

strategy: str, optional (default='QueryInstanceUncertainty'): The query strategy. Give a str to use a pre-defined query strategy.
kwargs: dict, optional: The arguments passed to the query strategy. If no kwargs are given, the pre-defined strategy is initialized in the default way. Note that each parameter should be static.

alipy.experiment.al_experiment.set_performance_metric

alipy.experiment.al_experiment.set_performance_metric(self, performance_metric='accuracy_score', **kwargs)

Set the performance metric for the experiment.

Parameters:

performance_metric: str, optional (default='accuracy_score'): The performance-metric function. Give a str to use a pre-defined performance metric.
kwargs: dict, optional: The arguments passed to the performance metric. If no kwargs are given, the pre-defined metric is initialized in the default way (see the defaults of the pre-defined metrics in alipy/metric/'performance'). Note that each parameter should be static.

alipy.experiment.al_experiment.set_data_split

alipy.experiment.al_experiment.set_data_split(self, train_idx, test_idx, label_idx, unlabel_idx)

Set the data split indexes from the parameters supplied by the user.

Parameters:

train_idx: array-like, optional (default=None): Indexes of the training set, with shape [n_split_count, n_training_indexes].
test_idx: array-like, optional (default=None): Indexes of the testing set, with shape [n_split_count, n_testing_indexes].
label_idx: array-like, optional (default=None): Indexes of the labeled set, with shape [n_split_count, n_labeling_indexes].
unlabel_idx: array-like, optional (default=None): Indexes of the unlabeled set, with shape [n_split_count, n_unlabeling_indexes].
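The expected shape of the four arguments, one row of indexes per split, can be checked before passing them in. The helper below is a hypothetical sanity check, not part of alipy; it assumes that within each split the labeled and unlabeled indexes together make up the training set and that training and testing indexes do not overlap.

```python
def check_split(train_idx, test_idx, label_idx, unlabel_idx):
    """Validate hand-built split indexes and return the number of splits."""
    n_splits = len(train_idx)
    if not (len(test_idx) == len(label_idx) == len(unlabel_idx) == n_splits):
        raise ValueError("all four arguments need one row per split")
    for k in range(n_splits):
        train, test = set(train_idx[k]), set(test_idx[k])
        label, unlabel = set(label_idx[k]), set(unlabel_idx[k])
        if train & test:
            raise ValueError("split %d: train and test overlap" % k)
        if label & unlabel:
            raise ValueError("split %d: labeled and unlabeled overlap" % k)
        if label | unlabel != train:
            raise ValueError("split %d: labeled + unlabeled must equal train" % k)
    return n_splits


# Two splits over six samples (indexes 0..5).
train_idx = [[0, 1, 2, 3], [2, 3, 4, 5]]
test_idx = [[4, 5], [0, 1]]
label_idx = [[0], [2]]
unlabel_idx = [[1, 2, 3], [3, 4, 5]]
n_splits = check_split(train_idx, test_idx, label_idx, unlabel_idx)
```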

alipy.experiment.al_experiment.split_AL

alipy.experiment.al_experiment.split_AL(self, test_ratio=0.3, initial_label_rate=0.05,
            split_count=10, all_class=True)

Split the dataset for an active learning experiment.

Parameters:

test_ratio: float, optional (default=0.3): The ratio of the test set.
initial_label_rate: float, optional (default=0.05): The ratio of the initial labeled set, or of the existing features (missing rate = 1 - initial_label_rate).
For example, the initial labeled set has size initial_label_rate*(1-test_ratio)*n_samples.
split_count: int, optional (default=10): The number of random splits to generate.
all_class: bool, optional (default=True): Whether each split must contain at least one instance of each class.
If False, a totally random split is performed.

Returns:

train_idx: list: Indexes of the training set, with shape [n_split_count, n_training_indexes].
test_idx: list: Indexes of the testing set, with shape [n_split_count, n_testing_indexes].
label_idx: list: Indexes of the labeled set, with shape [n_split_count, n_labeling_indexes].
unlabel_idx: list: Indexes of the unlabeled set, with shape [n_split_count, n_unlabeling_indexes].
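How the three numeric parameters determine the sizes of the returned sets can be sketched with the standard library. This is not alipy's split_AL: it covers only the totally random case (all_class=False), and the function name random_split, the seed parameter, and the rounding choices are assumptions for the example.

```python
import random


def random_split(n_samples, test_ratio=0.3, initial_label_rate=0.05, split_count=10, seed=0):
    """Return four lists of index lists, one row per split (purely random splits)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility of the sketch
    n_test = round(n_samples * test_ratio)
    # Initial labeled size follows initial_label_rate*(1-test_ratio)*n_samples.
    n_label = max(1, round((n_samples - n_test) * initial_label_rate))
    train_idx, test_idx, label_idx, unlabel_idx = [], [], [], []
    for _ in range(split_count):
        order = list(range(n_samples))
        rng.shuffle(order)
        test, train = order[:n_test], order[n_test:]
        test_idx.append(test)
        train_idx.append(train)
        label_idx.append(train[:n_label])    # initially labeled part of the training set
        unlabel_idx.append(train[n_label:])  # pool the strategy may query from
    return train_idx, test_idx, label_idx, unlabel_idx


train_idx, test_idx, label_idx, unlabel_idx = random_split(100, split_count=3)
```

With 100 samples and the default ratios, each split holds 30 test indexes and 70 training indexes, of which 4 start out labeled.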

alipy.experiment.al_experiment.start_query

alipy.experiment.al_experiment.start_query(self, multi_thread=True, **kwargs)

Start the active learning main loop. If a pre-defined query strategy is used, the loop runs in multi-thread mode by default.

Parameters:

multi_thread: bool, optional (default=True): Whether to run in multiple threads.
If True, the experiment runs in multiple threads.
If False, it is executed sequentially.
kwargs: dict, optional: The parameters used to initialize aceThreading or stateio.
If no kwargs are given, the default initialization is used.
If multi_thread is True, the kwargs are used to initialize aceThreading (a class that implements multi-threading in active learning for experiments with multiple random splits); see the specific parameter settings in alipy/utils/'multi_thread.py' __init__().
If multi_thread is False, the kwargs are used to initialize stateio (a class that stores states); see the specific parameter settings in alipy/experiment/'state_io.py' __init__().
Note that each parameter should be static.
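The difference between the two modes can be sketched with the standard library: each random split is an independent job, so the jobs can either be mapped over a thread pool or run one after another. This does not use aceThreading or stateio; run_split, run_all, and max_workers are hypothetical names, and the per-split work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def run_split(split_id):
    """Placeholder for one split's query loop; returns a fake summary."""
    return {"split": split_id, "n_queries": 10 + split_id}


def run_all(split_ids, multi_thread=True, max_workers=4):
    """Run every split, in a thread pool or sequentially, keeping input order."""
    if multi_thread:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(run_split, split_ids))  # map preserves input order
    return [run_split(s) for s in split_ids]


threaded = run_all(range(5), multi_thread=True)
sequential = run_all(range(5), multi_thread=False)
```

Both modes produce the same per-split results; only the scheduling differs.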

alipy.experiment.al_experiment.get_experiment_result

alipy.experiment.al_experiment.get_experiment_result(self, title=None)

Print the experiment result and draw a line chart.

Parameters:

title: str, optional (default=None): The title of the line chart.