ALiPy provides a module-based implementation of the active learning framework. It aims to support experiment implementation with a variety of tool functions. These tools are loosely coupled, so users can program their experiment projects in their own way.
You can get support from ALiPy by:
* Using alipy.data_manipulate to preprocess and split your data sets for experiments.
* Using alipy.query_strategy to invoke traditional and state-of-the-art methods.
* Using alipy.index.IndexCollection to manage your labeled indexes and unlabeled indexes.
* Using alipy.metric to calculate your model performances.
* Using alipy.experiment.state and alipy.experiment.state_io to save the intermediate results after each query and recover the program from the breakpoints.
* Using alipy.experiment.stopping_criteria to get some example stopping criteria.
* Using alipy.experiment.experiment_analyser to gather, process and visualize your experiment results.
* Using alipy.oracle to implement clean, noisy, cost-sensitive oracles.
* Using alipy.utils.multi_thread to parallelize your k-fold experiment.
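To make the roles of these tools concrete, here is a minimal, library-free sketch of the pool-based loop they support: index bookkeeping, a query step, and a stopping criterion. It does not use ALiPy's API. The function name `run_active_learning` and its parameters are hypothetical, and random selection stands in for a real query strategy.

```python
import random


def run_active_learning(n_samples, n_initial, max_queries, seed=0):
    """Toy pool-based loop: random querying, stopped by a query budget.

    All names here are illustrative assumptions, not ALiPy functions.
    """
    rng = random.Random(seed)
    all_indexes = list(range(n_samples))
    rng.shuffle(all_indexes)

    # Index bookkeeping: a small initial labeled set, the rest is the pool.
    labeled = set(all_indexes[:n_initial])
    unlabeled = set(all_indexes[n_initial:])

    history = []  # intermediate state recorded after each query
    n_queries = 0
    # Stopping criterion: stop once the query budget is spent or the pool is empty.
    while unlabeled and n_queries < max_queries:
        # Query step: a real strategy would score the pool; here we pick at random.
        picked = rng.choice(sorted(unlabeled))
        unlabeled.remove(picked)
        labeled.add(picked)
        n_queries += 1
        history.append((n_queries, picked, len(labeled)))
    return labeled, unlabeled, history


labeled, unlabeled, history = run_active_learning(
    n_samples=20, n_initial=3, max_queries=5
)
print(len(labeled), len(unlabeled), len(history))
```

In a real experiment, the random choice would be replaced by a query strategy, the newly labeled instance would be used to retrain a model, and a performance metric would be recorded in the history after each query.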
Basic and detailed introductions to each tool can be found in "10 mins to alipy" and the advanced guidelines. For usage examples, download the alipy source code from GitHub; the example code is in alipy/examples.
Note that the above tool classes are designed with great care to accommodate a variety of uses. ALiPy lets users easily configure and implement their own approaches under different active learning settings, such as AL for multi-label data, AL with noisy annotators, AL with different costs and so on. See the following content to learn more.
Because alipy is loosely coupled, it is easy to implement experiments in other special settings.
ALiPy supports many special active learning settings. To run experiments in these settings, you can read the following tutorials, which introduce the specialized tools and provide example code:
* AL with Noisy Oracles - The oracle may sometimes return incorrect labels.
* AL for Multi-Label Data - An instance is associated with multiple labels simultaneously.
* AL with Different Costs - The cost of querying different labels can be different.
* AL by Querying Features - Select missing features of instances for querying.
* AL with Novel Query Types - Query other types of information about instances instead of their labels.
* AL for Large Scale Tasks - Active learning on big data.
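As an illustration of the noisy-oracle setting, the sketch below simulates an annotator who returns a wrong label with a fixed probability. The `NoisyOracle` class, its `query` method and the `noise_rate` parameter are hypothetical names chosen for this example; they are not taken from alipy.oracle.

```python
import random


class NoisyOracle:
    """Hypothetical oracle that returns a wrong label with probability noise_rate."""

    def __init__(self, true_labels, classes, noise_rate, seed=0):
        self.true_labels = dict(true_labels)  # index -> ground-truth label
        self.classes = list(classes)
        self.noise_rate = noise_rate
        self.rng = random.Random(seed)

    def query(self, index):
        """Return the label supplied by the simulated annotator for one index."""
        truth = self.true_labels[index]
        if self.rng.random() < self.noise_rate:
            # Annotation error: answer with one of the other classes.
            wrong = [c for c in self.classes if c != truth]
            return self.rng.choice(wrong)
        return truth


true_labels = {i: i % 2 for i in range(1000)}
clean = NoisyOracle(true_labels, classes=[0, 1], noise_rate=0.0)
noisy = NoisyOracle(true_labels, classes=[0, 1], noise_rate=0.3)

clean_errors = sum(clean.query(i) != true_labels[i] for i in range(1000))
noisy_errors = sum(noisy.query(i) != true_labels[i] for i in range(1000))
print(clean_errors, noisy_errors)
```

With a noise rate of zero the oracle is clean. With a rate of 0.3, roughly 30% of the answers are wrong, which is the kind of behaviour that strategies for noisy oracles have to cope with.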
ALiPy provides more than 20 advanced algorithms for different active learning settings. Here is the list:
* AL with Instance Selection: Uncertainty (SIGIR 1994), Graph Density (CVPR 2012), QUIRE (TPAMI 2014), SPAL (AAAI 2019), Query By Committee (ICML 1998), Random, BMDR (KDD 2013), LAL (NIPS 2017), Expected Error Reduction (ICML 2001)
* AL for Multi-Label Data: AUDI (ICDM 2013), QUIRE (TPAMI 2014), Random, MMC (KDD 2009), Adaptive (IJCAI 2013)
* AL by Querying Features: AFASMC (KDD 2018), Stability (ICDM 2013), Random
* AL with Different Costs: HALC (IJCAI 2018), Random, Cost performance
* AL with Noisy Oracles: CEAL (IJCAI 2017), IEthresh (KDD 2009), All, Random
* AL with Novel Query Types: AURO (IJCAI 2015)
* AL for Large Scale Tasks: Subsampling
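To give a flavour of how one of the listed instance-selection methods works, here is a generic sketch of Query By Committee using vote entropy as the disagreement measure. This is a textbook formulation, not ALiPy's implementation; the function names and the example votes are assumptions made for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for one instance."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_by_committee(committee_votes):
    """Pick the index whose committee votes disagree the most.

    committee_votes maps an unlabeled index to the list of labels
    predicted by each committee member.
    """
    return max(committee_votes, key=lambda idx: vote_entropy(committee_votes[idx]))


# Hypothetical predictions of a three-member committee on three instances.
committee_votes = {
    10: ["a", "a", "a"],  # full agreement
    11: ["a", "b", "a"],  # partial disagreement
    12: ["a", "b", "c"],  # maximal disagreement
}
chosen = select_by_committee(committee_votes)
print(chosen)
```

The instance on which every committee member predicts a different label has the highest vote entropy, so it is the one selected for querying.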
While implementing the above algorithms, we found that many by-products can also be used independently (e.g., the matrix completion method in KDD'18 AFASMC, the multi-label classification model in ICDM'13 AUDI, etc.). We also encapsulate these functions for standalone use:
* Matrix completion: AFASMC_mc (KDD 2018), IterativeSVD_mc
* Multi-label classification model: LabelRanking (ICDM'13)
* Optimization tools: POSS (NIPS'15)
Some users may also need a high-level encapsulation that is easier to use. For this purpose, alipy also provides a class that encapsulates various tools and implements the main loop of active learning, namely alipy.experiment.AlExperiment.
Note that AlExperiment only supports the most commonly used scenario: querying all labels of an instance. With this class you can run experiments in only a few lines of code. All you need to do is specify the various options; the query process then runs in multiple threads.
Here is an example usage of this class:
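from sklearn.datasets import load_iris
from alipy.experiment.al_experiment import AlExperiment
# Load the iris data set as a feature matrix X and a label vector y.
X, y = load_iris(return_X_y=True)
# Stop each run after 50 queries.
al = AlExperiment(X, y, stopping_criteria='num_of_queries', stopping_value=50)
# Split the data for the experiment using the default settings.
al.split_AL()
# Use uncertainty sampling with the least-confident measure.
al.set_query_strategy(strategy="QueryInstanceUncertainty", measure='least_confident')
# Evaluate the model with accuracy after each query.
al.set_performance_metric('accuracy_score')
# Run the query process in multiple threads, then plot the results.
al.start_query(multi_thread=True)
al.plot_learning_curve()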
For more details, please refer to the tutorial for AlExperiment.
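The example above selects instances with the 'least_confident' measure. As a rough, library-free sketch of what such a measure computes in its usual textbook form: each unlabeled instance is scored by one minus the model's highest predicted class probability, and the highest-scoring instances are queried first. The function names and the probabilities below are illustrative assumptions, not ALiPy internals.

```python
def least_confident_scores(probabilities):
    """Score each instance by 1 - max class probability (higher = less certain)."""
    return [1.0 - max(p) for p in probabilities]


def select_least_confident(unlabeled_indexes, probabilities, batch_size=1):
    """Return the batch_size indexes the model is least confident about."""
    scores = least_confident_scores(probabilities)
    ranked = sorted(
        zip(unlabeled_indexes, scores), key=lambda pair: pair[1], reverse=True
    )
    return [idx for idx, _ in ranked[:batch_size]]


# Hypothetical predicted class probabilities for three unlabeled instances.
unlabeled_indexes = [4, 7, 9]
probabilities = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # very uncertain prediction
    [0.60, 0.30, 0.10],  # moderately uncertain prediction
]
selected = select_least_confident(unlabeled_indexes, probabilities, batch_size=2)
print(selected)  # [7, 9]
```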
Copyright © 2018, alipy developers (BSD 3 License).