Intermediate results IO

alipy.experiment.StateIO is a class for saving and loading your intermediate results. It implements several crucial functions:

- Save intermediate results to files

- Recover the workspace (label set and unlabel set) at any iteration

- Recover program from the breakpoint in case the program exits unexpectedly

- Print the active learning progress: current_iteration, current_mean_performance, current_cost, etc.

It is strongly recommended to use this class to manage your intermediate results, because many other components in alipy support the StateIO object directly (e.g., Analyser, StoppingCriteria). If you are going to use those classes too, it saves time otherwise spent converting between data types.

The following tutorial introduces the basic usage of the alipy.experiment.StateIO and alipy.experiment.State classes.

Initialize

Note that a StateIO object covers a single fold of the experiment, so it needs the data split and the fold number of the current fold when it is initialized:

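# Split your data first to obtain train_idx, test_idx, label_idx and unlabel_idx;
# index 0 below selects the split of the first fold.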
from alipy.experiment import StateIO, State
saver = StateIO(round=0, train_idx=train_idx[0],
                test_idx=test_idx[0], init_L=label_idx[0],
                init_U=unlabel_idx[0], saving_path='.')

To add a query to the StateIO object, you must use an alipy.experiment.State object, a dict-like container that stores the necessary information about one query (the state of the current iteration), such as the cost, the performance, and the selected indexes.

You need to set the queried indexes and the performance when initializing a State object; cost and queried_label are optional:

st = State(select_index=select_ind, performance=accuracy,
           cost=cost, queried_label=queried_label)

You can also add some other entries as you need:

st.add_element(key='my_entry', value=my_value)
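To make the "dict-like container" idea concrete, here is a minimal sketch of such a record. MiniState is a hypothetical stand-in written for this illustration and is not alipy's actual State implementation; only the parameter and method names are borrowed from the calls shown in this tutorial.

```python
class MiniState(dict):
    """Toy dict-like record of one query (a stand-in, not alipy's State)."""

    def __init__(self, select_index, performance, queried_label=None, cost=None):
        # The two required entries plus the two optional ones.
        super().__init__(select_index=select_index, performance=performance,
                         queried_label=queried_label, cost=cost)

    def add_element(self, key, value):
        # Store an extra user-defined entry.
        self[key] = value

    def get_value(self, key):
        # Same result as indexing with state[key].
        return self[key]


st_demo = MiniState(select_index=[17], performance=0.82)
st_demo.add_element(key="my_entry", value="note")
print(st_demo.get_value("my_entry"), st_demo["select_index"], st_demo["cost"])
```

Optional entries that are not supplied simply stay None in this sketch, which mirrors the fact that cost and queried_label may be omitted.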

Basic operations

After you have put all the useful information into a State object, add the state to the StateIO object and use the save() method to write the intermediate results to a file:

saver.add_state(st)
saver.save()

If you want to inspect previous queries for analysis, you can retrieve any past state by:

prev_st = saver.get_state(index=1) # get 2nd query
# or use the index operation directly
prev_st = saver[1]

You can get the values in a State object in a similar way:

value = prev_st.get_value(key='select_index')
# or use the index operation directly
value = prev_st['select_index']

Recover workspace

You can recover the StateIO object to any past state. For example, if you have already queried 10 times and want to go back to the workspace (label and unlabel sets) as it was after only 2 queries, you can call the get_workspace(iteration) or recover_workspace(iteration) method.

The former returns the train, test, label, and unlabel indexes of the given iteration and leaves the object itself unchanged. The latter restores the object itself to the given iteration, discarding all information recorded after that iteration.

train, test, L, U = saver.get_workspace(iteration=2)
# or recover the saver itself
saver.recover_workspace(iteration=2)

The iteration parameter is the number of queries that have been performed in the workspace you want to recover.
For example, if 0 is given, the initial workspace without any querying will be recovered.
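The meaning of the iteration parameter can be illustrated by replaying the query history from the initial split. The sketch below is a self-contained illustration under the assumption that each query moves its selected indexes from the unlabel set to the label set; replay_workspace is a hypothetical helper, not an alipy function, and it does not describe how alipy computes the workspace internally.

```python
def replay_workspace(init_L, init_U, selected_per_query, iteration):
    """Rebuild the label/unlabel index lists after `iteration` queries.

    Hypothetical helper for illustration only; it is not part of alipy.
    """
    if not 0 <= iteration <= len(selected_per_query):
        raise ValueError("iteration must be between 0 and the number of queries")
    labeled = list(init_L)
    unlabeled = list(init_U)
    for selected in selected_per_query[:iteration]:
        for idx in selected:
            unlabeled.remove(idx)  # the queried index leaves the unlabel set
            labeled.append(idx)    # and joins the label set
    return labeled, unlabeled


init_L = [0, 1]
init_U = [2, 3, 4, 5]
history = [[3], [5], [2]]  # indexes selected in queries 1, 2 and 3

L0, U0 = replay_workspace(init_L, init_U, history, iteration=0)
L2, U2 = replay_workspace(init_L, init_U, history, iteration=2)
print(L0, U0)  # the initial workspace, before any query
print(L2, U2)  # the workspace after the first two queries
```

Because the helper copies its inputs, the initial sets and the history stay untouched, which is analogous to get_workspace leaving the object unchanged.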

If your experiment exits unexpectedly, you can load the saved StateIO binary file to resume your program without re-running your previous queries.

saver = StateIO.load(path='./AL_round_0.pkl')
train, test, L, U = saver.get_workspace() # will return the latest workspace
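The general save-then-resume pattern behind this kind of recovery can be sketched with the standard pickle module. This is only an illustration of the idea: run_queries, the file name checkpoint.pkl, and the placeholder state dictionaries are all hypothetical and do not reflect StateIO's file format or internals.

```python
import os
import pickle
import tempfile


def run_queries(checkpoint_path, total_queries):
    """Run a toy query loop that can resume from a pickle checkpoint."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            states = pickle.load(f)  # resume: keep the finished queries
    else:
        states = []                  # fresh start
    for query_no in range(len(states), total_queries):
        # Placeholder for a real query: just record which step ran.
        states.append({"select_index": [query_no], "performance": None})
        with open(checkpoint_path, "wb") as f:
            pickle.dump(states, f)   # save after every query
    return states


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pkl")
    first = run_queries(path, total_queries=3)    # the run stops after 3 queries
    resumed = run_queries(path, total_queries=5)  # a new run continues with queries 4 and 5
    print(len(first), len(resumed))
```

Saving after every query means that at most the query in progress is lost when the program stops, at the cost of one file write per iteration.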