Pool-Based Machine Teaching
Pool-based machine teaching: search algorithms for teaching sets, driven through a file-based API.
Getting Started
poolmate provides a command-line interface to algorithms for
searching for teaching sets among a candidate pool. poolmate is
designed to work with any learner that can communicate through a
file-based API.
Typical usage requires from the client:
- A candidate pool of items kept in a file, one item per line
- A command which poolmate can execute to obtain the loss of a teaching set
- Parameter settings for the search algorithm
All methods support both teaching sets and teaching sequences. That is, duplication and order of teaching items will be significant if the provided learner treats them as significant.
For the details, see Usage.
For an introduction to machine teaching, see Machine Teaching: An Inverse Problem to Machine Learning and an Approach Toward Optimal Education.
For an overview of our research, see here.
Installation
Dependencies can be installed with
pip install numpy pandas scipy scikit-learn tqdm
This project has been tested with Python 2.7.
Usage
Command-line interface
The following describes basic usage. For more options, see Options.
python poolmate/teach.py --candidate-pool-filename CANDIDATE_POOL_FILENAME \
--loss-executable LOSS_EXECUTABLE \
--output-filename OUTPUT_FILENAME \
--teaching-set-size TEACHING_SET_SIZE \
--search-budget SEARCH_BUDGET
--candidate-pool-filename is a file which contains the candidate pool to search from, one item per line.
--loss-executable is an executable which poolmate will call during its execution. This executable must take two command-line arguments FILE1 and FILE2. The first argument FILE1 will contain a set of items for the learner to train on. The second argument FILE2 will be a filename where the executable should write the loss of learner after training on the items in FILE1. The lines in FILE1 will simply be a subset of the lines in CANDIDATE_POOL_FILENAME.
So for example the contents of FILE1 might look like:
1, 0.658947147839417, 0.752189242381396
1, 0.231140742000439, -0.972920324275059
-1, -0.830995051808994, 0.556279807173483
-1, -0.433446335329234, -0.901179379696216
1, 0.958199172890462, 0.286101983691191
If the executable is named my_learner, it will be called as:
my_learner FILE1 FILE2
my_learner must train on the items in FILE1 and write the loss of the trained learner to FILE2 on a single line, say:
0.03
Please note that poolmate will use unique filenames on successive calls to the loss executable.
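As a purely illustrative sketch, a loss executable could be a small Python script along the following lines. The nearest-centroid "learner" here is an invention for illustration, and the parsing assumes the comma-separated "label, feature, feature" format of the FILE1 example above:

```python
import sys

def run(input_filename, output_filename):
    # Parse lines of the form "label, feature1, feature2, ..." as in
    # the FILE1 example above (this format is an assumption).
    rows = []
    with open(input_filename) as f:
        for line in f:
            values = [float(v) for v in line.split(",")]
            rows.append((values[0], values[1:]))

    # Hypothetical "learner": compute one centroid per label.
    by_label = {}
    for label, x in rows:
        by_label.setdefault(label, []).append(x)
    centroids = {
        label: [sum(col) / float(len(xs)) for col in zip(*xs)]
        for label, xs in by_label.items()
    }

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # "Loss": fraction of training items misclassified by nearest centroid.
    errors = sum(
        1 for label, x in rows
        if min(centroids, key=lambda l: sqdist(centroids[l], x)) != label
    )
    with open(output_filename, "w") as f:
        f.write("%g\n" % (errors / float(len(rows))))

if __name__ == "__main__" and len(sys.argv) >= 3:
    run(sys.argv[1], sys.argv[2])
```

Any script with this shape (read FILE1, train, write a single loss value to FILE2) can serve as LOSS_EXECUTABLE.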
--output-filename is a filename where results are written. The first line of this file will contain the loss while the remaining lines will contain the rows out of CANDIDATE_POOL_FILENAME which represent the best teaching set found during search. For example, if TEACHING_SET_SIZE were set to 2, the output file may look something like:
0.00602749784937
-1, -0.134432406608285, -0.990922765937641
1, 0.171855154919438, -0.985122228826259
--teaching-set-size is the size of the best teaching set poolmate will return.
--search-budget is the number of models poolmate is allowed to fit before returning. This is equivalent to the number of calls to LOSS_EXECUTABLE.
As an example of a command-line invocation including all required parameters, the following example runs a search over a candidate pool of items drawn uniformly from the boundary of a circle using an SVM learner:
python poolmate/teach.py \
--candidate-pool-filename poolmate/test/circle.csv \
--loss-executable "python poolmate/test/svm.py" \
--output-filename output.txt \
--teaching-set-size 2 \
--search-budget 200
Due to the stochastic nature of search, your result may differ from the loss and teaching set given in the example above.
Programmatic Interface
To avoid the overhead of executable callbacks, you can write a learner in
Python and invoke poolmate programmatically. In this case, provide a
learner instance which implements two methods:
class MyLearner(object):
    def loss(self, model):
        # ... return the loss of model as a float
    def fit(self, xy):
        # ... return a model fit on xy
The fit method must fit a model on xy, which is an iterable subset of the candidate pool.
The loss method receives as an argument the model returned by fit and must return a loss of float type.
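As a minimal, self-contained illustration of this interface, consider the following hypothetical learner. MeanLearner and its target parameter are inventions for this sketch, not part of poolmate; any object exposing fit and loss with these signatures will do:

```python
class MeanLearner(object):
    """Hypothetical learner: the "model" is the mean of the items,
    and the loss is the squared distance of that mean from a fixed
    target value. Illustrative only."""

    def __init__(self, target):
        self.target = target

    def fit(self, xy):
        # xy is an iterable subset of the candidate pool.
        xs = list(xy)
        return sum(xs) / float(len(xs))

    def loss(self, model):
        # Must return a float.
        return (model - self.target) ** 2
```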
Here is an example of its invocation, where candidate_pool is an iterable over the candidate items:
from poolmate.teach import Runner, build_options
runner = Runner()
learner = MyLearner()
options = build_options(search_budget=10000,
teaching_set_size=10)
best_loss, best_set = runner.run_experiment(candidate_pool, learner, options)
Options
Several other options are available using the command line:
-h, --help show this help message and exit
--candidate-pool-filename CANDIDATE_POOL_FILENAME
Filename for candidate pool. The format of the file is
that candidate items are represented one item per
line. (default: None)
--loss-executable LOSS_EXECUTABLE
Executable command which will return loss on teaching
set. Executable must take two command-line arguments,
an `inputfilename` containing the teaching set to
train the learner on, one item per line, and an
`outputfilename` where the loss should be written
(default: None)
--output-filename OUTPUT_FILENAME
Output filename where the best found teaching set and
loss are written at the search procedure's termination
(default: None)
--teaching-set-size TEACHING_SET_SIZE
Size of teaching set to return. (default: None)
--search-budget SEARCH_BUDGET
Budget of number of models to fit. This is the number
of times `loss-executable` will be invoked. (default:
None)
--proposals PROPOSALS
Number of proposals to consider at each search
iteration. A tuning parameter for 'greedy-add' and
'random-index-greedy-swap' algorithms (default: None)
--seed SEED Set random seed to achieve reproducibility. (default:
None)
--algorithm {greedy-add,random-index-greedy-swap,uniform}
Choice of search algorithm (default: greedy-add)
--initial-teaching-set INITIAL_TEACHING_SET
A comma-separated list of zero-based indices to fix
the initial teaching set. Used in 'random-index-
greedy-swap' and 'uniform' algorithms (e.g.,
--initial-teaching-set 53,17 ). (default: None)
--log LOG Filename of log file, where interim results are logged
as comma-separated values (CSV). The three columns of
the output represent the iteration number, the loss of
the trained model for that iteration, and the teaching
set for that iteration. (default: None)
FAQ
My learner is a MATLAB function. How can poolmate call it?
One method is to wrap the call to MATLAB in a shell script. For example, suppose your MATLAB function is
function [ output_args ] = my_learner(FILE1, FILE2)
Create the file my_learner.sh
#!/bin/sh
matlab -nodesktop -nosplash -nodisplay -r "my_learner $1 $2; quit" >/dev/null 2>/dev/null
Be sure to set the script’s permissions with
chmod +x my_learner.sh
And then you can call it by setting --loss-executable ./my_learner.sh.
Acknowledgements
This project is based upon work supported by the National Science Foundation under Grant No. IIS-0953219. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
- Ara Vartanian (aravart@cs.wisc.edu)
Collaborators
- Scott Alfeld (salfeld@amherst.edu)
- Ayon Sen (ayonsn@cs.wisc.edu)
- Jerry Zhu (jerryzhu@cs.wisc.edu)
Contacts
- Ara Vartanian (aravart@cs.wisc.edu)
- Jerry Zhu (jerryzhu@cs.wisc.edu)