Essentials of Anchor
Experiments are inputs, task, outputs
Roughly speaking, an experiment is the execution of a task on inputs to produce outputs.
outputs = task(inputs)
Inputs are usually images (or images plus extras). Outputs can be images, text files, CSVs, XML or any other file type.
To run an experiment, all three elements must be defined, although defaults often supply them.
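The relationship between the three elements can be sketched in a few lines of Python. This is only an illustration of the idea, not Anchor's implementation: the names `run_experiment` and `summarize` are hypothetical, and inputs are represented as plain strings.

```python
from typing import Callable, List


def summarize(inputs: List[str]) -> List[str]:
    """Hypothetical default task: report on the inputs and produce no outputs."""
    print(f"Found {len(inputs)} inputs.")
    return []


def run_experiment(
    inputs: List[str],
    task: Callable[[List[str]], List[str]] = summarize,
) -> List[str]:
    """outputs = task(inputs), with a default task when none is given."""
    return task(inputs)
```

Calling `run_experiment(["jan.jpg", "feb.jpg"])` uses the default task, while passing `task=...` overrides it, mirroring how a default can be replaced element by element.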
Default experiment
Anchor first offers sensible defaults, which can be refined later.
The default experiment runs when you type anchor without options. It:
- reads image files (inputs) from the current working directory.
- prints some summary information (a task).
- produces no outputs.
$ anchor
Searching for inputs as per default experiment.
Learn how to select inputs, outputs and tasks with 'anchor -h'.
Found 3 inputs.
-> All inputs have extension = jpg
-> File-sizes range across [1 MB to 7 MB] with an average of 4 MB.
-> D:\Users\owen\Pictures\SomeAlbum\${0}.jpg
${0} = "mar" (1) | "jan" (1) | "feb" (1)
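The last two lines of this output describe the inputs as a pattern with one varying part. One way such a pattern could be derived is from the common prefix and common suffix of the paths. The sketch below assumes this approach; it is not taken from Anchor's source, and `summarize_pattern` is a hypothetical name.

```python
import os
from collections import Counter
from typing import List, Tuple


def summarize_pattern(paths: List[str]) -> Tuple[str, Counter]:
    """Split paths into a shared prefix, a varying middle and a shared suffix."""
    prefix = os.path.commonprefix(paths)
    # The common suffix is the common prefix of the reversed strings.
    suffix = os.path.commonprefix([p[::-1] for p in paths])[::-1]
    # Keep prefix and suffix from overlapping on the shortest path.
    max_suffix = min(len(p) for p in paths) - len(prefix)
    if len(suffix) > max_suffix:
        suffix = suffix[len(suffix) - max_suffix:]
    varying = [p[len(prefix):len(p) - len(suffix)] for p in paths]
    return prefix + "${0}" + suffix, Counter(varying)
```

For the three files in the example above, the varying parts would be "mar", "jan" and "feb", each occurring once.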
Overriding elements on command-line
Each element (inputs, task, outputs) can be overridden with a command-line option, -i, -t or -o respectively, as follows:
Changing inputs with -i
Option | Description |
---|---|
<omitted> | default: reads images recursively from current directory |
-i path to XML file | input as defined in BeanXML at this path |
-i wildcards | reads files matching the wildcards (e.g. -i *.jpg ) |
-i path to directory | reads images recursively from the specified directory |
When specifying a directory (without wildcards), files are filtered against a list of popular image file extensions.
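The table can be read as a small dispatch on the form of the argument. The sketch below is one possible interpretation, not Anchor's actual logic: the order of the checks, the function names and the contents of `IMAGE_EXTENSIONS` are all assumptions.

```python
from pathlib import Path
from typing import List, Optional

# Assumed list; the extensions Anchor really accepts are not stated here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp"}


def classify_input_arg(arg: Optional[str]) -> str:
    """Decide how a -i argument would be interpreted (hypothetical rules)."""
    if arg is None:
        return "default"      # no -i: images from the current directory
    if any(ch in arg for ch in "*?"):
        return "wildcards"    # e.g. *.jpg
    if arg.lower().endswith(".xml"):
        return "beanxml"      # input defined in BeanXML
    return "directory"        # anything else is treated as a directory


def filter_images(names: List[str]) -> List[str]:
    """Keep only names whose extension is in the assumed image list."""
    return [n for n in names if Path(n).suffix.lower() in IMAGE_EXTENSIONS]
```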
Changing the task with -t
Option | Description |
---|---|
<omitted> | default: summary statistics for current inputs |
-t path to XML file | task as defined in BeanXML at this path |
-t another string | looks for a task in config/tasks/ matching this name |
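The lookup by name can be pictured as mapping a short string onto a file in config/tasks/. The sketch below assumes that a named task corresponds to a file called `<name>.xml` in that directory; this naming rule and the function `resolve_task` are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_task(arg: Optional[str], config_dir: str = "config/tasks") -> Optional[str]:
    """Map a -t argument onto a BeanXML path (hypothetical resolution rule)."""
    if arg is None:
        return None  # no -t: the default summary task is used
    if arg.lower().endswith(".xml"):
        return arg   # already a path to a BeanXML file
    # Assumption: a named task lives at config/tasks/<name>.xml
    return str(PurePosixPath(config_dir) / f"{arg}.xml")
```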
Changing outputs with -o
Option | Description |
---|---|
<omitted> | default: writes into a temporary directory |
-o path to XML file | output as defined in BeanXML at this path |
-o another string | writes into the specified directory (creates a subdirectory) |
Combining options
These options can be combined to accomplish both simple and complex pipelines (defined in BeanXML).
Some example commands:
# input from wildcards, task by name, output into a specific absolute directory
anchor -i *.png -t grayscale -o 'c:\Temp\GrayscaleAlbum\'
# task by name
anchor -t summarizeImages
# input and task from BeanXML, output into a specific relative directory
anchor -i sunday-hike.xml -t generate_thumbnail.xml -o ../thumbnails/
Defining an entire experiment in BeanXML
Instead of element-wise definition on the command line (with -i, -t, -o etc.), an entire experiment can be defined in BeanXML and called from the command line as a whole, e.g. anchor pathToSomeExperiment.xml.
In reality, an experiment has more than three elements and can be parameterized extensively, all of which is initially hidden by defaults. BeanXML allows finer-grained definition.
Outputs and logs are structured
Two message logs are produced:
- experimentLog.txt for the experiment as a whole, which is also printed to the console.
- jobLog.txt for each input, but only if an error occurs.
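The two-level logging can be illustrated with a short sketch: one log for the whole run, plus a per-input log written only on failure. The directory layout (one subdirectory per input), the message format and the function name `run_with_logs` are assumptions, not a description of what Anchor writes.

```python
from pathlib import Path
from typing import Callable, List


def run_with_logs(inputs: List[str], task: Callable[[str], None], out_dir: str) -> List[str]:
    """Run task on each input; write jobLog.txt only for inputs that fail."""
    root = Path(out_dir)
    root.mkdir(parents=True, exist_ok=True)
    lines = []
    for name in inputs:
        try:
            task(name)
            lines.append(f"ok: {name}")
        except Exception as exc:
            lines.append(f"failed: {name}")
            job_dir = root / name  # assumed layout: one subdirectory per input
            job_dir.mkdir(parents=True, exist_ok=True)
            (job_dir / "jobLog.txt").write_text(f"{type(exc).__name__}: {exc}\n")
    text = "\n".join(lines) + "\n"
    print(text, end="")  # the experiment log is also printed to the console
    (root / "experimentLog.txt").write_text(text)
    return lines
```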
By default, outputs are written to a temporary directory, which is easily changed with the -o option.
Parallelization
Inputs are processed in parallel where possible. Some tasks can be executed on each input entirely independently, and so run fully in parallel across cores; others involve shared memory and a mixture of parallel and sequential steps.
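The two kinds of task can be contrasted with a small sketch using Python's standard `concurrent.futures` module. It is a generic illustration rather than Anchor's scheduler: the function names are hypothetical, and a thread pool is used for brevity (in Python, a process pool would be the closer analogue of spreading CPU-bound work across cores).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")
S = TypeVar("S")


def run_independent(inputs: Iterable[T], task: Callable[[T], R], workers: int = 4) -> List[R]:
    """Each input is processed on its own; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))


def run_with_shared_step(
    inputs: Iterable[T],
    per_input: Callable[[T], R],
    combine: Callable[[List[R]], S],
    workers: int = 4,
) -> S:
    """A parallel per-input step followed by a sequential step over all results."""
    partials = run_independent(inputs, per_input, workers)
    return combine(partials)
```

The first function corresponds to tasks that are fully independent per input; the second to tasks where a sequential step needs to see results from every input.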