
Essentials of Anchor

Experiments are inputs, task, outputs

Roughly speaking, an experiment is the execution of a task on inputs to produce outputs.

outputs = task(inputs)

Inputs are usually images (or images plus extras). Outputs can be images, text files, CSVs, XML or any other file type.

To run an experiment, all three elements must be defined, though defaults often define them for you.
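As a rough mental model only, the relationship outputs = task(inputs) can be sketched in Python. This is not Anchor's API: the names `run_experiment` and `default_task` are hypothetical, inputs are represented by file names, and outputs by plain strings.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-ins: an "input" is a file name, an "output" is a text line.
Task = Callable[[str], str]


def default_task(name: str) -> str:
    """Placeholder default task: describe the input rather than transform it."""
    return f"seen {name}"


def run_experiment(inputs: Iterable[str], task: Task = default_task) -> List[str]:
    """outputs = task(inputs): apply the task to every input and collect the results."""
    return [task(item) for item in inputs]


# Only the inputs are given here; the task falls back to its default.
outputs = run_experiment(["jan.jpg", "feb.jpg"])
```

The default argument mirrors the idea that an element left unspecified is still defined, just by a default.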

Default experiment

Anchor first offers sensible defaults, and allows more detailed definition later.

The default experiment runs when you type anchor without options. It:

  1. reads image files (inputs) from the current working directory.
  2. prints some summary information (a task).
  3. produces no outputs.

$ anchor
Searching for inputs as per default experiment.
Learn how to select inputs, outputs and tasks with 'anchor -h'.

Found 3 inputs.
-> All inputs have extension = jpg
-> File-sizes range across [1 MB to 7 MB] with an average of 4 MB.
-> D:\Users\owen\Pictures\SomeAlbum\${0}.jpg
${0} = "mar" (1) | "jan" (1) | "feb" (1)
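The last two lines of this summary describe the input paths as a pattern with one varying part. How Anchor derives that pattern is not stated here. The following Python sketch shows one simple way such a pattern could be computed, by taking the common prefix and common suffix of all paths. The function name `describe_names` is hypothetical, and the list of paths must not be empty.

```python
import os
from collections import Counter


def describe_names(paths):
    """Return (pattern, counts), where the pattern marks the varying part as ${0}."""
    prefix = os.path.commonprefix(paths)
    # The common suffix is the common prefix of the reversed strings.
    suffix = os.path.commonprefix([p[::-1] for p in paths])[::-1]
    # Keep prefix and suffix from overlapping on short or identical names.
    shortest = min(len(p) for p in paths)
    room = shortest - len(prefix)
    suffix = suffix[-room:] if room > 0 else ""
    varying = [p[len(prefix):len(p) - len(suffix)] for p in paths]
    return prefix + "${0}" + suffix, Counter(varying)


pattern, counts = describe_names(["album/mar.jpg", "album/jan.jpg", "album/feb.jpg"])
```

Here `pattern` holds the shared directory and extension around `${0}`, and `counts` records how often each varying value occurs.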

Overriding elements on command-line

Each element (inputs, task, outputs) can be overridden with the command-line options -i, -t and -o respectively, as follows:

Changing inputs with -i

Option                Description
<omitted>             default: reads images recursively from the current directory
-i path to XML file   input as defined in BeanXML at this path
-i wildcards          reads files matching the wildcards (e.g. -i *.jpg)
-i path to directory  reads images recursively from the specified directory

When a directory is specified (without wildcards), files are filtered against a list of popular image file extensions.
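The three forms of the -i argument, and the extension filter applied to directories, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the order of the checks, the function names, and in particular the contents of `IMAGE_EXTENSIONS`, since the actual list Anchor uses is not given here.

```python
from pathlib import Path

# Assumed list; the extensions Anchor actually accepts may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif"}


def classify_input_arg(arg):
    """Guess which of the three documented forms an -i argument takes."""
    if any(ch in arg for ch in "*?"):
        return "wildcards"
    if arg.lower().endswith(".xml"):
        return "beanxml"
    return "directory"


def images_in_directory(directory):
    """Recursively list files whose extension is in IMAGE_EXTENSIONS."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The extension comparison is lower-cased so that `photo.JPG` and `photo.jpg` are treated alike, which is a design choice of this sketch rather than documented behavior.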

Changing the task with -t

Option                Description
<omitted>             default: summary statistics for current inputs
-t path to XML file   task as defined in BeanXML at this path
-t any other string   looks for a task in config/tasks/ matching this name
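The lookup rule for -t can be sketched as follows in Python. The function `resolve_task` is hypothetical, and so is the assumption that a named task is stored as `<name>.xml` inside config/tasks/; the text above only says that a task matching the name is looked for there.

```python
from pathlib import Path


def resolve_task(arg=None, config_dir="config"):
    """Map a -t argument to the BeanXML file that would define the task."""
    if arg is None:
        return None  # no -t given: the default summary task applies
    if arg.lower().endswith(".xml"):
        return Path(arg)  # explicit path to a BeanXML file
    # Assumption: named tasks live at config/tasks/<name>.xml
    return Path(config_dir) / "tasks" / f"{arg}.xml"
```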

Changing outputs with -o

Option                Description
<omitted>             default: writes into a temporary directory
-o path to XML file   output as defined in BeanXML at this path
-o any other string   writes into the specified directory (creates a subdirectory)
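A minimal Python sketch of the two non-XML cases for -o follows. The function `resolve_output_dir`, the `anchor_` prefix on the temporary directory, and the way the subdirectory is named (after a hypothetical `experiment_name`) are all assumptions; how Anchor names the subdirectory is not stated here.

```python
import tempfile
from pathlib import Path


def resolve_output_dir(arg=None, experiment_name="experiment"):
    """Return the directory that outputs would be written into."""
    if arg is None:
        # No -o given: fall back to a fresh temporary directory.
        return Path(tempfile.mkdtemp(prefix="anchor_"))
    if arg.lower().endswith(".xml"):
        raise NotImplementedError("BeanXML output definitions are outside this sketch")
    # Assumption: the subdirectory is named after the experiment.
    target = Path(arg) / experiment_name
    target.mkdir(parents=True, exist_ok=True)
    return target
```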

Combining options

These options can be combined to run both simple pipelines and complex ones (the latter defined in BeanXML).

Some example commands:

# input from wildcards, task by name, output into a specific absolute directory
anchor -i *.png -t grayscale -o 'c:\Temp\GrayscaleAlbum\'

# task by name
anchor -t summarizeImages

# input and task from BeanXML, output into a specific relative directory
anchor -i sunday-hike.xml -t generate_thumbnail.xml -o ../thumbnails/
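When Anchor is driven from a script rather than typed by hand, the same combinations can be assembled programmatically. The Python sketch below uses only the -i, -t and -o options described above. The helper `build_anchor_command` is hypothetical, and the sketch only builds the command; it assumes nothing about how Anchor behaves when run.

```python
import shlex


def build_anchor_command(inputs=None, task=None, output=None):
    """Assemble an anchor command line, omitting any element left at its default."""
    command = ["anchor"]
    for flag, value in (("-i", inputs), ("-t", task), ("-o", output)):
        if value is not None:
            command += [flag, value]
    return command


command = build_anchor_command(inputs="*.png", task="grayscale", output="out/")
# shlex.join quotes arguments so the line can be pasted into a POSIX shell.
printable = shlex.join(command)
```

If such a list is passed to `subprocess.run(command)`, no shell is involved, so a wildcard like `*.png` reaches the program unexpanded. This assumes an `anchor` executable is available on the PATH.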

Defining an entire experiment in BeanXML

Instead of defining each element on the command line (with -i, -t, -o etc.), an entire experiment can be defined in BeanXML and run as a whole from the command line, e.g. anchor pathToSomeExperiment.xml.

In reality, an experiment has more than three elements and can be parameterized extensively, all of which is initially hidden by defaults. BeanXML provides finer-grained definition.

Outputs and logs are structured

Two message logs are produced:

  • experimentLog.txt for the experiment as a whole, which is also printed to the console.
  • jobLog.txt for each input, but only if an error occurs.

By default, outputs are written to a temporary directory, which is easily changed with the -o option.
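The logging behavior described above (one experiment-wide log that is also printed, plus a per-input log written only on failure) can be sketched in Python as follows. This is an illustration, not Anchor's implementation: the function `run_with_logs`, the log line format, and the choice to place each jobLog.txt in a subdirectory named after the input are all assumptions.

```python
from pathlib import Path


def run_with_logs(inputs, task, output_dir):
    """Run task on each input, writing experimentLog.txt and error-only jobLog.txt files."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for name in inputs:
        try:
            task(name)
            lines.append(f"ok: {name}")
        except Exception as exc:
            # A job log is written only for inputs that fail.
            job_dir = output_dir / Path(name).stem
            job_dir.mkdir(exist_ok=True)
            (job_dir / "jobLog.txt").write_text(f"{type(exc).__name__}: {exc}\n")
            lines.append(f"failed: {name}")
    text = "\n".join(lines) + "\n"
    (output_dir / "experimentLog.txt").write_text(text)
    print(text, end="")  # the experiment log is also shown on the console
    return lines
```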

Parallelization

Inputs are processed in parallel where possible. Some tasks can be executed on each input entirely independently, and so run fully in parallel across cores; others involve shared memory and a mixture of parallel and sequential steps.
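The two situations can be illustrated with a worker pool in Python. This is a conceptual sketch, not how Anchor schedules work: the function names are hypothetical, and a thread pool is used for simplicity (for CPU-bound work in Python, a process pool would be the usual choice to occupy several cores).

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent(inputs, task, max_workers=4):
    """Apply task to each input concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, inputs))


def run_with_shared_step(inputs, per_input, combine, max_workers=4):
    """Parallel per-input step followed by a sequential step over all results."""
    partial = run_independent(inputs, per_input, max_workers)
    return combine(partial)
```

The first function corresponds to tasks where each input is handled in isolation. The second adds a sequential `combine` step that needs every per-input result, which is the kind of dependency that prevents full parallelism.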