`anchor_python_visualization.visualize_features`

Visualizes embeddings from a CSV file, either as an interactive plot or by exporting to TensorBoard.

Introduction

The script:

Projects the embeddings into a lower-dimensional space.
Visualizes the embeddings.

Both steps offer a choice of methods.

Input Arguments

Projection methods

-p or --projection

t-SNE (default)

PCA

none - keeps the original dimensionality of the embeddings.
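As an illustration of what the PCA option does, the following is a minimal sketch of PCA via singular value decomposition, assuming the embeddings are held in a NumPy array with one row per data item. The function name project_pca is hypothetical, and the package itself may well use a library implementation instead. t-SNE, the default, is a non-linear method and is not sketched here.

```python
import numpy as np


def project_pca(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Hypothetical helper: project each row onto the top principal axes."""
    # Centre each feature column on zero mean.
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    # Coordinates of each embedding along the first n_components axes.
    return centred @ vt[:n_components].T


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))  # 100 data items, 16 dimensions
projected = project_pca(embeddings, n_components=2)
print(projected.shape)  # (100, 2)
```

Because the singular values are sorted, the first output column carries the most variance, which is why taking the leading axes gives the most informative two-dimensional view.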

Visualization methods

-m or --method

plot - interactive 2D plot of the embeddings via plotly (default)

TensorBoard - exports a log directory, for viewing in TensorBoard, to the path given by --output-path or -o

Optionally, for TensorBoard export, an image thumbnail can be associated with each embedding via --image_sequence or --image_path. Each takes a path containing the string PLACEHOLDER_FOR_SUBSTITUTION, which is substituted:

for --image_sequence, with a six-digit, zero-padded integer index that increments with row order, or,

for --image_path, with the unique identifier of the embedding.
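The two substitution rules can be sketched in a few lines of Python. This assumes the placeholder string is "<IMAGE>", inferred only from the example command further below; the helper names are hypothetical, and whether the row index starts at 0 or 1 is not stated, so the caller supplies it.

```python
# Assumption: placeholder value inferred from the example command.
PLACEHOLDER = "<IMAGE>"


def image_path_for_sequence(pattern: str, row_index: int) -> str:
    """--image_sequence: substitute a six-digit, zero-padded row index."""
    return pattern.replace(PLACEHOLDER, f"{row_index:06d}")


def image_path_for_identifier(pattern: str, identifier: str) -> str:
    """--image_path: substitute the embedding's unique identifier."""
    return pattern.replace(PLACEHOLDER, identifier)


print(image_path_for_sequence(r"D:\thumbs\thumbnails_<IMAGE>.png", 7))
# D:\thumbs\thumbnails_000007.png
print(image_path_for_identifier("thumbs/<IMAGE>.png", "cell_42"))
# thumbs/cell_42.png
```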

Structure of the CSV File

The CSV file should have:

the dimensions of the embeddings as columns.

one data item per row.

a header row as the first row.

one column called COLUMN_NAME_IDENTIFIER with unique identifiers for each embedding.

Of the remaining columns:

numeric columns are treated as feature values.

non-numeric columns can be combined into a label via the --max_label_index argument, which selects a number of these columns from the left or from the right.

Note that the label is split into groups by slashes (forward or backward), and --max_label_index specifies either the maximum number of groups to read from the left (if positive) or the number of groups to exclude from the right (if negative).
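The grouping rule maps naturally onto Python slicing. The sketch below uses a hypothetical truncate_label function; rejoining the kept groups with a forward slash is an assumption, as the separator used in the real output is not documented, and the behavior for a value of 0 is left undefined here.

```python
import re


def truncate_label(label: str, max_label_index: int) -> str:
    """Hypothetical helper applying --max_label_index to a slash-separated label.

    Positive: keep at most that many groups from the left.
    Negative: drop that many groups from the right.
    """
    # Split on either forward or backward slashes.
    groups = re.split(r"[/\\]", label)
    # A slice handles both signs: [:2] keeps two groups, [:-1] drops the last.
    kept = groups[:max_label_index]
    # Assumption: groups are rejoined with a forward slash.
    return "/".join(kept)


print(truncate_label("plates/plate1/well3", 2))    # plates/plate1
print(truncate_label(r"plates\plate1\well3", -1))  # plates/plate1
```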

--encoding specifies the encoding of the CSV file as per Python’s standard encodings.
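The arguments above can be summarized as an argparse parser. This is a hypothetical reconstruction built only from the documented flags, not the module's actual parser; the positional argument name csv_path is an assumption, and no defaults are assumed beyond those the documentation states (t-SNE and plot).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented command-line interface."""
    parser = argparse.ArgumentParser(prog="visualize_features")
    parser.add_argument("csv_path", help="CSV file containing the embeddings")
    parser.add_argument("-p", "--projection", choices=["t-SNE", "PCA", "none"], default="t-SNE")
    parser.add_argument("-m", "--method", choices=["plot", "TensorBoard"], default="plot")
    parser.add_argument("-o", "--output-path", help="log directory for TensorBoard export")
    parser.add_argument("--image_sequence", help="thumbnail path pattern, by row index")
    parser.add_argument("--image_path", help="thumbnail path pattern, by identifier")
    parser.add_argument("--max_label_index", type=int)
    parser.add_argument("--encoding", help="a Python standard encoding, e.g. utf-8")
    return parser


args = build_parser().parse_args([
    r"D:\someDirectory\features.csv",
    "-p", "none",
    "-m", "TensorBoard",
    "--output", r"D:\someDirectory\tensorboard_logs",
    "--image_sequence", r"D:\someDirectory\thumbnails\thumbnails_<IMAGE>.png",
    "--max_label_index", "-1",
])
print(args.output_path, args.max_label_index)
```

In this sketch, --output is accepted as an abbreviation of --output-path because argparse allows unambiguous prefixes by default, and "-1" is read as a value rather than a flag because no option looks like a negative number.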

Example Usage

Install the package from this repository, either with:

pip install . (from the root of the checked-out repository), or
pip install git+https://github.com/anchoranalysis/anchor-python-visualization.git

Plotting

Plot the embeddings, using t-SNE to project them to two dimensions:

python -m anchor_python_visualization.visualize_features ^
    D:\someDirectory\features.csv ^
    -p t-SNE ^
    -m plot

TensorBoard export

Create the log directory:

python -m anchor_python_visualization.visualize_features ^
    D:\someDirectory\features.csv ^
    -p none ^
    -m TensorBoard ^
    --output D:\someDirectory\tensorboard_logs ^
    --image_sequence "D:\someDirectory\thumbnails\thumbnails_<IMAGE>.png" ^
    --max_label_index -1

The --image_sequence argument is optional, and associates a thumbnail image with each embedding.

The --max_label_index -1 argument directs the label to ignore its last group, i.e. the part of the string after the final slash.

Open the log directory in TensorBoard:

tensorboard --logdir D:\someDirectory\tensorboard_logs

Open the URL that is shown (by default http://localhost:6006/), then select Projector from the drop-down list in the top-right corner.

Module Contents

Functions

main()

Entry point.

anchor_python_visualization.visualize_features.main()[source]: Entry point.
