anchor_python_visualization.visualize_features

Visualizes embeddings stored in a CSV file, either as an interactive plot or via TensorBoard.

Introduction

The script:

  1. Prepares the embeddings, optionally projecting them into a lower-dimensional space.

  2. Visualizes the embeddings.

Both steps offer a choice of methods.

Input Arguments

Projection methods

-p or --projection

  • t-SNE (default)

  • PCA

  • none - unchanged dimensionality for the embeddings.

Visualization methods

-m or --method

  • plot - interactive 2D plot of embeddings via Plotly (default)

  • TensorBoard - exports a log directory to TensorBoard at --output-path or -o

Optionally, image thumbnails can be associated with each embedding for TensorBoard export via --image_sequence or --image_path. Each takes a path in which the string PLACEHOLDER_FOR_SUBSTITUTION is replaced, respectively, by:

  • an incrementing six-digit index with leading zeros, corresponding to row order, or

  • the unique identifier for the embedding.
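The two substitution modes can be sketched as below. This is a minimal illustration rather than the script's own code: the helper names are hypothetical, the `<IMAGE>` token is borrowed from the example command further down (the real placeholder constant may differ), and a zero-based row index is assumed.

```python
# Assumed placeholder token, as it appears in the example command on this page.
PLACEHOLDER = "<IMAGE>"


def path_from_index(pattern: str, row_index: int, placeholder: str = PLACEHOLDER) -> str:
    """Substitute a six-digit, zero-padded row index into the pattern."""
    return pattern.replace(placeholder, f"{row_index:06d}")


def path_from_identifier(pattern: str, identifier: str, placeholder: str = PLACEHOLDER) -> str:
    """Substitute the embedding's unique identifier into the pattern."""
    return pattern.replace(placeholder, identifier)


pattern = "thumbnails/thumbnails_<IMAGE>.png"

# One thumbnail path per row, in row order (the --image_sequence style).
sequence_paths = [path_from_index(pattern, i) for i in range(3)]

# One thumbnail path derived from an identifier (the --image_path style).
identifier_path = path_from_identifier(pattern, "cat_001")
```

With the first style the thumbnails must be named in the same order as the CSV rows; with the second, each thumbnail must be named after its row's identifier.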

Structure of the CSV File

The CSV file should have:

  • embeddings as columns.

  • data-items as rows.

  • headers as the first row.

  • one column called COLUMN_NAME_IDENTIFIER with unique identifiers for each embedding.

Otherwise:

  • the numeric columns are treated as feature-values

  • the non-numeric columns can be combined into a label via the --max_label_index argument, which combines a number of these columns from the left or the right.
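As a rough illustration of this layout, the sketch below reads a small CSV and separates the numeric feature columns from the non-numeric ones. It uses only the standard library and is not the script's implementation; the column name `identifier` is a hypothetical stand-in for COLUMN_NAME_IDENTIFIER, and `classify_columns` is an illustrative helper.

```python
import csv
import io

# Hypothetical stand-in for the real identifier column name.
IDENTIFIER_COLUMN = "identifier"


def is_numeric(value: str) -> bool:
    """Return True if the cell can be parsed as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def classify_columns(csv_text: str):
    """Split the non-identifier columns into numeric features and label columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    feature_columns, label_columns = [], []
    for name in reader.fieldnames or []:
        if name == IDENTIFIER_COLUMN:
            continue
        # A column counts as a feature only if every cell in it is numeric.
        if rows and all(is_numeric(row[name]) for row in rows):
            feature_columns.append(name)
        else:
            label_columns.append(name)
    return feature_columns, label_columns


sample = (
    "identifier,group,feature_0,feature_1\n"
    "cats/001,cats,0.12,3.4\n"
    "dogs/002,dogs,0.56,7.8\n"
)
features, labels = classify_columns(sample)
```

Here `features` holds the two numeric columns and `labels` holds the single text column, mirroring the split described above.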

Note that the label is split into separate groups by a slash (forward or backward), and --max_label_index specifies the maximum number of groups to be read from the left (if positive) or the number of groups to be excluded from the right (if negative).
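The positive and negative cases map naturally onto a Python slice, as the sketch below shows. It is an assumption-laden illustration, not the script's code: `truncate_label` is a hypothetical helper, the groups are rejoined with a forward slash by choice, and treating a missing or zero value as "keep everything" is a guess, since the text above does not cover that case.

```python
import re
from typing import Optional


def truncate_label(label: str, max_label_index: Optional[int] = None) -> str:
    """Keep groups from the left (positive) or drop groups from the right (negative)."""
    # Split on either a forward slash or a backslash.
    groups = re.split(r"[\\/]", label)
    if max_label_index:  # None or 0: keep every group (an assumption of this sketch)
        groups = groups[:max_label_index]
    return "/".join(groups)


# Positive: read at most two groups from the left.
left_two = truncate_label("animals/cats/siamese/001", 2)

# Negative: exclude the last group, here from a backslash-separated label.
drop_last = truncate_label("animals\\cats\\001", -1)
```

A value of -1 is the typical choice when the final group is a per-item name (such as a file name) that should not form part of the group label.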

--encoding specifies the encoding of the CSV file as per Python’s standard encodings.

Example Usage

Install the package in this repository, by:

  • pip install . (in the root of the checked out repository) or

  • pip install git+https://github.com/anchoranalysis/anchor-python-visualization.git

Plotting

Plotting using t-SNE to project to two dimensions.

python -m anchor_python_visualization.visualize_features
    D:\someDirectory\features.csv
    -p t-SNE
    -m plot

TensorBoard export

  1. Create the log-directory:

python -m anchor_python_visualization.visualize_features
    D:\someDirectory\features.csv
    -p none
    -m TensorBoard
    --output D:\someDirectory\tensorboard_logs
    --image_sequence D:\someDirectory\thumbnails\thumbnails_<IMAGE>.png
    --max_label_index -1

The penultimate parameter is optional, and includes thumbnails.

The final parameter makes the group label ignore the last part of the string, i.e. everything after the final slash.

  2. Open the log-directory in TensorBoard.

tensorboard --logdir D:\someDirectory\tensorboard_logs

  3. Open the shown URL, probably http://localhost:6006/

  4. Select Projector from the drop-down list box in the top-right corner.

Module Contents

Functions

main()

Entry point.

anchor_python_visualization.visualize_features.main()

Entry point.