anchor_python_visualization.visualize_features
Visualizes embeddings in a CSV file by plotting or TensorBoard.
Introduction
The script:
Projects the embeddings into a lower-dimensional space (this step can be skipped with -p none).
Visualizes the embeddings.
Both steps offer a choice of methods.
Input Arguments
Projection methods
-p or --projection
Visualization methods
-m or --method
plot - interactive 2D plot of embeddings via Plotly (default)
TensorBoard - exports a log directory for TensorBoard to the location given by --output-path or -o
Optionally, image thumbnails can be associated with each embedding for TensorBoard export with --image_sequence
or --image_path. Each takes a path in which the string PLACEHOLDER_FOR_SUBSTITUTION
is substituted, respectively, with:
an index formed from an incrementing six-digit integer with leading zeros, corresponding to row order, or
the unique identifier for the embedding.
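The substitution described above can be sketched as follows. This is an illustration only, not the package's own code: the helper name `expand_image_path` is hypothetical, and the placeholder string is passed in as a parameter (the example usage on this page writes it as `<IMAGE>`). Whether the row index starts at 0 or 1 is not stated here, so the caller supplies it.

```python
from typing import Optional


def expand_image_path(template: str, placeholder: str, row_index: int,
                      identifier: Optional[str] = None) -> str:
    """Replace the placeholder with an identifier, or else a six-digit row index.

    Hypothetical helper for illustration; not part of the package's API.
    """
    if identifier is not None:
        # --image_path style: substitute the embedding's unique identifier
        substitute = identifier
    else:
        # --image_sequence style: zero-padded six-digit index in row order
        substitute = f"{row_index:06d}"
    return template.replace(placeholder, substitute)


print(expand_image_path("thumbnails/thumbnails_<IMAGE>.png", "<IMAGE>", 7))
print(expand_image_path("thumbnails/<IMAGE>.png", "<IMAGE>", 7, identifier="cell_A"))
```

The first call yields a path ending in `thumbnails_000007.png`; the second yields `thumbnails/cell_A.png`.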
Structure of the CSV File
The CSV file should have:
embeddings as columns.
data-items as rows.
headers as the first row.
one column called COLUMN_NAME_IDENTIFIER with unique identifiers for each embedding.
Otherwise:
the numeric columns are treated as feature-values
the non-numeric columns can be combined into a label via the --max_label_index argument, combining a number of these columns from the left or the right.
Note that the label is split into separate groups at each slash (forward slash or backslash), and --max_label_index specifies the maximum number of groups to be read from the left (if positive) or the number of groups to be excluded from the right (if negative).
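As a rough sketch of the positive/negative behaviour described above (not the package's implementation), Python slicing expresses both cases directly. The function name `derive_label`, joining the kept groups with a forward slash, and leaving the label unchanged when the index is zero are all assumptions made for illustration.

```python
import re


def derive_label(text: str, max_label_index: int) -> str:
    """Keep leading groups (positive index) or drop trailing groups (negative index).

    Hypothetical helper for illustration; not part of the package's API.
    """
    if max_label_index == 0:
        # Assumption: zero leaves the label untouched
        return text
    # Split on either a forward slash or a backslash
    groups = re.split(r"[/\\]", text)
    # groups[:2] keeps the first two groups; groups[:-1] drops the last one
    return "/".join(groups[:max_label_index])


print(derive_label("plateA/well3/cell17", 2))
print(derive_label("plateA\\well3\\cell17", -1))
```

Both calls print `plateA/well3`: the first reads two groups from the left, the second excludes one group from the right.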
--encoding specifies the encoding of the CSV file, as per Python’s standard encodings.
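To make the expected layout concrete, here is a minimal sketch that reads such a CSV with the standard library and separates identifiers, numeric (feature) columns and non-numeric (label) columns. It is not how the script itself parses the file. The function name `load_features`, the column name `identifier`, and the sample data are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def load_features(path, identifier_column, encoding="utf-8"):
    """Return identifiers plus the names of numeric and non-numeric columns.

    Hypothetical helper for illustration; not part of the package's API.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)  # first row is read as headers
        rows = list(reader)
        columns = [c for c in (reader.fieldnames or []) if c != identifier_column]

    def is_numeric(name):
        # A column counts as numeric only if every value parses as a float
        try:
            for row in rows:
                float(row[name])
            return True
        except (TypeError, ValueError):
            return False

    return {
        "identifiers": [row[identifier_column] for row in rows],
        "numeric": [c for c in columns if is_numeric(c)],
        "non_numeric": [c for c in columns if not is_numeric(c)],
    }


# Made-up sample file, written in a non-UTF-8 encoding to exercise the argument
sample = "identifier,group,f0,f1\nimg_a,café/one,0.1,2.5\nimg_b,café/two,0.3,1.5\n"
with tempfile.TemporaryDirectory() as directory:
    csv_path = Path(directory) / "features.csv"
    csv_path.write_text(sample, encoding="latin-1")
    print(load_features(csv_path, "identifier", encoding="latin-1"))
```

Any name listed among Python's standard encodings (for example `utf-8`, `latin-1` or `cp1252`) can be passed to `open` in this way.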
Example Usage
Install the package from this repository with either:
pip install . (in the root of the checked out repository) or
pip install git+https://github.com/anchoranalysis/anchor-python-visualization.git
Plotting
Plotting using t-SNE to project to two dimensions.
python -m anchor_python_visualization.visualize_features D:\someDirectory\features.csv -p t-SNE -m plot
TensorBoard export
Create the log-directory:
python -m anchor_python_visualization.visualize_features D:\someDirectory\features.csv -p none -m TensorBoard --output D:\someDirectory\tensorboard_logs --image_sequence D:\someDirectory\thumbnails\thumbnails_<IMAGE>.png --max_label_index -1
The penultimate parameter (--image_sequence) is optional, and attaches thumbnails to the embeddings.
The final parameter (--max_label_index -1) makes the group label ignore the last part of the string, i.e. everything after the final slash.
Open the log-directory in TensorBoard.
tensorboard --logdir D:\someDirectory\tensorboard_logs
Open the shown URL, probably http://localhost:6006/
Select Projector from the drop-down list box in the top-right corner.
Module Contents
Functions
Entry point.