:orphan:

:py:mod:`anchor_python_visualization.visualize_features`
========================================================

.. py:module:: anchor_python_visualization.visualize_features

.. autoapi-nested-parse::

   Visualizes embeddings in a CSV file by plotting or by exporting to TensorBoard.

------------
Introduction
------------

The script:

1. Creates embeddings, by projecting the input embeddings into a lower-dimensional space.
2. Visualizes the embeddings.

Both steps offer a choice of methods.

---------------
Input Arguments
---------------

Projection methods
------------------

`-p` or `--projection`

* `t-SNE` **(default)**
* `PCA`
* `none` - unchanged dimensionality for the embeddings.

Visualization methods
---------------------

`-m` or `--method`

* `plot` - interactive 2D plot of embeddings via plotly **(default)**
* `TensorBoard` - exports a *log directory* for TensorBoard at the path given by `--output-path` or `-o`

Optionally, image thumbnails can be associated with each embedding for `TensorBoard` export with `--image_sequence` or `--image_path`. Each takes a path in which the string :const:`~embeddings.PLACEHOLDER_FOR_SUBSTITUTION` is substituted, respectively, with:

* an index from an incrementing six-digit integer with leading zeros, corresponding to row order, or
* the unique identifier for the embedding.

Structure of the CSV File
-------------------------

The CSV file should have:

* embedding dimensions (features) as columns.
* data-items as rows.
* headers as the first row.
* one column called :const:`~embeddings.COLUMN_NAME_IDENTIFIER` with a unique identifier for each embedding.

Otherwise:

* the *numeric* columns are treated as feature-values.
* the *non-numeric* columns can be combined into a label via the `--max_label_index` argument, combining a number of these columns from the left or the right.
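
The column rules above can be sketched with the standard library alone. This is only an illustration, not the package's own implementation: the function `classify_columns` is hypothetical, and the value `"identifier"` is an assumed stand-in for :const:`~embeddings.COLUMN_NAME_IDENTIFIER`, whose real value is not stated here.

```python
import csv
import io

# Assumed stand-in for embeddings.COLUMN_NAME_IDENTIFIER (the real value is not given here).
COLUMN_NAME_IDENTIFIER = "identifier"


def _is_number(text):
    """Returns True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(csv_text):
    """Splits header names into (numeric, non_numeric), skipping the identifier column.

    Hypothetical helper: a column counts as numeric only if every row parses as a number.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    numeric, non_numeric = [], []
    for name in reader.fieldnames:
        if name == COLUMN_NAME_IDENTIFIER:
            continue
        if all(_is_number(row[name]) for row in rows):
            numeric.append(name)      # treated as feature-values
        else:
            non_numeric.append(name)  # candidate for the label
    return numeric, non_numeric


# A small made-up CSV: headers first, one row per data-item.
example = (
    "identifier,group,feature0,feature1\n"
    "item_001,cats/tabby,0.12,3.4\n"
    "item_002,dogs/poodle,0.56,7.8\n"
)
numeric, non_numeric = classify_columns(example)
print(numeric)
print(non_numeric)
```

Here `feature0` and `feature1` would become feature-values, while `group` is left over as a non-numeric column available for labelling.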
Note the label is split into separate groups by a slash (forward or backward), and `--max_label_index` specifies the maximum number of groups to be read from the left (if positive) or the number of groups to be excluded from the right (if negative).

``--encoding`` specifies the encoding of the CSV file, as per Python's standard encodings.

-------------
Example Usage
-------------

Install the package in this repository, by either:

* `pip install .` (in the root of the checked out repository) or
* `pip install git+https://github.com/anchoranalysis/anchor-python-visualization.git`

Plotting
--------

Plotting using t-SNE to project to two dimensions.

::

    python -m anchor_python_visualization.visualize_features D:\someDirectory\features.csv -p t-SNE -m plot

TensorBoard export
------------------

1. Create the log-directory:

   ::

       python -m anchor_python_visualization.visualize_features D:\someDirectory\features.csv -p none -m TensorBoard --output D:\someDirectory\tensorboard_logs --image_sequence D:\someDirectory\thumbnails\thumbnails_.png --max_label_index -1

   The penultimate parameter is optional, and includes thumbnails. The final parameter makes the group label ignore the last part of the string, i.e. everything after the final slash.

2. Open the log-directory in TensorBoard.

   ::

       tensorboard --logdir D:\someDirectory\tensorboard_logs

3. Open the shown URL, probably http://localhost:6006/

4. Select ``Projector`` from the drop-down list box in the top-right corner.

Module Contents
---------------

Functions
~~~~~~~~~

.. autoapisummary::

   anchor_python_visualization.visualize_features.main

.. py:function:: main()

   Entry point.
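
As a rough illustration of the `--max_label_index` rule described under *Structure of the CSV File*, the sketch below splits a label on forward or backward slashes and keeps a subset of the groups. It is not taken from the package: `truncate_label` is a hypothetical name, and both rejoining the groups with a forward slash and leaving the label unchanged when the argument is absent or zero are assumptions.

```python
import re


def truncate_label(label, max_label_index=None):
    """Keeps a subset of the slash-separated groups in a label.

    Hypothetical helper illustrating the documented rule:
    - positive index: read at most that many groups from the left
    - negative index: exclude that many groups from the right
    """
    if not max_label_index:
        # Assumption: with no index (or zero) the label is left unchanged.
        return label
    # Split on either a forward slash or a backslash.
    groups = re.split(r"[/\\]", label)
    # Python slicing covers both cases: groups[:2] keeps the first two groups,
    # groups[:-1] drops the last group.
    return "/".join(groups[:max_label_index])


print(truncate_label("cats/tabby/img_001", 1))
print(truncate_label("cats\\tabby\\img_001", -1))
```

With `-1`, as in the TensorBoard example, only the part after the final slash is dropped, so `cats\tabby\img_001` is grouped under `cats/tabby`.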