📣 Plotly AI & ML Docs have now been released!

Super happy to announce that you can now find the AI/ML section in the official Plotly docs.

We hope that they will be a useful reference or code template for your own ML visualizations! Let us know what sections you would like to see next and what could be improved.

Right now, there are five sections that cover how to visualize fundamental ML models and methods. Most of them use scikit-learn, since it contains many of the fundamental ML algorithms. In the last section, we also use umap-learn for UMAP.

To get started, you can simply install all the packages:

pip install plotly scikit-learn pandas

and optionally (for the last section):

pip install umap-learn

Below, I copied the description of each section, so that you can have a quick overview:

ML Regression

This page shows how to use Plotly charts for displaying various types of regression models, starting from simple models like Linear Regression and progressively moving towards models like Decision Tree and Polynomial Features. We highlight various capabilities of Plotly, such as comparative analysis of the same model with different parameters, displaying LaTeX, surface plots for 3D data, and enhanced prediction error analysis with Plotly Express.

kNN Classification in Python

This section gets us started with displaying basic binary classification using 2D data. We first show how to display training versus testing data using various marker styles, then demonstrate how to evaluate our classifier’s performance on the test split using a continuous color gradient to indicate the model’s predicted score.

We will train a k-Nearest Neighbors (kNN) classifier. First, the model records the label of each training sample. Then, whenever we give it a new sample, it will look at the k closest samples from the training set to find the most common label, and assign it to our new sample.
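The voting rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how scikit-learn implements it; the function name `knn_predict` and the toy points are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points closest to query."""
    # Rank training indices by Euclidean distance to the query, keep the k nearest
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: dist(train_points[i], query),
    )[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy 2D training set (made up for illustration): two well-separated groups
train_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_predict(train_points, train_labels, (0.15, 0.2))
label_near_b = knn_predict(train_points, train_labels, (1.0, 0.9))
print(label_near_a, label_near_b)
```

For binary classification, an odd `k` avoids tied votes. In practice you would use `KNeighborsClassifier` from scikit-learn, which also exposes `predict_proba` for the kind of score-based coloring mentioned above.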

ROC/PR Curves

This section covers how to interpret Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, which are very useful evaluation methods for binary classification.

Principal Component Analysis (PCA)

This page first shows how to visualize higher-dimensional data using various Plotly figures combined with dimensionality reduction (aka projection). Then, we dive into the specific details of our projection algorithm.

Dimensionality reduction with t-SNE and UMAP

This page presents various ways to visualize two popular dimensionality reduction techniques, namely the t-distributed stochastic neighbor embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). They are needed whenever you want to visualize data with more than two or three features (i.e. dimensions).

We first show how to visualize data with more than three features using the scatter plot matrix, then we apply dimensionality reduction techniques to get a 2D/3D representation of our data, and visualize the results with scatter plots and 3D scatter plots.