Scatter plot with two legends

I would like to plot two continuous variables as a scatter plot and segment the data using two categorical variables: one controls the marker color and the other controls the marker size.

I would like to generate one legend based on marker color and one based on marker size. Is this possible in Plotly? It would be pretty straightforward in Matplotlib. I can show the x, y coordinates and the color/size values in the hovertemplate, but currently I can only group the legend by a single categorical variable.

Any help is appreciated.

Hi @keschenburg90,

This does not answer your question, but you could use the facet_col or facet_row parameter to split the data by the second categorical variable into separate subplots:

import pandas as pd
import plotly.express as px

# create dummy data
df = pd.DataFrame({'x': [1, 2, 3], 'y': [1, 2, 3], 'c1': ['a', 'b', 'a'], 'c2': ['d', 'd', 'e']})

# create figure: color encodes 'c1', one subplot column per value of 'c2'
fig = px.scatter(data_frame=df, x='x', y='y', color='c1', facet_col='c2')
fig.show()

I’ll try to come up with an answer to your question, though…

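Hi @keschenburg90,

Still not a perfect answer, but a better one IMHO. The idea is to create one trace for each combination of the two categorical variables, so every trace has a single color and a single marker size. I have seen this approach on StackOverflow.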

import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame({'label': ['a', 'b', 'a', 'b', 'c'],
                   'cat': ['e', 'd', 'd', 'e', 'e'],
                   'x': [1, 2, 3, 4, 5],
                   'y': [1, 2, 3, 4, 5]})

color = {'a': 'rgb(147,112,219)',
         'b': 'rgb(220,20,60)',
         'c': 'rgb(0,128,0)'}

size = {'d':8,
        'e':12}

fig = go.Figure()
for lbl in df.label.unique():
    dfl = df[df.label==lbl]
    for sz in dfl.cat.unique():
        dfc = dfl[dfl.cat==sz]
        fig.add_traces(
            go.Scatter(
                x=dfc['x'], 
                y=dfc['y'], 
                mode='markers',
                name=f'label {lbl}, cat {sz}',
                marker = dict(
                    color=color[lbl], 
                    size = size[sz]
                )
            )
        )
fig.show()

which produces a single legend with one entry per label/cat combination (e.g. "label a, cat e").

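Going one step further, the traces can be divided by each categorical variable separately: one trace per label (colored, with per-point sizes taken from cat) plus one legend-only dummy trace per cat value that serves as a size key: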

import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame({'label': ['a', 'b', 'a', 'b', 'c'],
                   'cat': ['e', 'd', 'd', 'e', 'e'],
                   'x': [1, 2, 3, 4, 5],
                   'y': [1, 2, 3, 4, 5]})

color = {'a': 'rgb(147,112,219)',
         'b': 'rgb(220,20,60)',
         'c': 'rgb(0,128,0)'}

size = {'d':8,
        'e':12}

fig = go.Figure()

# traces for categorical variable 'label'
for lbl in df.label.unique():
    dfl = df[df.label==lbl]
    fig.add_traces(
        go.Scatter(
            x=dfl['x'], 
            y=dfl['y'], 
            mode='markers',
            name=f'label {lbl}',
            marker = dict(
                color=color[lbl], 
                size = dfl.cat.apply(lambda x: size[x])
            )
        )
    )

# traces for categorical variable 'cat'
for c in df.cat.unique():
    dummy_x = 1.0
    dummy_y = 3.0
    fig.add_scatter(
        x=[dummy_x], 
        y=[dummy_y],
        mode='markers',
        name=f'cat {c}',
        marker={
            'color': 'rgba(0,0,0,0)', 
            'size': size[c],
            'line': {
                'width': 1,
                'color': 'black'
            }
        },
        visible='legendonly'
    )

fig.show()

which produces one legend entry per label followed by one entry per cat value.

The problem here is that the legend entries of the label traces still vary in marker size, not only in color. Setting the legend's itemsizing property to constant is not an option, because it would affect all traces, including the cat entries whose whole purpose is to show the size.