Plotly Express Trendlines, just the Lines

Hi,

I really love the simplicity of Plotly Express’s scatter(trendline="ols") option. Thank you for the package! I’ve used it to quickly perform OLS calculations and plot trendlines across thousands of categories at once, storing all the OLS results within a list of lists of dataframes, which allows me to show them as separate tables in web applications. The high-level wrapper is powerful for having such simple syntax!

However, as I push the boundaries of number of groups/trendlines like this, plotting becomes crowded with both points and trendlines. I’d like to remove all the scatter markers from the plots to make getting the ols info on hover a little bit easier. I’ve looked around for a while and read through the reference documentation, but have not found a way to do this using plotly express.

So here is my question:

Does plotly express provide a way to remove scatter traces but keep trendline traces, or alternatively, to simply disable the scatter trace hoverinfo but keep trendline hoverinfo?

I know I could achieve similar plot functionality by going to the lower level and creating dictionaries for each trace, but this loses the convenience I love about using express. I’d end up importing scikit-learn (or similar) separately, doing calculations across categories first, looping through traces across categories, etc.

Thank You!

Hi @IanK,

In a trendline figure with a single group you have two traces: the first holds the data points and the second the trendline. To hide the data points, make the following update:

fig.data[0].update(visible=False)

I would recommend using update_traces() with a selector, as it will target all non-line traces:

import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", facet_col="sex", trendline="ols")
fig.update_traces(visible=False, selector=dict(mode="markers"))
fig.show()

Hi, thanks very much for the suggestion. I actually tried this method already. Unfortunately this one doesn’t work in my Dash app. I think the reason is that there are many more traces than 2 in my plots. When plotting for say 10 groups using color = somecategoricalvariable, which traces by index would be markers and which would be trendlines? Does it “collate” them like 0 is a marker, 1 is a line, or would it be 0-9 are markers and 10-19 are lines?

Hey, this works! I didn’t know about the selector. It’s cool that it works for an unspecified number of groups in a dataset too. Thanks much.

Oops, there is one issue. I lose the legend when I make that change. Is there a way to direct express to make a legend for the trendlines rather than the markers?

I was able to answer both of my above questions. 🙂

  1. The data traces are collated such that trendlines fall on all odd indices after their respective marker.

  2. The legend can be brought back in with something like this:
    fig.update_traces(showlegend=True, selector=dict(mode='lines'))

    (or by running through the odd indices using empet’s suggestion)
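
Putting the two selector calls together, a minimal sketch might look like the following. The names keep_only_trendlines and demo are made up for illustration and are not part of Plotly Express; demo assumes plotly and statsmodels are installed and reuses the tips dataset from the earlier reply, with color="day" standing in for an arbitrary categorical variable.

```python
def keep_only_trendlines(fig):
    """Hide marker traces and give the trendline traces legend entries.

    Selecting by mode avoids relying on trace order, so the same two
    calls work for any number of groups.
    """
    # Hide the scatter points (the traces matched by mode="markers").
    fig.update_traces(visible=False, selector=dict(mode="markers"))
    # Trendline traces are matched by mode="lines"; show them in the legend.
    fig.update_traces(showlegend=True, selector=dict(mode="lines"))
    return fig


def demo():
    # Imported lazily so the helper above has no hard dependency on plotly.
    import plotly.express as px

    df = px.data.tips()
    fig = px.scatter(df, x="total_bill", y="tip", color="day", trendline="ols")
    return keep_only_trendlines(fig)
```

Calling demo().show() should display only the trendlines, one legend entry per group, with the OLS hover text still available on each line.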

Thanks to you both.

That’s what I was going to suggest, I’m glad you worked it out 🙂

Thanks again!

I have a related question that maybe is still good here. After I make all the regressions as above, I have been pulling all the diagnostic tables from px.get_trendline_results(figure). I’ve been using this info in dynamic tables, and also to create a large plot of residuals vs. fits with the make_subplots() function. I didn’t like facet wrapping in plotly express for this due to sizing issues while increasing plot height, and because the residuals are quite different sizes for different groups. So I went with a bit of a lower level build on the plot using make_subplots() and add_trace(go.Scatter()) that iteratively pulls residuals and fits from the first regression plot results. The plot currently looks like this:

I’d like to standardize the y axis range on the individual traces now. I don’t want to share y axes because that will wash out the residual subplots with smaller values on y. Instead, I would just like zero to be the center line in each y axis trace.

I guess this might be something like setting the full range of the y axis as twice the absolute value between the largest residual and zero, centered on zero. Any ideas about how I might get that done? Thanks in advance!

Yep, you can set the range per subplot with fig.update_yaxes(row=R, col=C, range=[min, max]) so you can probably do that in the same loop as you’re adding traces?

Also note: as of v4.9 Plotly Express has facet_row_spacing and facet_col_spacing arguments to control this 🙂
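
The per-subplot range idea can be sketched roughly as below. All three helper names are hypothetical, as is the residuals_by_group mapping (group name to a sequence of residuals); the figure is assumed to come from make_subplots with one subplot per group, filled row by row in the same order as the mapping. The 1.1 padding factor is just an example value.

```python
def grid_position(idx, cols):
    """Return the 1-indexed (row, col) of the idx-th subplot, filled row by row."""
    return idx // cols + 1, idx % cols + 1


def symmetric_range(values, pad=1.1):
    """Return [-limit, limit], where limit is pad times the largest |value|."""
    limit = pad * max(abs(v) for v in values)
    return [-limit, limit]


def center_yaxes_on_zero(fig, residuals_by_group, cols=2):
    """Give each subplot its own y range centered on zero.

    fig is assumed to be a figure built with plotly.subplots.make_subplots,
    with subplots in the same order as residuals_by_group.
    """
    for idx, residuals in enumerate(residuals_by_group.values()):
        row, col = grid_position(idx, cols)
        fig.update_yaxes(range=symmetric_range(residuals), row=row, col=col)
    return fig
```

Because each subplot gets its own range, groups with small residuals are not flattened by groups with large ones, while zero stays at the vertical center of every panel.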

Really cool! The loop has become pretty ridiculous but still runs fast. Here, subgroups holds the values from my discrete color map for whatever categorical variable the user selects, and the residual and fitted values come from my manipulations of the first graph's results:

### Let's plot the residuals! ###
cols = 2
for idx, group in enumerate(subgroups):
    # 1-indexed subplot position, filling each row left to right
    row, col = idx // cols + 1, idx % cols + 1
    # Largest absolute residual in this group, padded by 10%
    resid = diagnostics.loc[diagnostics[catindicator] == group, 'Residuals']
    limit = 1.1 * resid.abs().max()
    # Setting y range, centered on zero
    diag.update_yaxes(row=row, col=col, range=[-limit, limit])

But it works! So thank you once again.

I will have to try out express again and see if I can redo it with a little less… stuff. 😄