I really love the simplicity of Plotly Express’s scatter(trendline='ols') option. Thank you for the package! I’ve used it to quickly run OLS regressions and plot trendlines across thousands of categories at once, storing all the OLS results in a list of lists of dataframes, which lets me show them as separate tables in web applications. For such simple syntax, the high-level wrapper is really powerful!
However, as I push the boundaries of number of groups/trendlines like this, plotting becomes crowded with both points and trendlines. I’d like to remove all the scatter markers from the plots to make getting the ols info on hover a little bit easier. I’ve looked around for a while and read through the reference documentation, but have not found a way to do this using plotly express.
So here is my question:
Does plotly express provide a way to remove scatter traces but keep trendline traces, or alternatively, to simply disable the scatter trace hoverinfo but keep trendline hoverinfo?
I know I could achieve similar plot functionality by going to the lower level and creating dictionaries for each trace, but this loses the convenience I love about using Express. I’d end up importing scikit-learn (or similar) separately, doing the calculations across categories first, looping through traces across categories, etc.
In a trendline figure you have two traces: the first one is for the data points, and the second one is for the trendline. To make the data points invisible, just make the following update:
Hi, thanks very much for the suggestion. I actually tried this method already. Unfortunately this one doesn’t work in my Dash app. I think the reason is that there are many more traces than 2 in my plots. When plotting for say 10 groups using color = somecategoricalvariable, which traces by index would be markers and which would be trendlines? Does it “collate” them like 0 is a marker, 1 is a line, or would it be 0-9 are markers and 10-19 are lines?
Oops, there is one issue. I lose the legend when I make that change. Is there a way to direct express to make a legend for the trendlines rather than the markers?
I have a related question that maybe is still good here. After I make all the regressions as above, I have been pulling all the diagnostic tables from px.get_trendline_results(figure). I’ve been using this info in dynamic tables, and also to create a large plot of residuals vs. fits with the make_subplots() function. I didn’t like facet wrapping in plotly express for this due to sizing issues while increasing plot height, and because the residuals are quite different sizes for different groups. So I went with a bit of a lower level build on the plot using make_subplots() and add_trace(go.Scatter()) that iteratively pulls residuals and fits from the first regression plot results. The plot currently looks like this:
I’d like to standardize the y axis range on the individual traces now. I don’t want to share y axes because that will wash out the residual subplots with smaller values on y. Instead, I would just like zero to be the center line in each y axis trace.
I guess this might be something like setting the full range of the y axis as twice the absolute value between the largest residual and zero, centered on zero. Any ideas about how I might get that done? Thanks in advance!
Yep, you can set the range per subplot with fig.update_yaxes(row=R, col=C, range=[min, max]) so you can probably do that in the same loop as you’re adding traces?
Really cool! The loop has become pretty ridiculous but still runs fast. Here, subgroups holds the values from my discrete color map for whatever categorical variable the user selects, and the residuals and fits come from my manipulations of the first graph’s results:
# Let's plot the residuals!
cols = 2
for idx, i in enumerate(subgroups):
    # Residuals for this subgroup, and a y limit with 10% padding
    resid = diagnostics.loc[diagnostics[catindicator] == i, 'Residuals']
    ymax = 1.1 * resid.abs().max()
    # Setting y range, centered on zero; subplots fill the grid row by row (1-indexed)
    diag.update_yaxes(row=idx // cols + 1, col=idx % cols + 1, range=[-ymax, ymax])