
Colorscale inaccurate

Hi,

I’m using plotly with Python, and am having some trouble with the colorscale property - I don’t think it’s mapping the values to the colours correctly:

The input is a list of 5000 values from a Normal distribution with mean 0 and variance 1.
The colorscale should be blue for values < -0.25, white for values between -0.25 and 0.25, and red for values greater than 0.25. Note that these thresholds refer to the raw values, not to positions on the colorscale.

Of course, with colorscale we have to map values onto the interval [0,1], so I’ve done this. This input

print(
value_to_percentile(fake_values, -0.25),
value_to_percentile(fake_values, 0),
value_to_percentile(fake_values, 0.25)
)

produces this output: 0.394 0.4912 0.5884, which looks pretty close to what we’d expect from a standard normal distribution.
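The definition of value_to_percentile is not shown in this post. A minimal sketch, assuming it returns the fraction of the input values that are less than or equal to the threshold (the empirical CDF), and using a seeded stand-in for fake_values, might look like this:

```python
import random

def value_to_percentile(values, threshold):
    """Assumed behaviour: fraction of `values` that are <= threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Stand-in for the original data: 5000 draws from a standard normal.
rng = random.Random(0)
fake_values = [rng.gauss(0, 1) for _ in range(5000)]

print(
    value_to_percentile(fake_values, -0.25),
    value_to_percentile(fake_values, 0),
    value_to_percentile(fake_values, 0.25),
)
```

With a different sample the exact numbers will differ slightly, but they should stay close to 0.40, 0.50 and 0.60.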

My colorscale is as follows:

colorscale=[
                [0, 'rgb(0,0,255)'],
                [value_to_percentile(fake_values, -0.25), 'rgb(0,0,255)'],
                [value_to_percentile(fake_values, -0.25), 'rgb(255,255,255)'],
                [value_to_percentile(fake_values, 0), 'rgb(255,255,255)'],
                [value_to_percentile(fake_values, 0.25), 'rgb(255,255,255)'],
                [value_to_percentile(fake_values, 0.25), 'rgb(255,0,0)'],
                [1, 'rgb(255,0,0)']
            ]

which I believe accurately represents what I outlined above for the colours I want.

However, my colorscale is coming out like this:

Where the thresholds are more like +/- 0.7 than +/- 0.25. Any idea what’s going wrong?

Thanks

Hey @taimur ,
Something went wrong in your approach, but I cannot tell what, because I don’t know your data or what type of trace you plotted.

I defined the colorscale as follows:

 import scipy.stats as st
 colorscale=[[0, 'blue'],
       [st.norm.cdf(-0.25), 'blue'],
       [st.norm.cdf(-0.25), 'white'],
       [st.norm.cdf(0.25), 'white'],
       [st.norm.cdf(0.25), 'red'],
       [1, 'red'] ]

and got this plot:


Here is the corresponding Jupyter notebook: https://plot.ly/~empet/14587

Hey @empet, thanks a lot for taking a look at this! I checked out your notebook and it seems to me like the problem is still there (but perhaps I’m misinterpreting the graph - let me know!) -

I agree with your colorscale - if I read it right, it should make point (x,y) white if x is in [-0.25,0.25]. However, from looking at the plot, it looks like white points are actually more like x in [-0.8, 0.3] - do you agree? And the colour bar itself also seems to show this.

I’ve also reproduced your notebook and done the same plot but with 10k points, and it shows a similar thing - the points in white seem to go significantly outside of the x in [-0.25, 0.25] constraint. However,

sorted(x)[int(round(st.norm.cdf(-0.25) * len(x)))] outputs -0.26114270136600848 which is about right…

Am I looking at it wrong somehow?

Thanks, really appreciate your help!

The colorbar does not show the subintervals of values you expected because, in the definition of the colorscale, the range of x-values, [min(x), max(x)], is mapped to [0,1] via the CDF of the standard normal distribution, which is a nonlinear function.
Plotly, however, first normalizes the x-values linearly, i.e. maps them to [0,1] by the linear function x-->(x-min(x))/(max(x)-min(x)), and only then maps each normalized value to the corresponding color in the colorscale.
To be more precise, suppose that min(x)=-3 and max(x)=3. Then x=0.3 is mapped linearly to (0.3+3)/6=3.3/6=0.55 in [0,1].
Our colorscale being:

 [[0, 'blue'],
  [0.4012936743170763, 'blue'],
  [0.4012936743170763, 'white'],
  [0.5987063256829237, 'white'],
  [0.5987063256829237, 'red'],
  [1, 'red']]

the color associated with the normalized value 0.55 is white (not red, even though 0.3 > 0.25), because 0.55 belongs to the interval [0.4012936743170763, 0.5987063256829237].
If Plotly's normalization function were the standard normal CDF, the colorbar would show the expected values.
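One way to get the intended thresholds is to place the colorscale breakpoints with the same linear map that Plotly applies, rather than with the CDF. The following is a minimal sketch, assuming the color range is left at its default (the minimum and maximum of the color values); the helper names linear_position and three_band_colorscale are made up for this illustration.

```python
def linear_position(threshold, cmin, cmax):
    """Position of `threshold` in [0, 1] under x -> (x - cmin) / (cmax - cmin)."""
    return (threshold - cmin) / (cmax - cmin)

def three_band_colorscale(values, low=-0.25, high=0.25):
    """Blue below `low`, white between `low` and `high`, red above `high`."""
    cmin, cmax = min(values), max(values)
    lo = linear_position(low, cmin, cmax)
    hi = linear_position(high, cmin, cmax)
    if not 0 < lo < hi < 1:
        raise ValueError("thresholds must lie strictly inside the data range")
    return [
        [0, 'blue'],
        [lo, 'blue'],
        [lo, 'white'],
        [hi, 'white'],
        [hi, 'red'],
        [1, 'red'],
    ]

# With min(x) = -3 and max(x) = 3, as in the example above:
for stop in three_band_colorscale([-3.0, 0.0, 3.0]):
    print(stop)
```

For this range the white band spans roughly 0.458 to 0.542 on the colorscale, so x=0.3 (normalized to 0.55) falls in the red band as intended. The resulting list can be passed as the colorscale in the same way as the ones above.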

Thanks for the reply @empet.

Still a bit puzzled by this - I agree that the normal CDF is non-linear, but it’s still a strictly increasing function and so should preserve the ordering of the original values, which is what we’re really concerned about. Likewise, the Plotly normalisation also preserves the ordering.

If the closest value to -0.25 in the original list is 0.401 of the way through the list (0.401 being the normal CDF evaluated at -0.25), then in the Plotly-normalised list, the value that’s 0.401 of the way through should still correspond to the original value that was close to -0.25. Do you agree?

I’m obviously looking at this wrong somehow because what you said in your previous post is definitely what’s going on in the graph, but I’m struggling to get my head around why exactly my interpretation is wrong.

Thanks!
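The two readings of "0.401 of the way through" can be compared numerically. The sketch below is only an illustration: it uses evenly spaced quantiles of the standard normal as a reproducible stand-in for a sorted sample (an assumption, not the data from the thread) and contrasts a position by rank with a position by value.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Stand-in for a sorted sample (an assumption made for reproducibility):
# n evenly spaced quantiles of the standard normal distribution.
n = 10_000
x = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

p = std_normal.cdf(-0.25)  # about 0.401

# "p of the way through the sorted list": a position by rank.
by_rank = x[round(p * (n - 1))]

# "p of the way from min(x) to max(x)": a position by value, which is
# what a breakpoint p in a linearly normalized colorscale refers to.
by_range = x[0] + p * (x[-1] - x[0])

print(by_rank, by_range)
```

Here by_rank comes out close to -0.25, while by_range lands at roughly -0.77. Both maps preserve ordering, but a colorscale breakpoint is a fraction of the value range, not a fraction of the points.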

@taimur To help you understand why the distribution of colors is not symmetric when the colors are computed by Plotly, I illustrated in this notebook https://plot.ly/~empet/14596 how the colors from your chosen colorscale should be assigned, and why Plotly doesn’t assign them that way (it works exclusively with linearly defined colorscales).
This is the plot with colors computed by a custom function:
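The notebook's own function is not reproduced here. As a rough sketch of the general idea only, a custom function can assign one color per point directly from the raw value, bypassing the colorscale normalization entirely; band_color is a hypothetical name and the thresholds are the ones from the original question.

```python
def band_color(value, low=-0.25, high=0.25):
    """Return a color name for `value` based on the raw thresholds."""
    if value < low:
        return 'blue'
    if value > high:
        return 'red'
    return 'white'

# Example values only; in practice this would be the plotted x data.
x = [-1.2, -0.25, 0.0, 0.3, 2.0]
colors = [band_color(v) for v in x]
print(colors)
```

A list of color strings like this can then be used as the per-point marker color instead of a numeric array plus colorscale, at the cost of losing the automatic colorbar.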