Coloring go.Splom Per Cell Instead of Per Row and Losing Data

Hello all,

I am attempting to generate a basic Splom graph. However, rather than the standard coloring by row, I want to color individual cells, so the coloring is specific to each subgraph of the Splom. I am doing this to inspect imputed values and make sure the imputed data matches what is expected.

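    import numpy as np
    import pandas as pd
    import plotly.graph_objects as go
    from plotly import offline

    # missing_values: list of (row label, column label) pairs for the imputed cells
    # index = df.index keeps the .loc labels aligned with df
    colors = pd.DataFrame(np.zeros(df.shape), index = df.index, columns = df.columns)
    for val in missing_values:
        colors.loc[val] = 1
    
    fig = go.Figure()
    
    fig.add_trace(
        go.Splom(
            dimensions = [
                dict(label = column, values = df[column]) for column in df.columns
            ], 
            marker = dict(
                color = colors    # intended: one color per cell, not per row
            )
        )
    )
    
    fig.update_layout(
        title = "test"
    )
    
    offline.plot(fig)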
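The post does not show how `missing_values` is built. Assuming it is a list of (row label, column label) pairs for the imputed cells, and that the data is available in a hypothetical `raw` frame before imputation (NaN where a value was missing), a minimal sketch of building it, and of the equivalent one-line `colors`, could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data before imputation: NaN marks the cells that will be imputed.
raw = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# (row label, column label) pairs of the missing cells (assumed shape of missing_values).
rows, cols = np.nonzero(raw.isna().to_numpy())
missing_values = [(raw.index[i], raw.columns[j]) for i, j in zip(rows, cols)]

# Mean imputation, only as a stand-in for whatever imputation is really used.
df = raw.fillna(raw.mean())

# The loop from the question, with index=df.index so the .loc labels line up.
colors = pd.DataFrame(np.zeros(df.shape), index=df.index, columns=df.columns)
for val in missing_values:
    colors.loc[val] = 1

# Same per-cell flags without the loop.
colors_direct = raw.isna().astype(float)
print(missing_values)
```

Either way `colors` is a 2-D frame with one flag per cell, which is exactly the shape Splom's marker cannot use.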

This seems to kinda work, in that it shows most (but not all) values whose color category is 0. However, it does not show category 1 values at all, and if there are too many category 1 values the Splom shows nothing.

[Image: graph with color categories]

[Image: graph without color categories, same data]

Any ideas?

Hi @WolVes,

`go.Splom` is defined differently from the other Plotly traces. Although it displays a figure that looks like a grid of subplots, inspecting `len(fig.data)` shows that it is 1, not n*n, where n is the number of dimensions (data variables). The whole matrix is a single trace, and its marker color is applied per sample (one row of your data) across every panel. Hence you cannot map data from individual cells to designated colors.

Inspecting `go.splom.Marker.color` via `help(go.splom.Marker.color)`, we learn:

    Help on property:

    Sets the markercolor. It accepts either a specific color or an
    array of numbers that are mapped to the colorscale relative to
    the max and min values of the array or relative to
    `marker.cmin` and `marker.cmax` if set.
    
    The 'color' property is a color and may be specified as:
      - A hex string (e.g. '#ff0000')
      - An rgb/rgba string (e.g. 'rgb(255,0,0)')
      - An hsl/hsla string (e.g. 'hsl(0,100%,50%)')
      - An hsv/hsva string (e.g. 'hsv(0,100%,100%)')
      - A named CSS color
      - A number that will be interpreted as a color
        according to splom.marker.colorscale
      - A list or 1D array of any of the above

In your example above, `colors` does not match any of these cases: it is a 2-D DataFrame (`len(colors.shape) > 1`), not a single color or a 1-D array. That's why it cannot work as you expected.

Oof. Alright then, time to make my own n*n Splom. Thanks.

For those who will inevitably want to do the same thing…

    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    # Uses df and colors (per-cell 0/1 flags) from the question above
    n = len(df.columns)
    title = "test"

    fig = make_subplots(rows = n, 
                        cols = n, 
                        shared_xaxes = True,    # shared within each column
                        shared_yaxes = True,    # shared within each row
                        vertical_spacing = 0.01, 
                        horizontal_spacing = 0.01,
                        column_titles = list(df.columns), 
                        row_titles = list(df.columns)
                       )

    for count1, col1 in enumerate(df.columns):      # row of the grid -> y variable
        for count2, col2 in enumerate(df.columns):  # column of the grid -> x variable

            # 0 = neither coordinate imputed, 1 = one of them, 2 = both
            tmp_colors = colors[[col1, col2]].sum(axis = 1)
            
            fig.add_trace(
                go.Scatter(
                    x = df[col2],    # same x variable down each column
                    y = df[col1],    # same y variable along each row
                    mode = 'markers', 
                    marker = dict(
                        color = tmp_colors
                    )
                ),
                row = count1 + 1, 
                col = count2 + 1
            )   

    fig.update_layout(
        title = title, 
        showlegend = False,
        height = 1000,
        width  = 1000
    )

    # make_subplots adds the n column titles first, then the n row titles.
    # Move column titles below the grid and row titles to its left.
    for i in range(n):
        fig.layout.annotations[i]["yref"] = "paper"
        fig.layout.annotations[i]["xref"] = "paper"
        fig.layout.annotations[i]["y"] = -0.06
        
        fig.layout.annotations[i + n]["yref"] = "paper"
        fig.layout.annotations[i + n]["xref"] = "paper"
        fig.layout.annotations[i + n]["x"] = -0.06
        
    fig.show()