Pandas parallel package 'modin' does not work with plotly

I tried to use modin to parallelize pandas operations.
However, data frames created with modin look the same as pandas ones but seem to lack the column-name information when passed to plotly:

import modin.pandas as pd
# import pandas as pd
import numpy as np
from dash import Dash, html, dcc, callback
from dash.dependencies import Input, Output
import dash_bootstrap_components as dbc
import plotly.express as px
import ray

ray.init(runtime_env={'env_vars': {'__MODIN_AUTOIMPORT_PANDAS__': '1'}})

# tried with df from numpy ...
rand_num_mat = np.random.rand(10, 2)
df = pd.DataFrame(rand_num_mat, columns=['x', 'y'])

# or from csv
# df = pd.read_csv('random.csv')

# columns names 'x' and 'y' are there in both versions
print(df.columns)

app = Dash()

app.layout = html.Div([
    dcc.Graph(id='figure'),
    dbc.Button(id='btn-populate', children='Populate my graph!')
])


@callback(
    Output('figure', 'figure'),
    Input('btn-populate', 'n_clicks'),
    prevent_initial_call=True
)
def populate_test_modin_df(n):
    if n:
        # works with pandas and modin
        # return px.scatter(df, x=0, y=1)

        # works with pandas but not with modin
        return px.scatter(df, x='x', y='y')


if __name__ == '__main__':
    app.run_server(debug=True)

I don’t know whether this belongs in a modin forum or here, so I posted it twice :smiley:
(Pandas parallel package 'modin' does not work with plotly - Modin Discuss)

I noticed that, when the graph is built, the data frame's columns change from x, y to 0, 1. So the code below might work.

import modin.pandas as pd
# import pandas as pd
import numpy as np
from dash import Dash, html, dcc, callback
from dash.dependencies import Input, Output
import dash_bootstrap_components as dbc
import plotly.express as px
from dash.exceptions import PreventUpdate

rand_num_mat = np.random.rand(10, 2)
df = pd.DataFrame(rand_num_mat, columns=['x', 'y'])

app = Dash()

app.layout = html.Div([
    dcc.Graph(id='figure'),
    dbc.Button(id='btn-populate', children='Populate my graph!', n_clicks=0)
])


@app.callback(
    Output('figure', 'figure'),
    Input('btn-populate', 'n_clicks'),
    prevent_initial_call=True
)
def populate_test_modin_df(n_clicks):
    if n_clicks > 0:
        fig = px.scatter(df, x=0, y=1)
        return fig
    else:
        raise PreventUpdate

if __name__ == '__main__':
    app.run_server(debug=False)

Right, that’s what I mean. That should not happen, and it does not happen with standard pandas without modin. When handling rather large data frames whose columns change, one would not want to address them by position but by unique identifiers such as column names :smiley:
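One possible workaround, not confirmed anywhere in this thread, is to hand plotly a plain pandas data frame at the moment of plotting, so the column names survive. The sketch below assumes that modin data frames expose a private `_to_pandas()` method; the helper name `to_plain_pandas` is made up for illustration. It is duck-typed, so it passes real pandas data frames through unchanged.

```python
def to_plain_pandas(frame):
    """Return a plain pandas object if `frame` knows how to convert itself.

    Assumption: modin data frames expose a private `_to_pandas()` method.
    Anything without that method (for example a real pandas DataFrame)
    is returned unchanged.
    """
    converter = getattr(frame, "_to_pandas", None)
    if callable(converter):
        return converter()
    return frame


# Hypothetical use inside the callback from the question:
#     return px.scatter(to_plain_pandas(df), x='x', y='y')
```

Note that converting pulls the whole frame into the main process, so for large data it may be worth selecting only the columns to plot before converting.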

It looks like a modin problem: even though the column names have been set, they are not reflected in the plot. If you want to read CSV files faster, I think you can use dask.

yep → Pandas parallel package 'modin' does not work with plotly - Modin Discuss