How to make a Dash app run faster when it's slowed down by importing large data

I have a large dataset stored in a CSV file (1.7 GB).

My Dash app is too slow; it takes minutes to upload, refresh, and run :grimacing: :grimacing:

I tried dcc.Store, but when I change a value in my dropdown it takes a long time to update my graphs.

Is there any solution? (I have a lot of pages in my app!)

I had the same problem, and one thing that helped quite a bit was a function I found online. You just pass your dataframe as an argument and assign the result to a new variable.



import numpy as np


def reduce_mem_usage(df):
    """Iterate through the columns of a dataframe and downcast each one
    to the smallest dtype that can hold its values, to reduce memory usage.
    """
    start_mem = df.memory_usage().sum() / 1024**2

    for col in df.columns:
        col_type = df[col].dtype

        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
            else:
                # Careful: float16 keeps only ~3 significant digits; drop this
                # branch if you need exact values.
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
        else:
            # Repetitive strings are far smaller stored as categories.
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage reduced from {:.2f} MB to {:.2f} MB'.format(start_mem, end_mem))
    return df
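For reference, pandas ships with `pd.to_numeric(..., downcast=...)`, which achieves much the same thing without hand-written bounds checks. A minimal sketch (the column names here are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "small_int": np.arange(100, dtype=np.int64),
    "ratio": np.linspace(0.0, 1.0, 100),   # float64 by default
    "label": ["a", "b"] * 50,              # repetitive strings
})
before = df.memory_usage(deep=True).sum()

# downcast picks the smallest dtype that can hold the values
df["small_int"] = pd.to_numeric(df["small_int"], downcast="integer")  # int8
df["ratio"] = pd.to_numeric(df["ratio"], downcast="float")            # float32
df["label"] = df["label"].astype("category")

after = df.memory_usage(deep=True).sum()
```

Note that `downcast="float"` never goes below float32, so it avoids the float16 precision problem mentioned above.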


You could also try reading your CSV with the engine explicitly set to the faster C parser, like this:


same_bed_dataframe = pd.read_csv(values, header=None, delimiter='\t', engine='c')
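On top of choosing the engine, you can tell `read_csv` the column dtypes up front so pandas doesn't have to infer them from the text. A self-contained sketch using an in-memory CSV (the column names are made up; swap `io.StringIO(csv_text)` for your file path):

```python
import io

import pandas as pd

csv_text = "id\tvalue\n1\t0.5\n2\t0.75\n"

df = pd.read_csv(
    io.StringIO(csv_text),   # stands in for your file path
    delimiter="\t",
    engine="c",
    dtype={"id": "int32", "value": "float32"},  # skip inference, load small
)
```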

My last tip: if it applies to your plots, use Scattergl instead of the regular Scatter; its WebGL rendering handles large numbers of points much better.


You can make the initial data load faster if you save the data using a binary format such as feather or parquet rather than CSV. CSV is text based so when you load it, pandas has to parse all of the strings and convert to numeric values which takes time. Should be as easy as saving your data using df.to_parquet or df.to_feather and then replacing pd.read_csv with pd.read_parquet or pd.read_feather respectively.

As for speeding up the plotting function, it depends what you’re doing. If you provide some more details we might be able to offer some suggestions. You might be able to pre-compute some of the things you want to plot and save some time that way if you’re computing them from the raw data in the callback each time.
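To illustrate the pre-computation idea with a toy example (the column names and keys are invented): do the expensive groupby once at import time, so each callback body reduces to a dictionary lookup.

```python
import pandas as pd

raw = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 12, 7, 9],
})

# Computed once when the module is imported, not on every callback.
by_region = {region: grp.sort_values("year") for region, grp in raw.groupby("region")}

def update_graph(region):
    """Stand-in for a Dash callback body: a cheap lookup instead of a groupby."""
    return by_region[region]
```

In a real app, `update_graph` would be decorated with `@app.callback` and return a figure built from the pre-computed frame.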


Hey @nuno5645, thank you for the reply.
I used your code:

individu = pd.read_csv(DATA_PATH.joinpath("individu_final.csv"))
# Reduce memory usage :
# reduce_mem_usage(df) exactly as defined in the post above
individu = reduce_mem_usage(individu)

I can't get engine='c' to work with DATA_PATH.joinpath.

My app is still slow.

Could you show your code so I can better see the issue?


This is the structure of my app.
I have 12 pages in rapports_adultes, rapports_agees …

I load the data with pd.read_csv into individu and share it with the other pages.


Wouldn't it work if you add it this way?


individu = pd.read_csv(DATA_PATH.joinpath("individu_final.csv"), engine='c')




It's working! But when I reload the app in the browser, it's still so slow :smiling_face_with_tear:

Thank you @tcbegley, I'm trying your idea (df.to_parquet).

In my app I have a lot of pages with plots … the plots are updated by callbacks using dropdowns or RadioItems.

If you want to speed things up, you could try something like Dask to read your CSV a lot faster than pandas can. It has the same syntax as pandas, just slightly fewer features.

If you want some help, I could look over your code and suggest some performance tips.