So I had the same problem, and one thing that helped quite a bit was a function I found online. You just pass your dataframe in as an argument and assign the result to a new variable:
```python
import numpy as np

def reduce_mem_usage(df):
    """Iterate through all the columns of a dataframe and downcast each one
    to the smallest dtype that can hold its values, to reduce memory usage.
    """
    start_mem = df.memory_usage().sum() / 1024**2
    for col in df.columns:
        col_type = df[col].dtype
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                else:
                    df[col] = df[col].astype(np.int64)
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            # non-numeric columns become pandas categoricals
            df[col] = df[col].astype('category')
    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage went from {:.2f} MB to {:.2f} MB'.format(start_mem, end_mem))
    return df
```
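To see why this saves memory, here is a quick standalone demo of the downcasting idea on a toy dataframe (the data here is made up for illustration):

```python
import numpy as np
import pandas as pd

# hypothetical data: values 0..999 stored in the default int64 dtype
df = pd.DataFrame({'small_ints': np.arange(1000, dtype=np.int64)})
print(df['small_ints'].dtype)  # int64, 8 bytes per value

# 0..999 fits in int16 (2 bytes per value), so downcasting
# cuts this column's memory footprint by 4x
df['small_ints'] = df['small_ints'].astype(np.int16)
print(df['small_ints'].dtype)
```

One caveat: float16 only keeps about three significant decimal digits, so check that the rounding is acceptable for your data before relying on the float16 branch.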
You could also try reading your CSV with the parser engine explicitly set to 'c', which is pandas' fast C parser.
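For example (using an in-memory CSV as a stand-in for your real file; note that engine='c' is already the default whenever pandas can use it, so this mostly just makes the choice explicit):

```python
import io
import pandas as pd

# stand-in for your real CSV file
csv_data = io.StringIO("a,b\n1,2.5\n3,4.5\n")

# engine='c' selects the fast C parser (the default where possible)
df = pd.read_csv(csv_data, engine='c')
print(df.dtypes)
```

Passing an explicit dtype= mapping to read_csv can also shave off parsing time, since pandas then skips type inference.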
You can make the initial data load faster if you save the data in a binary format such as Feather or Parquet rather than CSV. CSV is text-based, so when you load it pandas has to parse all of the strings and convert them to numeric values, which takes time. It should be as easy as saving your data with df.to_parquet or df.to_feather and then replacing pd.read_csv with pd.read_parquet or pd.read_feather respectively.
As for speeding up the plotting function, it depends on what you're doing. If you provide some more details, we might be able to offer some suggestions. If you're computing things from the raw data in the callback each time, you might be able to pre-compute them once and save time that way.
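Here's a framework-agnostic sketch of the pre-computation idea: do the expensive aggregation once at startup, and have the callback do only a cheap lookup (the data and function names are hypothetical):

```python
import numpy as np
import pandas as pd

# hypothetical raw data the plot is built from
rng = np.random.default_rng(0)
raw = pd.DataFrame({'group': rng.choice(list('abc'), size=10_000),
                    'value': rng.random(10_000)})

# slow: re-aggregate inside the callback on every plot update
def callback_slow(group):
    return raw.groupby('group')['value'].mean()[group]

# fast: aggregate once at startup; the callback just looks the result up
means = raw.groupby('group')['value'].mean()

def callback_fast(group):
    return means[group]
```

The fast version does the groupby once instead of on every interaction, which matters a lot when the raw data is large.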
If you want to speed things up further, you could try something like Dask to read your CSV a lot faster than pandas. It has almost the same syntax as pandas, just with slightly fewer features.