ziczac
November 21, 2019, 4:31am
1
Hi. I am using html to download data from a dashboard.
```python
csv_string = df.to_csv(index=True, encoding='utf-8')
csv_string = "data:text/csv;charset=utf-8," + urllib.parse.quote(csv_string)
return html.A(children='download data', id=f'dl_bar_{metric}', download=f"spx_markouts_{metric}_by_{'_'.join(groups)}.csv", href=csv_string)
```
However, the data can be quite large, with the csv_string running to over 1 million characters. That seems to result in network errors.
Is there a way to increase the allowed length, or is there another way to download the data?
Could you use one of the compression modes of `df.to_csv` to reduce the size of the data? See the `compression` parameter at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
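A minimal sketch of that suggestion, assuming the CSV can be written to a file path (pandas applies the `compression` argument when given a path rather than a file-like object; the file and frame here are illustrative, not from the thread):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Compression takes effect when a file *path* is supplied.
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
df.to_csv(path, index=True, encoding="utf-8", compression="gzip")

# Reading it back with the same compression round-trips the data.
restored = pd.read_csv(path, index_col=0, compression="gzip")
```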
ziczac
November 22, 2019, 1:57am
3
Thanks for your reply. I tried your proposal, but it seems that to_csv does not support in-memory compression; the compression argument only takes effect when a file path is passed. Otherwise I get:

```
RuntimeWarning: compression has no effect when passing file-like object as input.
```
I think the issue is described here:
(GitHub issue: opened 05:01PM - 31 Aug 18 UTC, closed 11:53AM - 07 Aug 20 UTC; labels: Enhancement, IO CSV)
#### Code Sample, a copy-pastable example if possible
```python
# Attempt 1
…
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
test = df.to_csv(compression="gzip")
type(test)
```
```
RuntimeWarning: compression has no effect when passing file-like object as input.
Out: str
```
```python
# Attempt 2
from io import BytesIO
b_buf = BytesIO()
df.to_csv(b_buf, compression="gzip")
```
```
Out: TypeError: a bytes-like object is required, not 'str'
```
#### Problem description
I am trying to gzip-compress a dataframe in memory (as opposed to writing directly to a named file location). The use case is, I imagine, similar to the reason why to_csv now allows omitting the path in other cases to create an in-memory representation; specifically, I need to save the compressed df to a cloud location using a custom URI, so I am temporarily keeping it in memory for that purpose.
#### Expected Output
I would expect the compression option to result in a compressed bytes object (similar to the gzip library).
Thank you in advance for your help!
Note: I originally saw #21227 (df.to_csv ignores compression when provided with a file handle) and thought it might also have been a fix, but it looks like it stopped a little short of fixing my issue as well.
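For reference, one in-memory workaround along the lines the issue asks for is to take the plain string that `to_csv` returns and compress it separately with the stdlib gzip module; a minimal sketch (the frame here is illustrative):

```python
import gzip

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# to_csv() with no path returns a plain str; compress it ourselves
# to get the in-memory bytes object the issue asks for.
csv_bytes = df.to_csv(index=True).encode("utf-8")
compressed = gzip.compress(csv_bytes)

# Decompressing recovers the original CSV text exactly.
assert gzip.decompress(compressed) == csv_bytes
```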
So, I went ahead and naively tried to save to a local file and download that:
```python
def get_dl_link(df, metric, groups):
    filename = f"spx_markouts_{metric}_by_{'_'.join(groups)}.csv"
    df.to_csv(filename, index=True, encoding='utf-8', compression='gzip')
    return html.A(children='download data', id=f'dl_bar_{metric}', download=filename)
```
This creates the local file, but clicking on the link wouldn't download anything.
I also tried to compress it myself using bz2:
```python
csv_string = csv_string.encode('utf-8')
compressed = bz2.compress(csv_string)
return html.A(children='download data', id=f'dl_bar_{metric}', download=f"spx_markouts_{metric}_by_{'_'.join(groups)}.bz2", href=compressed)
```
However, this throws:

```
TypeError: Object of type 'bytes' is not JSON serializable
```
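One way around that error, sketched here under the assumption that the browser will accept a base64 data URI in the link's `href`, is to base64-encode the compressed bytes so the href becomes an ordinary (JSON-serializable) string (the `csv_string` below is a placeholder standing in for the real `df.to_csv(...)` output):

```python
import base64
import bz2

csv_string = "a,b\n1,2\n3,4\n"  # placeholder for df.to_csv(...)

compressed = bz2.compress(csv_string.encode("utf-8"))

# bytes cannot be JSON-serialized into the component tree, but a
# base64-encoded str can, and browsers accept it inside a data URI.
b64 = base64.b64encode(compressed).decode("ascii")
href = "data:application/x-bzip2;base64," + b64
```

The resulting `href` string can then be passed to `html.A` in place of the raw bytes.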