Error while reading CSV file with pd.read_csv

Hi there!

I am having trouble reading a CSV over HTTPS. The following code is NOT working:

filedata = 'https://github.com/pcm-dpc/COVID-19/blob/master/dati-province/dpc-covid19-ita-province-latest.csv'
import pandas as pd
df = pd.read_csv(filedata,
                   dtype={"denominazione_provincia": str})

While the following code IS working (taken from here):

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
                   dtype={"fips": str})

This is the error message I get... Can anyone help?


Traceback (most recent call last):

  File "<ipython-input-105-19a69f730c06>", line 4, in <module>
    dtype={"denominazione_provincia": str})

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    data = parser.read(nrows)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1133, in read
    ret = self._engine.read(nrows)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 2037, in read
    data = self._reader.read(nrows)

  File "pandas\_libs\parsers.pyx", line 859, in pandas._libs.parsers.TextReader.read

  File "pandas\_libs\parsers.pyx", line 874, in pandas._libs.parsers.TextReader._read_low_memory

  File "pandas\_libs\parsers.pyx", line 928, in pandas._libs.parsers.TextReader._read_rows

  File "pandas\_libs\parsers.pyx", line 915, in pandas._libs.parsers.TextReader._tokenize_rows

  File "pandas\_libs\parsers.pyx", line 2070, in pandas._libs.parsers.raise_parser_error

ParserError: Error tokenizing data. C error: Expected 1 fields in line 49, saw 2

Hi @Alex79, welcome to the forum! This is clearly a pandas question so you would probably get more help on Stack Overflow (the pandas GitHub says that pandas questions should be asked on Stack Overflow). That said, the error message says that there is a problem with your CSV file. You should download it, open it in a text editor and see what's wrong with line 49 (or more generally inspect the structure of the file). Good luck!
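As a rough sketch of that inspection step, the snippet below prints the lines around a given line number together with how many fields each one splits into. The helper name show_lines_around and the sample text are made up for illustration; only the standard library is used.

```python
import csv

def show_lines_around(text, line_no, context=2, sep=","):
    """Return (line number, field count, raw line) for lines near line_no (1-based)."""
    lines = text.splitlines()
    start = max(line_no - context, 1)
    stop = min(line_no + context, len(lines))
    report = []
    for n in range(start, stop + 1):
        raw = lines[n - 1]
        # Let the csv module split the line so quoted separators are respected.
        n_fields = len(next(csv.reader([raw], delimiter=sep), []))
        report.append((n, n_fields, raw))
    return report

# Hypothetical sample: line 3 has two fields while its neighbours have one.
sample = "a\nb\nc,d\ne\n"
for n, n_fields, raw in show_lines_around(sample, 3, context=1):
    print(n, n_fields, repr(raw))
```

For a remote file, the text could be fetched first, for example with urllib.request.urlopen(url).read().decode("utf-8"). If the lines look like HTML markup rather than comma-separated rows, the URL probably points at a web page and not at the raw file. Note that the working example above uses a raw.githubusercontent.com address, while the failing one uses a github.com/.../blob/... address.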

Thank you Emmanuelle for welcoming and replying!
I'll try your suggestion asap.
Best, A.

In most cases, it is an issue with:

  • the delimiters in your data;
  • the parser being confused by the headers/columns of the file.

To solve pandas.parser.CParserError: Error tokenizing data (recent pandas versions report it as ParserError, as in the traceback above), try specifying the sep and/or header arguments when calling read_csv.

pandas.read_csv(fileName, sep='your_delimiter', header=None)
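A small self-contained illustration of those two arguments, assuming a semicolon-delimited file with no header row. The data and column names are invented for the example, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data without a header line.
raw = "1;alpha\n2;beta\n3;gamma\n"

# sep tells the parser which delimiter to split on; header=None says the
# first line is data, and names supplies the column labels instead.
df = pd.read_csv(io.StringIO(raw), sep=";", header=None, names=["id", "label"])
print(df)
```

With the default sep="," each of these lines would be read as a single field, so passing the right delimiter is what gives two columns here.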

In some cases, the pandas.parser.CParserError is raised when reading a file that was written by DataFrame.to_csv(). This can happen when there is a carriage return ('\r') in a column name, in which case to_csv() will actually write the subsequent column names into the first column of the data frame, so the number of columns differs in the first X rows. This difference is one cause of the CParserError.
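One way to guard against that, sketched under the assumption that the stray characters can simply be dropped, is to check the column names for carriage returns or newlines before writing the file. The data frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose first column name carries a stray carriage return.
df = pd.DataFrame({"name\r": [1, 2], "value": [3, 4]})

# List the column names that contain a line-break character.
suspect = [c for c in df.columns if "\r" in str(c) or "\n" in str(c)]
print(suspect)

# Strip those characters from every column name before calling to_csv().
cleaned = df.rename(columns=lambda c: str(c).replace("\r", "").replace("\n", ""))
print(list(cleaned.columns))
```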

Also, Error tokenizing data may arise when you're using a separator (e.g. a comma ',') as the delimiter and a row has more separators than expected (more fields in the error row than defined in the header). You then need to either remove the additional field or remove the extra separator if it's there by mistake. The better solution is to investigate the offending file and fix it manually so you don't need to skip the error lines.
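To make the extra-separator case concrete, here is a minimal sketch with invented data in which one row has more fields than the header. It first shows the error and then skips the bad row with on_bad_lines="skip", which is available in pandas 1.3 and later (older versions, such as the one in the traceback above, used error_bad_lines=False instead). Skipping silently drops data, so it is a fallback, not a fix.

```python
import io
import pandas as pd

# Hypothetical data: the header has 2 fields, but line 3 has 3.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

message = None
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    # The message names the offending line, which is where to look in the file.
    message = str(exc)
    print(message)

# Fallback: drop rows with too many fields instead of raising.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```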

Hi @walemark,
Welcome to the community and thank you for helping @Alex79.