Headers as years in multiple columns. How to setup a line chart?

Hi, everyone.
I am a newbie in Plotly and Python and also have a very tricky CSV to handle. Please have a look at an excerpt of it below:

NĂ­vel;Indicador;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017;2018;2019;2020;2021;Unidade
1;Ensino bĂĄsico;;;;;;;;;;;;;;;
1.1;MatrĂ­culas;;;;;;;;;;;;;;;
1.1.1;Ensino infantil;45801;49776;46854;49331;50081;52113;51632;47860;49984;50084;51282;51589;48184;41608;matrĂ­culas
1.1.1.1;Creche;10709;11147;11187;13196;14677;17289;17692;17207;18095;18438;18745;18817;16645;14253;matrĂ­culas
1.1.1.1.1;Municipal;4844;5087;4382;4370;5015;4991;4955;5110;5386;5737;5887;6056;6439;6750;matrĂ­culas
1.1.1.1.2;Estadual;7;;24;17;4;11;17;;;;;;;;matrĂ­culas
1.1.1.1.3;Federal;;;;;;;;;;;;;;;matrĂ­culas
1.1.1.1.4;Privado;5858;6060;6781;8809;9658;12287;12720;12097;12709;12701;12858;12761;10206;7503;matrĂ­culas
1.1.1.2;Pré-escolar;35092;38629;35667;36135;35404;34824;33940;30653;31889;31646;32537;32772;31539;27355;matrículas
1.1.1.2.1;Municipal;11904;11537;11199;10941;10695;10027;9853;10048;11060;11171;11635;12379;13224;13692;matrĂ­culas
1.1.1.2.2;Estadual;581;606;518;415;212;151;147;109;69;50;68;89;95;99;matrĂ­culas
1.1.1.2.3;Federal;;;;;;;;;;;;;;;matrĂ­culas

I need to generate a line chart with some items from the INDICADOR column to show values from the columns 2008 to 2021. Usually the years are in a sole column, so that would be easy. I have started with this:

import pandas as pd
import plotly.offline as pyo
import plotly.graph_objs as go

linhas_imp = [0,3,14,60] # here I am picking the rows I want

df = pd.read_csv(‘CensoEscolarSinopseREC.csv’, sep = ‘;’, skiprows = lambda x: x not in linhas_imp) #is this correct?
df = df.loc[:, ~df.columns.isin([‘Nível’,‘Unidade’])] #excluding the columns I do not need to use

And now I am quite confused on setting up the:

data = [go.Scatter(x=???,
y=???,
mode=‘lines’,
name=???
)]
pyo.plot(data)

Could anyone give me some advice? I have tried to find exhaustively on Google but did not find any similar case of CSV layout. THANKS!!!

HI @mocoroh welcome to the forums.

Your CSV looks quite normal to me, there must be something I’m missing.

  • based on what do you select the rows?
  • do you always exclude the same columns?
  • what are the values you want to be displayed on the x- and y axis?

EDIT:

import pandas as pd
import plotly.express as px

df = pd.read_csv('extract.txt', delimiter=';')

fig = px.line(
    x=df.columns[2:-2], 
    y=df.iloc[2][2:-2], 
    title=df.iloc[2][1]
)
fig.show()

creates:

Note, that the values on the x-axis are not ordered because I plotted just the column names


1 Like

Hi AIMPED!
Thank you very much for the prompt response. It worked great. I would be very glad if you could show how to disclosure 3 or 4 lines (e.g. including the rows CRECHE, MUNICIPAL, ESTADUAL) and using plotly.offline plotly.graph_objs, since these have been the packages I am studying at the moment (because I am going towards Dash).

Thanks again for your time!!

Hey @mocoroh ,

there are a lot of different ways to approach this. My fisrt post shows the quick & dirty version. I prefer cleaning the dataframe bofore plotting figures. This is the major part of the following code:

import pandas as pd
import plotly.graph_objects as go

# read csv file (I renamed the two items "Municipal" in the column "Indicador")
df = pd.read_csv('extract.txt', delimiter=';')

# delete unwanted columns
cleaned = df.drop(['NĂ­vel', 'Unidade'], axis=1)

# transpose (switch columns/rows)
cleaned = cleaned.transpose(copy=True)

#reset index
cleaned.reset_index(inplace=True)

# use first row of DataFrame as header
cleaned.columns = cleaned.iloc[0]

# delete first row
cleaned = cleaned.iloc[1:]

# covert types
cleaned = cleaned.astype(float)
cleaned.Indicador = pd.to_datetime(cleaned.Indicador, format='%Y')

# now the easy part --> create the figure
fig = go.Figure()

# add each column as trace
for column in ['Ensino infantil', 'Creche', 'Municipal_1']:
    fig.add_scatter(x=cleaned['Indicador'], y=cleaned[column], mode='markers+lines', name=column)

# show figure
fig.show()

creates:
newplot(49)

1 Like

Thanks, AIMPED! It makes total sense. You also brought some new content to me, such as the transpose and astype. Lot to learn, yet
 Thanks again!

1 Like