Headers as years in multiple columns. How to setup a line chart?

mocoroh · February 23, 2023, 9:11pm

Hi, everyone.
I am a newbie in Plotly and Python and also have a very tricky CSV to handle. Please have a look at an excerpt of it below:

Nível;Indicador;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017;2018;2019;2020;2021;Unidade
1;Ensino básico;;;;;;;;;;;;;;;
1.1;Matrículas;;;;;;;;;;;;;;;
1.1.1;Ensino infantil;45801;49776;46854;49331;50081;52113;51632;47860;49984;50084;51282;51589;48184;41608;matrículas
1.1.1.1;Creche;10709;11147;11187;13196;14677;17289;17692;17207;18095;18438;18745;18817;16645;14253;matrículas
1.1.1.1.1;Municipal;4844;5087;4382;4370;5015;4991;4955;5110;5386;5737;5887;6056;6439;6750;matrículas
1.1.1.1.2;Estadual;7;;24;17;4;11;17;;;;;;;;matrículas
1.1.1.1.3;Federal;;;;;;;;;;;;;;;matrículas
1.1.1.1.4;Privado;5858;6060;6781;8809;9658;12287;12720;12097;12709;12701;12858;12761;10206;7503;matrículas
1.1.1.2;Pré-escolar;35092;38629;35667;36135;35404;34824;33940;30653;31889;31646;32537;32772;31539;27355;matrículas
1.1.1.2.1;Municipal;11904;11537;11199;10941;10695;10027;9853;10048;11060;11171;11635;12379;13224;13692;matrículas
1.1.1.2.2;Estadual;581;606;518;415;212;151;147;109;69;50;68;89;95;99;matrículas
1.1.1.2.3;Federal;;;;;;;;;;;;;;;matrículas

I need to generate a line chart with some items from the INDICADOR column to show values from the columns 2008 to 2021. Usually the years are in a sole column, so that would be easy. I have started with this:

import pandas as pd
import plotly.offline as pyo
import plotly.graph_objs as go

linhas_imp = [0,3,14,60] # here I am picking the rows I want

df = pd.read_csv(‘CensoEscolarSinopseREC.csv’, sep = ‘;’, skiprows = lambda x: x not in linhas_imp) #is this correct?
df = df.loc[:, ~df.columns.isin([‘Nível’,‘Unidade’])] #excluding the columns I do not need to use

And now I am quite confused on setting up the:

data = [go.Scatter(x=???,
y=???,
mode=‘lines’,
name=???
)]
pyo.plot(data)

Could anyone give me some advice? I have tried to find exhaustively on Google but did not find any similar case of CSV layout. THANKS!!!

AIMPED · February 23, 2023, 9:43pm

HI @mocoroh welcome to the forums.

Your CSV looks quite normal to me, there must be something I’m missing.

based on what do you select the rows?
do you always exclude the same columns?
what are the values you want to be displayed on the x- and y axis?

EDIT:

import pandas as pd
import plotly.express as px

df = pd.read_csv('extract.txt', delimiter=';')

fig = px.line(
    x=df.columns[2:-2], 
    y=df.iloc[2][2:-2], 
    title=df.iloc[2][1]
)
fig.show()

creates:

Note, that the values on the x-axis are not ordered because I plotted just the column names…

mocoroh · February 23, 2023, 11:43pm

Hi AIMPED!
Thank you very much for the prompt response. It worked great. I would be very glad if you could show how to disclosure 3 or 4 lines (e.g. including the rows CRECHE, MUNICIPAL, ESTADUAL) and using plotly.offline plotly.graph_objs, since these have been the packages I am studying at the moment (because I am going towards Dash).

Thanks again for your time!!

AIMPED · February 24, 2023, 8:53pm

Hey @mocoroh ,

there are a lot of different ways to approach this. My fisrt post shows the quick & dirty version. I prefer cleaning the dataframe bofore plotting figures. This is the major part of the following code:

import pandas as pd
import plotly.graph_objects as go

# read csv file (I renamed the two items "Municipal" in the column "Indicador")
df = pd.read_csv('extract.txt', delimiter=';')

# delete unwanted columns
cleaned = df.drop(['Nível', 'Unidade'], axis=1)

# transpose (switch columns/rows)
cleaned = cleaned.transpose(copy=True)

#reset index
cleaned.reset_index(inplace=True)

# use first row of DataFrame as header
cleaned.columns = cleaned.iloc[0]

# delete first row
cleaned = cleaned.iloc[1:]

# covert types
cleaned = cleaned.astype(float)
cleaned.Indicador = pd.to_datetime(cleaned.Indicador, format='%Y')

# now the easy part --> create the figure
fig = go.Figure()

# add each column as trace
for column in ['Ensino infantil', 'Creche', 'Municipal_1']:
    fig.add_scatter(x=cleaned['Indicador'], y=cleaned[column], mode='markers+lines', name=column)

# show figure
fig.show()

creates:
newplot(49)

mocoroh · February 27, 2023, 8:05pm

Thanks, AIMPED! It makes total sense. You also brought some new content to me, such as the transpose and astype. Lot to learn, yet… Thanks again!

Topic		Replies	Views
Put multiple columns in scatter 3d 📊 Plotly Python	0	534	June 20, 2020
Basic Plotly Question about Reading from CSV 📊 Plotly Python	4	3036	July 31, 2020
How to use multiple columns on x_axis 📊 Plotly Python	8	7642	May 16, 2022
Plot data from csv file with changeable columns using Plotly 📊 Plotly Python	0	365	July 17, 2020
CSV file has been rearranged Dash Python	8	498	November 9, 2022

Headers as years in multiple columns. How to setup a line chart?

Related Topics