Line graph with monthly data points over multiple years

Hey there!

I have been browsing this forum for months and have always found an answer to my problems.

However, I am pretty lost on this one:
I want to display monthly data over multiple years, with monthly “ticks” for each line:

This is what I want, except that the time data comes from a timestamp. So instead of one tick per month, I get multiple data points for each month, which makes the graph messy and unreadable.

What I want: aggregated data for each month, across multiple years. So if there are, e.g., 200 data points in January, I only want one aggregated value to be plotted. I got this working for one year (using groupby on my pandas df) with the monthly values. But for multiple years this doesn't work, since the 2017 and 2018 data for January would be summed together as one.

This is my code:

import plotly.graph_objs as go

layout = go.Layout(
    title='',
    showlegend=True,
    xaxis=dict(
        title='Month/Year',
        tickformat='%b/%y',
        range=[min, max],  # min/max are datetime values
        type='date'
    ),
    yaxis=dict(
        title=''
    )
)
data = [
    go.Bar(
        x=data.loc[data['SomeKey'].isin(['SomeValue'])]['Date (Key)'],  # Date (Key) is a datetime
        y=data.loc[data['SomeKey'].isin(['SomeValue'])]['Sum'],
        name='SomeName'
    )]
fig = dict(data=data, layout=layout)

I thought this would have to be done with Plotly's transforms, but I could not get that to work.

Any help is greatly appreciated!

Hi @georg90,

I would recommend tackling this in pandas. This is a great use case for the pandas.Grouper object (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Grouper.html).

Try something like this:

import pandas as pd
per_month = data.set_index('Date (Key)').groupby(pd.Grouper(freq='M'))['SomeValue'].mean()

x = per_month.index
y = per_month.values

The 'M' string stands for “Month end frequency”, but there are a ton of other options. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases.

The .mean() aggregate function will average the values in each month. If you don't want this, replace it with a different aggregate function (min, max, count, etc.).
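
To make the difference concrete, here is a minimal sketch on made-up data. The sample values and the choice of 'Sum' as the numeric column are assumptions for illustration. It uses 'MS' (month start) instead of 'M' because, as far as I know, newer pandas releases spell the month-end alias 'ME', while 'MS' is accepted by old and new versions alike.

```python
import pandas as pd

# Made-up sample data: two readings in Jan 2017, one in Feb 2017, two in Jan 2018.
df = pd.DataFrame({
    "Date (Key)": pd.to_datetime([
        "2017-01-03", "2017-01-20", "2017-02-11", "2018-01-05", "2018-01-28",
    ]),
    "Sum": [10.0, 20.0, 5.0, 1.0, 3.0],
})

# Grouping by month number alone merges January 2017 with January 2018.
by_month_number = df.groupby(df["Date (Key)"].dt.month)["Sum"].sum()

# pd.Grouper bins by calendar month, so each year keeps its own January.
# "MS" labels every bin with the first day of its month.
monthly = df.set_index("Date (Key)").groupby(pd.Grouper(freq="MS"))["Sum"]

per_month_mean = monthly.mean()    # average per month
per_month_sum = monthly.sum()      # total per month
per_month_count = monthly.count()  # number of rows per month
```

Note that months without any rows between the first and last date still show up in the result (NaN for the mean, 0 for sum and count), so you may want to drop or filter them before plotting.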

Hope that helps!
-Jon

Hi Jon,

thanks a lot for the reply. I got it working using pd.Grouper - thanks!

Using this, I get the intended result:

df = (
    df.reset_index()
      .set_index(pd.DatetimeIndex(req['Date (Key)']))
      .groupby(['SomeKey', pd.Grouper(freq='M')])['SomeValue']
      .count()
      .reset_index(name='Val')
)
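
For anyone landing here later, this is a rough sketch of how the grouped result could be turned into one line per key. The sample data is made up, the traces are written as plain dicts in the same `dict(data=..., layout=...)` style as the first post, and 'MS' (month start) is used instead of 'M' so that each point sits on the first of its month; all of that is an assumption for illustration, not the exact code from my project.

```python
import pandas as pd

# Made-up sample data; column names mirror the ones used in this thread.
df = pd.DataFrame({
    "Date (Key)": pd.to_datetime([
        "2017-01-03", "2017-01-20", "2018-01-05", "2017-01-09", "2018-02-14",
    ]),
    "SomeKey": ["A", "A", "A", "B", "B"],
    "SomeValue": [1, 2, 3, 4, 5],
})

# Count rows per key and per calendar month, keeping years separate.
monthly = (
    df.set_index("Date (Key)")
      .groupby(["SomeKey", pd.Grouper(freq="MS")])["SomeValue"]
      .count()
      .reset_index(name="Val")
)

# One line trace per key, written as plain dicts in the Plotly figure format.
traces = [
    dict(
        type="scatter",
        mode="lines+markers",
        x=group["Date (Key)"].tolist(),
        y=group["Val"].tolist(),
        name=str(key),
    )
    for key, group in monthly.groupby("SomeKey")
]

layout = dict(xaxis=dict(title="Month/Year", tickformat="%b/%y", type="date"))
fig = dict(data=traces, layout=layout)
```

With plotly installed, something like `go.Figure(fig).show()` should render it, with one data point per month and key.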