Convert timeseries line chart to grouped by day

robertbyers1111 · August 11, 2024, 9:43pm

I have a timeseries data set covering ~10 days of upload and download speed measurements. I’ve created a plotly express line chart for the data with two lines (upload and download) with the x-axis covering all 10 days. I would rather categorize the chart by day, producing 20 lines (assuming 10 days). The x-axis would cover time-of-day (00:00:00 - 23:59:59) with no date. Each line would correspond to upload or download speed on a particular day.

The current, multi-day plot is produced by this code…

    import pandas as pd
    from pandas.core.frame import DataFrame
    import plotly.express as px
    from datetime import datetime

...

    # timestamp is a datetime object
    # download_mbps and upload_mbps are floats

    df_prep = []
    for data_object in data_objects:

        download_mbps = data_object.download.mbps if data_object.download.mbps > 0. else float('nan')
        upload_mbps = data_object.upload.mbps if data_object.upload.mbps > 0. else float('nan')
  
        df_prep.append({
            "timestamp": data_object.timestamp,
            "date": data_object.timestamp.date(),
            "time": data_object.timestamp.time(),
            "download_mbps": download_mbps,
            "upload_mbps": upload_mbps,
        })

    df = pd.DataFrame(df_prep)
    df.set_index('timestamp', inplace=True)
    fig = px.line(df, y=['download_mbps', 'upload_mbps'])
    fig.show()

currently only timestamp (which is a python datetime object) is used. The separate date and time columns are for the new grouping.

farispriadi · August 12, 2024, 2:29am

Hi @robertbyers1111 ,

Welcome to the community!

If I understand, you need to little bit doing preprocessing data using melt and pivot function that available in pandas.

It’s great if you will provide us some dummy data, to make easier reproduce the plot.

But in the code below I have generate data that I hope this is the result that you are looking for.

import plotly.express as px
import pandas as pd
import numpy as np
np.random.seed(1)

# generate timestamp in 10 days
datetime_generated = pd.date_range(start='2020-03-01', end='2020-03-12', freq="BH")


# create new dummy dataframe
df = pd.DataFrame(dict(
    timestamp = datetime_generated,
    date = datetime_generated.date,
    time = datetime_generated.time,
    download_mbps= 25 + (np.random.random_sample((len(datetime_generated),))*25),
    upload_mbps= 5 + (np.random.random_sample((len(datetime_generated),))*15)
))

# 1. Using pandas melt to unpivoting 'download_mps' and  'upload_mbps' columns as measured variables.
df = pd.melt(df, id_vars=['timestamp','time','date'], value_vars=['download_mbps','upload_mbps'])
print("df after melt")
print(df)

# 2. create new column as label that consist of the {date}download or {date}upload
# the values of the label will be used as label names in legend. 
df['category'] = df['date'].astype(str)+" "+df['variable']
print("df add new column 'category'")
print(df)

# 3. Use pandas pivot to create set new dataframe that:
# - indexed by time
# - categories by 'category' (upload/download by day)
# - and have values from value column which mean the the download or upload rate itself.
df = pd.pivot(df, values='value', index=['time'],columns='category')
print("df after pivot")
print(df)

# show in line chart
fig = px.line(df)
fig.show()

And here the plot.

Hope this help.

robertbyers1111 · August 12, 2024, 12:03pm

Thanks for this suggestion. It isn’t quite working for my data, I’m not sure why.

Here’s my code based on your solution, with my data for two days only.

import pandas as pd
from pandas.core.frame import DataFrame
import plotly.express as px
from datetime import datetime

df_prep = [
    {"timestamp": datetime.strptime("2024-08-07 09:10:38", "%Y-%m-%d %H:%M:%S"), "download_mbps": "14.671552", "upload_mbps": "6.439295"},
    {"timestamp": datetime.strptime("2024-08-07 10:50:29", "%Y-%m-%d %H:%M:%S"), "download_mbps": "12.499064", "upload_mbps": "10.97903"},
    {"timestamp": datetime.strptime("2024-08-07 12:20:28", "%Y-%m-%d %H:%M:%S"), "download_mbps": "20.843644", "upload_mbps": "8.760801"},
    {"timestamp": datetime.strptime("2024-08-07 14:20:32", "%Y-%m-%d %H:%M:%S"), "download_mbps": "8.1798401", "upload_mbps": "11.22654"},
    {"timestamp": datetime.strptime("2024-08-07 16:40:14", "%Y-%m-%d %H:%M:%S"), "download_mbps": "18.089695", "upload_mbps": "11.10735"},
    {"timestamp": datetime.strptime("2024-08-07 18:10:29", "%Y-%m-%d %H:%M:%S"), "download_mbps": "16.945190", "upload_mbps": "8.352405"},
    {"timestamp": datetime.strptime("2024-08-07 20:00:35", "%Y-%m-%d %H:%M:%S"), "download_mbps": "13.164570", "upload_mbps": "7.036316"},
    {"timestamp": datetime.strptime("2024-08-07 21:50:28", "%Y-%m-%d %H:%M:%S"), "download_mbps": "24.871938", "upload_mbps": "11.10282"},
    {"timestamp": datetime.strptime("2024-08-07 22:30:24", "%Y-%m-%d %H:%M:%S"), "download_mbps": "362.50311", "upload_mbps": "11.16578"},
    {"timestamp": datetime.strptime("2024-08-07 22:50:23", "%Y-%m-%d %H:%M:%S"), "download_mbps": "363.83068", "upload_mbps": "11.40599"},
    {"timestamp": datetime.strptime("2024-08-07 23:10:21", "%Y-%m-%d %H:%M:%S"), "download_mbps": "361.20357", "upload_mbps": "11.16105"},
    {"timestamp": datetime.strptime("2024-08-07 23:50:21", "%Y-%m-%d %H:%M:%S"), "download_mbps": "362.03622", "upload_mbps": "11.29156"},
    {"timestamp": datetime.strptime("2024-08-12 09:10:13", "%Y-%m-%d %H:%M:%S"), "download_mbps": "94.002145", "upload_mbps": "93.96362"},
    {"timestamp": datetime.strptime("2024-08-12 10:23:14", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.949398", "upload_mbps": "93.95668"},
    {"timestamp": datetime.strptime("2024-08-12 12:30:15", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.953006", "upload_mbps": "93.95661"},
    {"timestamp": datetime.strptime("2024-08-12 14:43:18", "%Y-%m-%d %H:%M:%S"), "download_mbps": "94.036565", "upload_mbps": "93.94661"},
    {"timestamp": datetime.strptime("2024-08-12 16:50:22", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.960951", "upload_mbps": "93.88965"},
    {"timestamp": datetime.strptime("2024-08-12 18:03:13", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.893522", "upload_mbps": "93.74555"},
    {"timestamp": datetime.strptime("2024-08-12 20:10:26", "%Y-%m-%d %H:%M:%S"), "download_mbps": "92.649269", "upload_mbps": "93.77410"},
    {"timestamp": datetime.strptime("2024-08-12 21:23:14", "%Y-%m-%d %H:%M:%S"), "download_mbps": "94.021073", "upload_mbps": "93.75004"},
    {"timestamp": datetime.strptime("2024-08-12 22:30:17", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.765045", "upload_mbps": "93.57514"},
    {"timestamp": datetime.strptime("2024-08-12 22:43:36", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.995060", "upload_mbps": "93.80222"},
    {"timestamp": datetime.strptime("2024-08-12 23:00:17", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.767352", "upload_mbps": "93.78429"},
    {"timestamp": datetime.strptime("2024-08-12 23:53:29", "%Y-%m-%d %H:%M:%S"), "download_mbps": "89.606162", "upload_mbps": "89.81360"}]

for i in range(len(df_prep)):
    df_prep[i]['date'] = df_prep[i]['timestamp'].date()
    df_prep[i]['time'] = df_prep[i]['timestamp'].time()

df = pd.DataFrame(df_prep)

print('df before melt')
print(df)

df = pd.melt(df, id_vars=['timestamp', 'time', 'date'], value_vars=['download_mbps', 'upload_mbps'])
print("df after melt")
print(df)

df['category'] = df['date'].astype(str) + " " + df['variable']
print("df add new column 'category'")
print(df)

df = pd.pivot(df, values='value', index=['time'], columns='category')
print("df after pivot")
print(df)

fig = px.line(df)
fig.show()

resulting in this…

farispriadi · August 12, 2024, 12:47pm

Hi @robertbyers1111 ,

The gaps between certain points in your line chart is caused by NaN values .
To connect the missing point you can add connectgaps argument in update_traces method.

...

fig = px.line(df)
fig.update_traces(connectgaps=True)
fig.show()

robertbyers1111 · August 12, 2024, 2:41pm

Hmm, why is my Y axis not increasing in numerical order? It seems rather to be organized by the order in which the Y values appear in the data. Even though it appears (visually) as though the lines increase over time, if you look at the Y values for each point, they aren’t always increasing - it only appears that way because the Y axis is arranged by order of appearance in the data, without regard to numerical value.

AIMPED · August 12, 2024, 2:54pm

Check your dtype in the dataframe. The float has been interpreted as string. You might also try to convert the y axis to numerical.

farispriadi · August 12, 2024, 3:01pm

Hi @robertbyers1111 ,

I see, this is because the data type of download_mbps and upload_mbps are in object not float.

Use astype to convert both column data type into float, after creating a dataframe.

...

df = pd.DataFrame(df_prep)

# add two lines below
df['download_mbps'] = df['download_mbps'].astype('float')
df['upload_mbps'] = df['upload_mbps'].astype('float')

print('df before melt')
print(df)

...

I guess the output below is what you expected.

robertbyers1111 · August 12, 2024, 3:53pm

Thanks much! That works perfectly!

Topic		Replies	Views
Cannot get times to overlap when using Multiple line graph for daily internet speeds 📊 Plotly Python	5	1386	April 12, 2020
Difficulty in plotting time series data 📊 Plotly Python question	4	541	April 26, 2022
9 different plots show data from the same day	4	952	November 1, 2015
Line Charts with 120 sec. entries won't work 📊 Plotly Python	4	270	June 17, 2021
Pandas Dataframe Grouping data on weekly basis 📊 Plotly Python	0	414	June 9, 2020

Convert timeseries line chart to grouped by day

Related topics