Convert timeseries line chart to grouped by day

I have a timeseries data set covering ~10 days of upload and download speed measurements. I’ve created a plotly express line chart for the data with two lines (upload and download) with the x-axis covering all 10 days. I would rather categorize the chart by day, producing 20 lines (assuming 10 days). The x-axis would cover time-of-day (00:00:00 - 23:59:59) with no date. Each line would correspond to upload or download speed on a particular day.

The current, multi-day plot is produced by this code…

    import pandas as pd
    from pandas.core.frame import DataFrame
    import plotly.express as px
    from datetime import datetime

...

    # timestamp is a datetime object
    # download_mbps and upload_mbps are floats

    df_prep = []
    for data_object in data_objects:

        download_mbps = data_object.download.mbps if data_object.download.mbps > 0. else float('nan')
        upload_mbps = data_object.upload.mbps if data_object.upload.mbps > 0. else float('nan')
  
        df_prep.append({
            "timestamp": data_object.timestamp,
            "date": data_object.timestamp.date(),
            "time": data_object.timestamp.time(),
            "download_mbps": download_mbps,
            "upload_mbps": upload_mbps,
        })

    df = pd.DataFrame(df_prep)
    df.set_index('timestamp', inplace=True)
    fig = px.line(df, y=['download_mbps', 'upload_mbps'])
    fig.show()

currently only timestamp (which is a python datetime object) is used. The separate date and time columns are for the new grouping.

Hi @robertbyers1111 ,

Welcome to the community!

If I understand, you need to little bit doing preprocessing data using melt and pivot function that available in pandas.

It’s great if you will provide us some dummy data, to make easier reproduce the plot.

But in the code below I have generate data that I hope this is the result that you are looking for.

import plotly.express as px
import pandas as pd
import numpy as np
np.random.seed(1)

# generate timestamp in 10 days
datetime_generated = pd.date_range(start='2020-03-01', end='2020-03-12', freq="BH")


# create new dummy dataframe
df = pd.DataFrame(dict(
    timestamp = datetime_generated,
    date = datetime_generated.date,
    time = datetime_generated.time,
    download_mbps= 25 + (np.random.random_sample((len(datetime_generated),))*25),
    upload_mbps= 5 + (np.random.random_sample((len(datetime_generated),))*15)
))

# 1. Using pandas melt to unpivoting 'download_mps' and  'upload_mbps' columns as measured variables.
df = pd.melt(df, id_vars=['timestamp','time','date'], value_vars=['download_mbps','upload_mbps'])
print("df after melt")
print(df)

# 2. create new column as label that consist of the {date}download or {date}upload
# the values of the label will be used as label names in legend. 
df['category'] = df['date'].astype(str)+" "+df['variable']
print("df add new column 'category'")
print(df)

# 3. Use pandas pivot to create set new dataframe that:
# - indexed by time
# - categories by 'category' (upload/download by day)
# - and have values from value column which mean the the download or upload rate itself.
df = pd.pivot(df, values='value', index=['time'],columns='category')
print("df after pivot")
print(df)

# show in line chart
fig = px.line(df)
fig.show()

And here the plot.

Hope this help.

3 Likes

Thanks for this suggestion. It isn’t quite working for my data, I’m not sure why.

Here’s my code based on your solution, with my data for two days only.

import pandas as pd
from pandas.core.frame import DataFrame
import plotly.express as px
from datetime import datetime

df_prep = [
    {"timestamp": datetime.strptime("2024-08-07 09:10:38", "%Y-%m-%d %H:%M:%S"), "download_mbps": "14.671552", "upload_mbps": "6.439295"},
    {"timestamp": datetime.strptime("2024-08-07 10:50:29", "%Y-%m-%d %H:%M:%S"), "download_mbps": "12.499064", "upload_mbps": "10.97903"},
    {"timestamp": datetime.strptime("2024-08-07 12:20:28", "%Y-%m-%d %H:%M:%S"), "download_mbps": "20.843644", "upload_mbps": "8.760801"},
    {"timestamp": datetime.strptime("2024-08-07 14:20:32", "%Y-%m-%d %H:%M:%S"), "download_mbps": "8.1798401", "upload_mbps": "11.22654"},
    {"timestamp": datetime.strptime("2024-08-07 16:40:14", "%Y-%m-%d %H:%M:%S"), "download_mbps": "18.089695", "upload_mbps": "11.10735"},
    {"timestamp": datetime.strptime("2024-08-07 18:10:29", "%Y-%m-%d %H:%M:%S"), "download_mbps": "16.945190", "upload_mbps": "8.352405"},
    {"timestamp": datetime.strptime("2024-08-07 20:00:35", "%Y-%m-%d %H:%M:%S"), "download_mbps": "13.164570", "upload_mbps": "7.036316"},
    {"timestamp": datetime.strptime("2024-08-07 21:50:28", "%Y-%m-%d %H:%M:%S"), "download_mbps": "24.871938", "upload_mbps": "11.10282"},
    {"timestamp": datetime.strptime("2024-08-07 22:30:24", "%Y-%m-%d %H:%M:%S"), "download_mbps": "362.50311", "upload_mbps": "11.16578"},
    {"timestamp": datetime.strptime("2024-08-07 22:50:23", "%Y-%m-%d %H:%M:%S"), "download_mbps": "363.83068", "upload_mbps": "11.40599"},
    {"timestamp": datetime.strptime("2024-08-07 23:10:21", "%Y-%m-%d %H:%M:%S"), "download_mbps": "361.20357", "upload_mbps": "11.16105"},
    {"timestamp": datetime.strptime("2024-08-07 23:50:21", "%Y-%m-%d %H:%M:%S"), "download_mbps": "362.03622", "upload_mbps": "11.29156"},
    {"timestamp": datetime.strptime("2024-08-12 09:10:13", "%Y-%m-%d %H:%M:%S"), "download_mbps": "94.002145", "upload_mbps": "93.96362"},
    {"timestamp": datetime.strptime("2024-08-12 10:23:14", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.949398", "upload_mbps": "93.95668"},
    {"timestamp": datetime.strptime("2024-08-12 12:30:15", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.953006", "upload_mbps": "93.95661"},
    {"timestamp": datetime.strptime("2024-08-12 14:43:18", "%Y-%m-%d %H:%M:%S"), "download_mbps": "94.036565", "upload_mbps": "93.94661"},
    {"timestamp": datetime.strptime("2024-08-12 16:50:22", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.960951", "upload_mbps": "93.88965"},
    {"timestamp": datetime.strptime("2024-08-12 18:03:13", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.893522", "upload_mbps": "93.74555"},
    {"timestamp": datetime.strptime("2024-08-12 20:10:26", "%Y-%m-%d %H:%M:%S"), "download_mbps": "92.649269", "upload_mbps": "93.77410"},
    {"timestamp": datetime.strptime("2024-08-12 21:23:14", "%Y-%m-%d %H:%M:%S"), "download_mbps": "94.021073", "upload_mbps": "93.75004"},
    {"timestamp": datetime.strptime("2024-08-12 22:30:17", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.765045", "upload_mbps": "93.57514"},
    {"timestamp": datetime.strptime("2024-08-12 22:43:36", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.995060", "upload_mbps": "93.80222"},
    {"timestamp": datetime.strptime("2024-08-12 23:00:17", "%Y-%m-%d %H:%M:%S"), "download_mbps": "93.767352", "upload_mbps": "93.78429"},
    {"timestamp": datetime.strptime("2024-08-12 23:53:29", "%Y-%m-%d %H:%M:%S"), "download_mbps": "89.606162", "upload_mbps": "89.81360"}]

for i in range(len(df_prep)):
    df_prep[i]['date'] = df_prep[i]['timestamp'].date()
    df_prep[i]['time'] = df_prep[i]['timestamp'].time()

df = pd.DataFrame(df_prep)

print('df before melt')
print(df)

df = pd.melt(df, id_vars=['timestamp', 'time', 'date'], value_vars=['download_mbps', 'upload_mbps'])
print("df after melt")
print(df)

df['category'] = df['date'].astype(str) + " " + df['variable']
print("df add new column 'category'")
print(df)

df = pd.pivot(df, values='value', index=['time'], columns='category')
print("df after pivot")
print(df)

fig = px.line(df)
fig.show()

resulting in this…

Hi @robertbyers1111 ,

The gaps between certain points in your line chart is caused by NaN values .
To connect the missing point you can add connectgaps argument in update_traces method.

...

fig = px.line(df)
fig.update_traces(connectgaps=True)
fig.show()

2 Likes

Hmm, why is my Y axis not increasing in numerical order? It seems rather to be organized by the order in which the Y values appear in the data. Even though it appears (visually) as though the lines increase over time, if you look at the Y values for each point, they aren’t always increasing - it only appears that way because the Y axis is arranged by order of appearance in the data, without regard to numerical value.

Check your dtype in the dataframe. The float has been interpreted as string. You might also try to convert the y axis to numerical.

1 Like

Hi @robertbyers1111 ,

I see, this is because the data type of download_mbps and upload_mbps are in object not float.

Use astype to convert both column data type into float, after creating a dataframe.

...

df = pd.DataFrame(df_prep)

# add two lines below
df['download_mbps'] = df['download_mbps'].astype('float')
df['upload_mbps'] = df['upload_mbps'].astype('float')

print('df before melt')
print(df)

...

I guess the output below is what you expected.

3 Likes

Thanks much! That works perfectly!

2 Likes