I have been plotting data from a .csv file where the y-axis data is listed in a column of 1051 rows, and the x-axis is simply the index. Now the developers I am writing the dashboard for have changed their .csv format so that the y-axis values are all in a single row, with each value in its own column, for a total of 1051 columns (in addition to a few other columns). The 1051 columns are arranged from lowest x-axis value to highest x-axis value. After I get this into a df, I can’t figure out how to plot it. Can anyone help me?
Hi @majudd,
maybe @adamschroeder’s response can help you restructure your DataFrame:
Hmm, that’s interesting. I am now looking into melt, which is new to me. So far, I am not seeing how it can help me plot a curve with 1051 values, but I will play around with it!
You may be able to use some transpose functionality from pandas.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html
My column names are ‘x-1’, ‘x-2’, all the way up to ‘x-1051’, then the values in each column are the y-axis values I want to plot.
Thanks, I will look into transpose!
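For anyone following along, here is a minimal sketch of what the transpose suggestion could look like with pandas alone. The three-column frame and the "timestamp" column name are made-up stand-ins for the real 1051-column file; only the 'x-1', 'x-2', ... naming comes from the post above.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the new CSV layout
# (the real file has columns 'x-1' ... 'x-1051' plus a few others).
df = pd.DataFrame([{"timestamp": "2024-01-01 00:00", "x-1": 0.1, "x-2": 0.4, "x-3": 0.3}])

# Keep only the score columns, so the extra columns do not end up in the curve.
score_cols = [c for c in df.columns if c.startswith("x-")]

# Transposing turns the single row into a single column,
# indexed by the old column names.
curve = df[score_cols].T
curve.columns = ["score"]

# Recover a numeric x value from names like 'x-17'.
curve["x"] = curve.index.str.replace("x-", "", regex=False).astype(int)
```

After this, `curve` has one row per x value, which is the same shape as the old single-column layout.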
What kind of plot are we talking about? Would just switching the x and y arguments do the trick? As I understand it, you have 1051 columns and 1 row.
So, assuming you want to do a scatter plot:
px.scatter(x=[*range(1, 1052)], y=df.iloc[0])
But I guess I do not fully understand the problem you are facing.
I want to do a line plot. It was easy to do when all of the y data was in a single column, and the x data was just the index.
I had one column, called “score”. This is my y-value. There were hundreds of curves in this one column, each 1051 rows long, listed one after the other. Each curve was in the correct order, from x=1 to x=1051 (listed in another column). A time stamp in another column helps me separate one curve from another if I want to plot them as separate lines (each curve has a unique time stamp). I also plot everything together in a box plot, with the time stamp on the x-axis and the score on the y-axis. Now the people I am working for have rearranged the data to have one row per time stamp, with all of the scores in 1051 columns in the same row. This saves space: the .csv file I use to pull in the data is much smaller. But I am having trouble figuring out how to pull in all of the data for the box plot and how to plot individual curves for each time stamp.