Multiple choice questions and histograms

Newbie question again… I'd say others have done this loads of times.

We have survey results and we are charting them.
One of the questions is multiple choice (Q3 in the example below).

We have one row per response in the dataframe. In the answer column for the multiple choice question, Q3, each cell holds a string of comma-separated numbers representing the choices that user picked.

So the data in the csv file might look a bit like this:

date, Q1, Q2, Q3
xxx, 4, 6, "4,10,12"
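
For reference, here is a minimal sketch of how a file in this shape might be loaded so that Q3 arrives as a plain string. The dates and the second row are made-up sample values, and `survey_df` is just a name picked for illustration. It assumes the file has a space after each comma as shown above, which is why `skipinitialspace=True` is passed; without it the quoted Q3 field may not be read as a single value.

```python
import io
import pandas as pd

# Hypothetical sample data mirroring the layout shown above.
sample_csv = io.StringIO(
    'date, Q1, Q2, Q3\n'
    '2020-06-01, 4, 6, "4,10,12"\n'
    '2020-06-02, 2, 5, "10"\n'
)

# skipinitialspace drops the space after each comma, so the quoted Q3
# field is kept whole; dtype=str keeps single-choice answers as strings too.
survey_df = pd.read_csv(sample_csv, skipinitialspace=True, dtype={"Q3": str})
print(survey_df["Q3"].tolist())
```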

I need to make a histogram of the answers in Q3.
Not sure how to go about this… is there anything built into the histogram to cater for data in this shape?
Or do I have to pivot the data myself somehow in Python first?

Thanks in advance.

So, I found the following code which seems to do the job:

import pandas as pd

def splitDataFrameList(df, target_column, separator):
    ''' df = dataframe to split,
    target_column = the column containing the values to split
    separator = the symbol used to perform the split
    returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
    The values in the other columns are duplicated across the newly divided rows.
    '''
    row_accumulator = []

    def splitListToRows(row, separator):
        # Split this row's cell and store one copy of the row per element.
        split_row = row[target_column].split(separator)
        for s in split_row:
            new_row = row.to_dict()
            new_row[target_column] = s
            row_accumulator.append(new_row)

    df.apply(splitListToRows, axis=1, args=(separator, ))
    new_df2 = pd.DataFrame(row_accumulator)
    return new_df2

You call it like this:

newer_df = splitDataFrameList(my_df, 'Q3', ',')

You end up with more rows than you started with, one per selected choice, and the values in the other columns are duplicated across them.
But you can then feed the Q3 column of the new df into a histogram. I am still verifying, but it looks like it works.
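
A shorter route that might do the same job is to split and explode just the Q3 column instead of rebuilding the whole dataframe. This is only a sketch, not the function above: it assumes pandas 0.25 or later (for `Series.explode`), that every Q3 cell is a string of integers, and the sample values in `my_df` are made up.

```python
import pandas as pd

# Hypothetical responses; Q3 holds comma-separated choices as strings.
my_df = pd.DataFrame({
    "Q1": [4, 2, 3],
    "Q3": ["4,10,12", "10", "4,12"],
})

# Split each string into a list, then give every choice its own row.
choices = my_df["Q3"].str.split(",").explode().str.strip()

# Count how often each choice was picked, ordered by choice number.
counts = choices.astype(int).value_counts().sort_index()
print(counts)
```

Because only Q3 is exploded, the other columns are not duplicated, and `counts` should be usable directly as the bar heights for the chart.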