Extract the nearest last month to date data if the same day of the last month did not have

How to extract the nearest last month to date data if the same day of the last month did not have the sale? Please refer to the sample provide for more understanding.

Original data:

enter image description here

It may not have the sale in the yesterday (last month), require to find the nearest day compare to today (last month).

Currently, using the pd.merge to get the Last MTD data, but if the same day of last month did not have the product’s sale, it will show zero.

Example 1:

02/10/2022 VS 02/09/2022
02/10/2022 have Clothes’s sale, but 02/09/2022 did not have. Expect the Last MTD column able to display the MTD data from last month.

Current result:

enter image description here

Expected output:

enter image description here

Code:

df["pdate"] = df.Date.apply(lambda x: (x - pd.DateOffset(months=1)))
df2 = df.copy()
final_df = pd.merge(left = df,right = df2, how="left", left_on=['pdate','Product'], right_on=['Date', 'Product'])





######## For understanding (can ignore)
###############

Example 2:

03/10/2022 VS 03/09/2022
03/10/2022 have Dining room’s sale, but 03/09/2022 did not have. Expect the Last MTD column able to display the MTD data from last month.

Current result:

enter image description here


Expected result:

enter image description here

@beginof,

Does it have to be an exact month ahead and earlier than the current Date column, or should it be the last sale of the month for that category?

@jinnyzor ,

Can be say is the last sale of the month

@beginof,

Give this a try, as long as it is the last sale of the month per product. :slight_smile:


df['Date'] = pd.to_datetime(df['Date'])

df['Month'] = pd.DatetimeIndex(df['Date']).month

df['Year'] = pd.DatetimeIndex(df['Date']).year

lastDF = df.groupby(['Product','Month','Year']).last().reset_index()

lastDF.sort_values(['Date'], 0, inplace=True)

for i, row in df.iterrows():
    try:
        yearCheck = row['Year']
        if row['Month'] == 1:
            yearCheck = row['Year']-1

        df.at[i, 'Last MTD'] = lastDF[(lastDF['Product'] == row['Product']) \
                             & (lastDF['Month'] == row['Month']-1) & (lastDF['Year'] == yearCheck)]['MTD']
    except:
        df.at[i,'MTD'] = 0



print(df)

All the MTD column is showing zero.

@beginof,

Your examples were a little confusing as to what columns were in your unadjusted data.
Is it just Date, Product, Sales?

@jinnyzor

My dataframe will include others column but those column i am not using.
And from this 3 columns i will use python to create the year, month, day, date, MTD, Last MTD and also format the amount to be comma with thousand and 2 decimal.

@beginof,

Try this, it should return info the the MTD column, not sure how you are wanting to fill the Last MTD.

df['Date'] = pd.to_datetime(df['Date'])

df['Month'] = pd.DatetimeIndex(df['Date']).month

df['Year'] = pd.DatetimeIndex(df['Date']).year

lastDF = df.groupby(['Product','Month','Year']).last().reset_index()

lastDF.sort_values(['Date'], 0, inplace=True)

for i, row in df.iterrows():
    try:
        yearCheck = row['Year']
        if row['Month'] == 1:
            yearCheck = row['Year']-1

        df.at[i, 'MTD'] = lastDF[(lastDF['Product'] == row['Product']) \
                             & (lastDF['Month'] == row['Month']-1) & (lastDF['Year'] == yearCheck)]['Sales']
    except:
        df.at[i,'MTD'] = 0



print(df)