How to extract the nearest last month to date data if the same day of the last month did not have the sale? Please refer to the sample provide for more understanding.
Original data:
It may not have the sale in the yesterday (last month), require to find the nearest day compare to today (last month).
Currently, using the pd.merge
to get the Last MTD
data, but if the same day of last month did not have the product
’s sale, it will show zero
.
Example 1:
02/10/2022 VS 02/09/2022
02/10/2022
have Clothes
’s sale, but 02/09/2022
did not have. Expect the Last MTD
column able to display the MTD
data from last month.
Current result:
Expected output:
Code:
df["pdate"] = df.Date.apply(lambda x: (x - pd.DateOffset(months=1)))
df2 = df.copy()
final_df = pd.merge(left = df,right = df2, how="left", left_on=['pdate','Product'], right_on=['Date', 'Product'])
######## For understanding (can ignore)
###############
Example 2:
03/10/2022 VS 03/09/2022
03/10/2022
have Dining room
’s sale, but 03/09/2022
did not have. Expect the Last MTD
column able to display the MTD
data from last month.
Current result:
Expected result:
@beginof,
Does it have to be an exact month ahead and earlier than the current Date column, or should it be the last sale of the month for that category?
@jinnyzor ,
Can be say is the last sale of the month
@beginof,
Give this a try, as long as it is the last sale of the month per product.
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = pd.DatetimeIndex(df['Date']).month
df['Year'] = pd.DatetimeIndex(df['Date']).year
lastDF = df.groupby(['Product','Month','Year']).last().reset_index()
lastDF.sort_values(['Date'], 0, inplace=True)
for i, row in df.iterrows():
try:
yearCheck = row['Year']
if row['Month'] == 1:
yearCheck = row['Year']-1
df.at[i, 'Last MTD'] = lastDF[(lastDF['Product'] == row['Product']) \
& (lastDF['Month'] == row['Month']-1) & (lastDF['Year'] == yearCheck)]['MTD']
except:
df.at[i,'MTD'] = 0
print(df)
All the MTD
column is showing zero.
@beginof,
Your examples were a little confusing as to what columns were in your unadjusted data.
Is it just Date, Product, Sales?
@jinnyzor
My dataframe will include others column but those column i am not using.
And from this 3 columns i will use python to create the year, month, day, date, MTD, Last MTD and also format the amount to be comma with thousand and 2 decimal.
@beginof,
Try this, it should return info the the MTD column, not sure how you are wanting to fill the Last MTD.
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = pd.DatetimeIndex(df['Date']).month
df['Year'] = pd.DatetimeIndex(df['Date']).year
lastDF = df.groupby(['Product','Month','Year']).last().reset_index()
lastDF.sort_values(['Date'], 0, inplace=True)
for i, row in df.iterrows():
try:
yearCheck = row['Year']
if row['Month'] == 1:
yearCheck = row['Year']-1
df.at[i, 'MTD'] = lastDF[(lastDF['Product'] == row['Product']) \
& (lastDF['Month'] == row['Month']-1) & (lastDF['Year'] == yearCheck)]['Sales']
except:
df.at[i,'MTD'] = 0
print(df)