I have the following pandas DataFrame:
import pandas as pd

d = {'Time': [0, 1, 2, 0, 1, 2, 2, 3, 4],
     'Price': ['Auction', 'Auction', '800', '900', 'By Negotiation', '700', '250', '250', 'Make Offer'],
     'Item': ['Picasso', 'Picasso', 'Picasso', 'DaVinci', 'DaVinci', 'DaVinci', 'Dali', 'Dali', 'Dali']}
df = pd.DataFrame(data=d)
I want to create a fourth column, "Listing history", that specifies the following:
I want to group by Item and then apply the logic above.
Finding whether a listing is "first seen" is simple enough with something like the following:
df['Price_coerced_to_numeric'] = pd.to_numeric(df['Price'], errors='coerce')
df['Price_diff'] = df.groupby(['Item'])['Price_coerced_to_numeric'].diff(1)
I suspect there is a way to do this with pandas apply and transform, but I have not been able to work it out. Any hints are much appreciated.
You can use groupby.shift and numpy.select:
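import numpy as np

# Replace every numeric price with the generic label 'Price'
price = df['Price'].mask(pd.to_numeric(df['Price'], errors='coerce').notna(), 'Price')
# Previous label within each Item (NaN on the first row of each group)
prev_price = price.groupby(df['Item']).shift()
m1 = ~df['Item'].duplicated()  # first row of each Item
m2 = price.ne(prev_price)      # label differs from the previous row
df['Listing-history'] = np.select([m1, m2], ['first seen', prev_price+'->'+price],
                                  'ongoing listing')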
Output:
   Time           Price     Item        Listing-history
0     0         Auction  Picasso             first seen
1     1         Auction  Picasso        ongoing listing
2     2             800  Picasso         Auction->Price
3     0             900  DaVinci             first seen
4     1  By Negotiation  DaVinci  Price->By Negotiation
5     2             700  DaVinci  By Negotiation->Price
6     2             250     Dali             first seen
7     3             250     Dali        ongoing listing
8     4      Make Offer     Dali      Price->Make Offer
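
For reference, here is the same approach as a minimal, self-contained sketch wrapped in a function. The function name listing_history and the intermediate variable names are my own, not part of pandas or of the code above. The one deliberate deviation is the fillna(""), which is an assumption on my part to keep the transition labels as plain strings; those rows are overwritten by "first seen" anyway.

```python
import numpy as np
import pandas as pd


def listing_history(df: pd.DataFrame) -> pd.Series:
    """Label each row 'first seen', 'ongoing listing', or 'old->new'.

    Hypothetical helper; expects 'Item' and 'Price' columns.
    """
    # Collapse every numeric price into the single label 'Price'.
    is_numeric = pd.to_numeric(df["Price"], errors="coerce").notna()
    category = df["Price"].mask(is_numeric, "Price")

    # Previous label within the same Item; missing on each Item's first row.
    previous = category.groupby(df["Item"]).shift()

    first_row = ~df["Item"].duplicated()
    changed = category.ne(previous)

    # Build 'old->new' labels; first rows get '->new' but are never used,
    # because np.select picks the first condition that is True and
    # first_row is listed before changed.
    transition = (previous.fillna("") + "->" + category).to_numpy(dtype=object)
    labels = np.select([first_row, changed],
                       ["first seen", transition],
                       default="ongoing listing")
    return pd.Series(labels, index=df.index, name="Listing-history")


df = pd.DataFrame({
    "Time": [0, 1, 2, 0, 1, 2, 2, 3, 4],
    "Price": ["Auction", "Auction", "800", "900", "By Negotiation",
              "700", "250", "250", "Make Offer"],
    "Item": ["Picasso", "Picasso", "Picasso", "DaVinci", "DaVinci",
             "DaVinci", "Dali", "Dali", "Dali"],
})
result = listing_history(df)
print(df.assign(**{"Listing-history": result}))
```

The order of the conditions matters: on the first row of each Item the previous label is missing, so the "changed" mask may also be True there, and listing the first-row mask first ensures those rows are labelled "first seen".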