My input data is as follows:

Date      Investment Type                                  Medium
1/1/2000  Mutual Fund, Stocks, Fixed Deposit, Real Estate  Own, Online,Through Agent
1/2/2000  Mutual Fund, Stocks, Real Estate                 Own
1/3/2000  Fixed Deposit                                    Online
1/3/2000  Mutual Fund, Fixed Deposit, Real Estate          Through Agent
1/2/2000  Stocks                                           Own, Online, Through Agent
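The input to my function is Medium. It can be a single value or a list. I want to search the data based on the Medium input and then aggregate it as shown below: for the values given in Medium, check the investment types, then aggregate the data for each investment type.

Medium      Investment Type  Date
Own,Online  Mutual Fund      1/1/2000,1/2/2000
Own,Online  Stocks           1/1/2000,1/2/2000
Own,Online  Fixed Deposit    1/1/2000,1/3/2000
Own,Online  Real Estate      1/1/2000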
You can use:
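import pandas as pd
from itertools import product

L = ['Online', 'Own']
# build a word-bounded alternation: \bOnline\b|\bOwn\b
pat = '|'.join(r"\b{}\b".format(x) for x in L)
# keep only the requested mediums in each row
df['New_Medium'] = df.pop('Medium').str.findall('(' + pat + ')').str.join(', ')
# remove rows with empty values
df = df[df['New_Medium'].astype(bool)]

# split every cell on commas and take the cross product within each row
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(r',\s*')).values
                    for j in product(*i)], columns=df.columns)
print(df1)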
        Date  Investment Type  New_Medium
0   1/1/2000      Mutual Fund         Own
1   1/1/2000      Mutual Fund      Online
2   1/1/2000           Stocks         Own
3   1/1/2000           Stocks      Online
4   1/1/2000    Fixed Deposit         Own
5   1/1/2000    Fixed Deposit      Online
6   1/1/2000      Real Estate         Own
7   1/1/2000      Real Estate      Online
8   1/2/2000      Mutual Fund         Own
9   1/2/2000           Stocks         Own
10  1/2/2000      Real Estate         Own
11  1/3/2000    Fixed Deposit      Online
12  1/2/2000           Stocks         Own
13  1/2/2000           Stocks      Online
# group by investment type and join the unique values of the other columns
df = df1.groupby('Investment Type').agg(lambda x: ', '.join(x.unique())).reset_index()
print(df)
   Investment Type                Date   New_Medium
0    Fixed Deposit  1/1/2000, 1/3/2000  Own, Online
1      Mutual Fund  1/1/2000, 1/2/2000  Own, Online
2      Real Estate  1/1/2000, 1/2/2000  Own, Online
3           Stocks  1/1/2000, 1/2/2000  Own, Online
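
Since the question asks for a function whose Medium input may be a single value or a list, the steps above could be wrapped roughly as follows. This is only a sketch under some assumptions: the name `aggregate_by_medium` is made up for illustration, `re.escape` is added in case an input value contains regex metacharacters, and `DataFrame.explode` (with `ignore_index`, which needs pandas 1.1 or later) is swapped in for the `itertools.product` step.

```python
import re
import pandas as pd


def aggregate_by_medium(df, mediums):
    """Hypothetical helper: filter rows by Medium, then aggregate per investment type."""
    # Accept a single string as well as a list of strings.
    if isinstance(mediums, str):
        mediums = [mediums]
    # Escape each value so it is matched literally, then wrap it in word boundaries.
    pat = "|".join(r"\b{}\b".format(re.escape(m)) for m in mediums)

    out = df.copy()  # leave the caller's DataFrame untouched
    # Keep only the requested mediums found in each row (a list per row).
    out["Medium"] = out["Medium"].str.findall(pat)
    out = out[out["Medium"].str.len() > 0]
    # One row per (Date, Investment Type, Medium) combination.
    out["Investment Type"] = out["Investment Type"].str.split(r",\s*")
    out = out.explode("Investment Type", ignore_index=True)
    out = out.explode("Medium", ignore_index=True)
    # Join the unique dates and mediums for each investment type.
    return (
        out.groupby("Investment Type")[["Date", "Medium"]]
        .agg(lambda s: ", ".join(s.unique()))
        .reset_index()
    )


# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["1/1/2000", "1/2/2000", "1/3/2000", "1/3/2000", "1/2/2000"],
    "Investment Type": [
        "Mutual Fund, Stocks, Fixed Deposit, Real Estate",
        "Mutual Fund, Stocks, Real Estate",
        "Fixed Deposit",
        "Mutual Fund, Fixed Deposit, Real Estate",
        "Stocks",
    ],
    "Medium": [
        "Own, Online,Through Agent",
        "Own",
        "Online",
        "Through Agent",
        "Own, Online, Through Agent",
    ],
})

result = aggregate_by_medium(df, ["Online", "Own"])  # list input
single = aggregate_by_medium(df, "Through Agent")    # single-value input
print(result)
print(single)
```

With the list input this should give the same grouping as the output above, with the column named Medium instead of New_Medium. The values are joined in order of first appearance, so the order inside each cell depends on the row order of the input.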