I have a DataFrame with a date index and alternating runs of positive and negative values:
import datetime
import numpy as np
import pandas as pd

values = [1, 2, 3, 4, -1, -2, -3, 10, 11, 12]
start_date = pd.to_datetime('2019-01-23')
dates = [start_date + datetime.timedelta(days=i) for i in range(len(values))]
df = pd.DataFrame({'values': values}, index=dates)
df
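I want an additional column that holds the maximum of the first group of positive values, the minimum of the following group of negative values, and so on for each consecutive group.
The output should look like this: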
df['values_max'] = np.nan
df.loc['2019-01-26','values_max'] = 4
df.loc['2019-01-29','values_max'] = -3
df.loc['2019-02-01','values_max'] = 12
df
I would appreciate any help.
Use:
# map positive and negative values to 1 and -1
s = np.sign(df['values'])
# label consecutive runs: the counter increases each time the sign changes
g = s.ne(s.shift()).cumsum()
# per run, broadcast the max (positive run) or the min (otherwise) to every row
df['new'] = df.groupby(g)['values'].transform(lambda x: x.max() if x.max() > 0 else x.min())
# keep the value only on rows that are their run's extreme; set the rest to NaN
df.loc[df['new'] != df['values'], 'new'] = np.nan
print(df)
            values   new
2019-01-23       1   NaN
2019-01-24       2   NaN
2019-01-25       3   NaN
2019-01-26       4   4.0
2019-01-27      -1   NaN
2019-01-28      -2   NaN
2019-01-29      -3  -3.0
2019-01-30      10   NaN
2019-01-31      11   NaN
2019-02-01      12  12.0
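
If you need this in more than one place, the same steps can be wrapped in a small helper. The following is only a sketch: the function name `mark_run_extremes` is hypothetical (not part of pandas), and it replaces the `lambda` with the built-in `"max"` and `"min"` transforms combined through `Series.where`, which should behave the same on this data. As in the code above, a run of zeros is treated as non-positive and gets its minimum (0).

```python
import numpy as np
import pandas as pd


def mark_run_extremes(values: pd.Series) -> pd.Series:
    """Keep the max of each positive run and the min of every other run; NaN elsewhere.

    Hypothetical helper for illustration only.
    """
    sign = np.sign(values)
    # A new run starts wherever the sign differs from the previous row.
    run_id = sign.ne(sign.shift()).cumsum()
    grouped = values.groupby(run_id)
    # Positive runs take their max, all other runs take their min.
    extreme = grouped.transform("max").where(sign > 0, grouped.transform("min"))
    # Blank out every row that is not equal to its run's extreme.
    return extreme.where(values.eq(extreme))


dates = pd.date_range("2019-01-23", periods=10, freq="D")
df = pd.DataFrame({"values": [1, 2, 3, 4, -1, -2, -3, 10, 11, 12]}, index=dates)
df["new"] = mark_run_extremes(df["values"])
print(df)
```

Note that if the extreme value occurs more than once within a run, every matching row keeps it, just as with the `df.loc[...]` approach above.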