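See the figure below:
I want to match each positive stem above the red dashed line with the nearest negative stem below the red line. The match is based on how far apart in time the stems are. So stem A would be matched with stem B. A negative stem can be matched only once, so B cannot also be matched with C. Because stem D is too far away from A and C (say, a time delta >= X), it is not considered.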
BOX_187_084_11
2005-12-01 -190.379230 D
2008-03-01 -261.853410 B
2008-09-01 268.353538 A
2011-09-01 258.084186 C
This is the corresponding DataFrame. How can I solve this easily in an idiomatic pandas way?
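For anyone who wants to reproduce this, the frame above could be rebuilt roughly as follows. This is a sketch: the `Label` column name is my own assumption, since the printed frame shows no header for the letters.

```python
import pandas as pd

# Rebuild the sample frame shown in the question.
# "Label" is a made-up column name for the A/B/C/D annotations.
df = pd.DataFrame(
    {
        "BOX_187_084_11": [-190.379230, -261.853410, 268.353538, 258.084186],
        "Label": ["D", "B", "A", "C"],
    },
    index=pd.to_datetime(["2005-12-01", "2008-03-01", "2008-09-01", "2011-09-01"]),
)
print(df)
```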
This problem does not lend itself well to vectorization, which is what pandas is best at. You can still solve it with a for loop.
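To illustrate why a vectorized nearest match falls short, here is a small sketch with made-up dates (not the question's data). `pd.merge_asof` can find the nearest negative stem within a tolerance, but it has no notion of a stem being used up, so two positive stems can land on the same negative one.

```python
import pandas as pd

# Hypothetical data: two positive stems close to the same negative stem
pos = pd.DataFrame({"pos_time": pd.to_datetime(["2008-09-01", "2008-11-01"])})
neg = pd.DataFrame({"neg_time": pd.to_datetime(["2008-03-01"])})

# Nearest negative stem for each positive stem, at most 365 days away
paired = pd.merge_asof(
    pos,
    neg,
    left_on="pos_time",
    right_on="neg_time",
    direction="nearest",
    tolerance=pd.Timedelta(days=365),
)
print(paired)
```

Both rows end up paired with the same negative stem, which breaks the "matched only once" rule from the question.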
I assume the dates are your index and that the index has a datetime dtype. If it does not, convert it first with
df.index = pd.to_datetime(df.index)
before running this snippet:
import pandas as pd

# The two red lines
lowerbound, upperbound = -125, 125
# The time limit for a match
time_limit = pd.Timedelta(days=365)

# The signal counts as positive or negative only if it lies beyond the red lines
s = df["BOX_187_084_11"]
is_positive = s > upperbound
is_negative = s < lowerbound

# What a positive signal is matched to
df["MatchedTo"] = None
# Whether the negative signal has been matched
df["Matched"] = False

# Loop through each positive signal and find a match
for index in s[is_positive].index:
    # A matching signal must be negative, never matched before, earlier than
    # the positive signal, and within the time limit
    cond = (
        is_negative
        & ~df["Matched"]
        & (df.index > index - time_limit)
        & (df.index < index)
    )
    if not cond.any():
        continue
    # Store the matched data: the latest remaining candidate is the nearest one
    matched_index = cond[cond].index.max()
    df.loc[index, "MatchedTo"] = matched_index
    df.loc[matched_index, "Matched"] = True
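The loop above measures the time limit backwards from each positive stem. If the nearest negative stem may lie on either side, which the question's wording seems to allow (an assumption on my part), a variant along these lines might work. `match_nearest` is a hypothetical helper name, and the matching is greedy in date order, so it is a sketch rather than an optimal assignment.

```python
import pandas as pd

# Values from the question, indexed by date
s = pd.Series(
    [-190.379230, -261.853410, 268.353538, 258.084186],
    index=pd.to_datetime(["2005-12-01", "2008-03-01", "2008-09-01", "2011-09-01"]),
    name="BOX_187_084_11",
)


def match_nearest(s, lower, upper, time_limit):
    """Hypothetical helper: pair each positive stem with its nearest unused negative stem."""
    negatives = s.index[s < lower]
    used = set()
    matches = {}
    for pos in s.index[s > upper]:
        # Candidates are the negative stems that have not been matched yet
        free = [neg for neg in negatives if neg not in used]
        if not free:
            continue
        # Nearest in absolute time, before or after the positive stem
        best = min(free, key=lambda neg: abs(pos - neg))
        # Stems at or beyond the time limit are not considered
        if abs(pos - best) < time_limit:
            matches[pos] = best
            used.add(best)
    return matches


matches = match_nearest(s, -125, 125, pd.Timedelta(days=365))
print(matches)
```

On the sample data this pairs A (2008-09-01) with B (2008-03-01) and leaves C unmatched, because D is the only remaining negative stem and it is far outside the time limit.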