Create a new column based on presence in the previous/next index

Question

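I have the following (exploded) dataframe:

I want to create a new column that indicates whether each date (the date column) is also present in the previous index and/or the next index. The new column takes these values:

0: present in neither the previous nor the next index
1: present in either the previous or the next index
2: present in both the previous and the next index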
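The three values are simply a count of how many neighbouring indexes (previous and next) contain the date, so the rule is the sum of two membership tests. A minimal sketch; the function name `presence_score` and the idea of passing each neighbour's dates as a plain Python set are illustrative assumptions, not part of the question:

```python
def presence_score(date, prev_dates, next_dates):
    """Return 0, 1 or 2: how many of the two neighbouring indexes contain `date`.

    prev_dates / next_dates hold the dates found in the previous and next
    index (pass an empty set when there is no neighbour).
    """
    # Each membership test contributes 1 when True and 0 when False.
    return int(date in prev_dates) + int(date in next_dates)


print(presence_score(2010, {1996, 2010}, {2010}))  # 2
```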

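For example, for index 303:

Date 1991: new column value = 0 (1991 is present in neither index 302 nor index 304)

Date 1996: new column value = 1 (1996 is present in index 302 but not in index 304)

Date 2010: new column value = 2 (2010 is present in both index 302 and index 304)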
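To make the example concrete, here is a sketch that rebuilds index 303 from a hypothetical toy frame containing only the three dates named above (the asker's actual data is not shown). It collects each index's dates into a set and checks the sets of the previous and next index. It assumes "previous/next index" means the neighbouring label in sorted order; `df`, `presence` and `score` are illustrative names:

```python
import pandas as pd

# Hypothetical toy frame with only the dates named in the example:
# index 302 -> 1996, 2010; index 303 -> 1991, 1996, 2010; index 304 -> 2010.
df = pd.DataFrame(
    {"date": [1996, 2010, 1991, 1996, 2010, 2010]},
    index=pd.Index([302, 302, 303, 303, 303, 304], name="index"),
)

# One set of dates per index label (groupby sorts the labels).
date_sets = df.groupby(level=0)["date"].agg(set)
# Sets of the previous and next label; NaN where there is no neighbour.
prev_sets = date_sets.shift(1)
next_sets = date_sets.shift(-1)


def score(label, date):
    """Return how many neighbouring indexes (0, 1 or 2) contain `date`."""
    total = 0
    for neighbours in (prev_sets.loc[label], next_sets.loc[label]):
        # A missing neighbour is NaN (a float), not a set, so it never matches.
        if isinstance(neighbours, set) and date in neighbours:
            total += 1
    return total


df["presence"] = [score(label, d) for label, d in zip(df.index, df["date"])]
print(df.loc[303])
```

Because `score` runs once per row in plain Python, this version is mainly useful for checking the rule on small data.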

Does anyone know an easy, pandaic way to solve this?

Answer 1

Since no code was given, I reproduced your dataframe in a simplified form.

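import pandas as pd

test = pd.DataFrame(
    {"date": ["a", "b", "c", "a", "a", "a", "b", "b", "a"]}, index=[1,1,1,2,2,2,3,3,3])
test.index.name = "index"

# One set of dates per index label
lists_df = test.groupby("index")["date"].agg(set)

# Wrap each row's date in a one-element set (set(d) would split a string
# such as "2010" into its characters)
singletons = test["date"].map(lambda d: {d})

# {d} - neighbour_set is empty exactly when d occurs in the neighbouring index;
# a missing neighbour gives NaN, which fillna(True) treats as "not present".
# 1 - bool then maps "present" to 1 and "not present" to 0.
res = (
    (1 - (singletons - lists_df.shift(1)[test.index]).fillna(True).astype(bool))
    + (1 - (singletons - lists_df.shift(-1)[test.index]).fillna(True).astype(bool))
)
print("original df\n", test)
print("resulting calculation\n", res)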

This prints the following:

original df
       date
index     
1        a
1        b
1        c
2        a
2        a
2        a
3        b
3        b
3        a
resulting calculation
 index
1    1
1    0
1    0
2    2
2    2
2    2
3    0
3    0
3    1
Name: date, dtype: int64
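As an alternative that avoids storing a set per row, the same 0/1/2 column can be computed with a self-merge: number each index group, then look up every (group ± 1, date) pair with a left merge. This is a sketch, assuming that "previous/next index" means the neighbouring label in sorted order (the same assumption the `shift` calls above make); the column name `new` and the helper `found_in` are hypothetical:

```python
import pandas as pd

# The simplified frame from the answer above.
test = pd.DataFrame(
    {"date": ["a", "b", "c", "a", "a", "a", "b", "b", "a"]},
    index=pd.Index([1, 1, 1, 2, 2, 2, 3, 3, 3], name="index"),
)

# Rank of each row's index label among the sorted unique labels (0, 1, 2, ...),
# so "previous"/"next" means the neighbouring group even if labels have gaps.
keys = pd.DataFrame({
    "pos": test.groupby(level=0).ngroup().to_numpy(),
    "date": test["date"].to_numpy(),
})

# Each (group, date) pair that actually occurs, listed once.
present = keys.drop_duplicates()


def found_in(offset):
    """Boolean array: does each row's date occur in the group `offset` positions away?"""
    shifted = keys.assign(pos=keys["pos"] + offset)
    # A left merge keeps the left rows in order; `present` has no duplicate
    # keys, so no rows are added. indicator=True adds a `_merge` column.
    merged = shifted.merge(present, on=["pos", "date"], how="left", indicator=True)
    return (merged["_merge"] == "both").to_numpy()


test["new"] = found_in(-1).astype(int) + found_in(1).astype(int)
print(test)
```

On this frame the `new` column reproduces the "resulting calculation" printed above.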
