Pandas DataFrame: applying interpolate() within partitions defined by specific columns

Question

I have the following Pandas DataFrame (called df).

+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John   | Dogs   | 2000 | 2      |
| John   | Dogs   | 2001 | 2      |
| John   | Dogs   | 2002 | 2      |
| John   | Dogs   | 2003 | 2      |
| John   | Dogs   | 2004 | 2      |
| John   | Dogs   | 2005 | 2      |
| John   | Cats   | 2000 | 1      |
| John   | Cats   | 2001 | NaN    |
| John   | Cats   | 2002 | NaN    |
| John   | Cats   | 2003 | 4      |
| John   | Cats   | 2004 | 5      |
| John   | Cats   | 2005 | 6      |
| Peter  | Dogs   | 2000 | NaN    |
| Peter  | Dogs   | 2001 | 1      |
| Peter  | Dogs   | 2002 | NaN    |
| Peter  | Dogs   | 2003 | 5      |
| Peter  | Dogs   | 2004 | 5      |
| Peter  | Dogs   | 2005 | 5      |
| Peter  | Cats   | 2000 | NaN    |
| Peter  | Cats   | 2001 | 4      |
| Peter  | Cats   | 2002 | 4      |
| Peter  | Cats   | 2003 | 4      |
| Peter  | Cats   | 2004 | 4      |
| Peter  | Cats   | 2005 | 4      |
+--------+--------+------+--------+

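My goal is to get the result shown in the table below, i.e. to fill the NaN values with the interpolate method, but separately for each combination of the other column values. In other words, it should:

  1. Partition df by the Person and Animal columns
  2. Sort each partition by Year (ascending)
  3. Apply the interpolate method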

+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John   | Dogs   | 2000 | 2      |
| John   | Dogs   | 2001 | 2      |
| John   | Dogs   | 2002 | 2      |
| John   | Dogs   | 2003 | 2      |
| John   | Dogs   | 2004 | 2      |
| John   | Dogs   | 2005 | 2      |
| John   | Cats   | 2000 | 1      |
| John   | Cats   | 2001 | 2      |
| John   | Cats   | 2002 | 3      |
| John   | Cats   | 2003 | 4      |
| John   | Cats   | 2004 | 5      |
| John   | Cats   | 2005 | 6      |
| Peter  | Dogs   | 2000 | NaN    |
| Peter  | Dogs   | 2001 | 1      |
| Peter  | Dogs   | 2002 | 3      |
| Peter  | Dogs   | 2003 | 5      |
| Peter  | Dogs   | 2004 | 5      |
| Peter  | Dogs   | 2005 | 5      |
| Peter  | Cats   | 2000 | NaN    |
| Peter  | Cats   | 2001 | 4      |
| Peter  | Cats   | 2002 | 4      |
| Peter  | Cats   | 2003 | 4      |
| Peter  | Cats   | 2004 | 4      |
| Peter  | Cats   | 2005 | 4      |
+--------+--------+------+--------+

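What I have tried

I could filter for each person and each animal, apply the interpolate method to each piece, and finally concatenate everything back together, but that sounds tedious and slow if you have many columns.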
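For reference, the manual approach described above might look roughly like the sketch below. It uses only a subset of the data from the question (John/Cats and Peter/Dogs), and the names `pieces`, `part`, and `manual` are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: John/Cats and Peter/Dogs only.
df = pd.DataFrame({
    "Person": ["John"] * 6 + ["Peter"] * 6,
    "Animal": ["Cats"] * 6 + ["Dogs"] * 6,
    "Year": list(range(2000, 2006)) * 2,
    "Number": [1, np.nan, np.nan, 4, 5, 6, np.nan, 1, np.nan, 5, 5, 5],
})

pieces = []
for person in df["Person"].unique():
    for animal in df["Animal"].unique():
        # Filter one (Person, Animal) combination at a time.
        part = df[(df["Person"] == person) & (df["Animal"] == animal)]
        if part.empty:
            continue
        # Sort by Year, then interpolate within this piece only.
        part = part.sort_values("Year").copy()
        part["Number"] = part["Number"].interpolate()
        pieces.append(part)

# Glue the pieces back together and restore the original row order.
manual = pd.concat(pieces).sort_index()
print(manual)
```

This works, but the nested loops and the final concat are exactly the bookkeeping that a groupby would take care of.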

python pandas dataframe interpolation fillna
1 Answer

You can try:

df['Number'] = (df.sort_values('Year', ascending=True)
                  .groupby(['Person', 'Animal'])['Number']
                  .transform(lambda x: x.interpolate()))
print(df)

# Output
   Person Animal  Year  Number
0    John   Dogs  2000     2.0
1    John   Dogs  2001     2.0
2    John   Dogs  2002     2.0
3    John   Dogs  2003     2.0
4    John   Dogs  2004     2.0
5    John   Dogs  2005     2.0
6    John   Cats  2000     1.0
7    John   Cats  2001     2.0  # interpolate
8    John   Cats  2002     3.0  # interpolate
9    John   Cats  2003     4.0
10   John   Cats  2004     5.0
11   John   Cats  2005     6.0
12  Peter   Dogs  2000     NaN
13  Peter   Dogs  2001     1.0
14  Peter   Dogs  2002     3.0  # interpolate
15  Peter   Dogs  2003     5.0
16  Peter   Dogs  2004     5.0
17  Peter   Dogs  2005     5.0
18  Peter   Cats  2000     NaN
19  Peter   Cats  2001     4.0
20  Peter   Cats  2002     4.0
21  Peter   Cats  2003     4.0
22  Peter   Cats  2004     4.0
23  Peter   Cats  2005     4.0
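If you want to check this end to end, here is a self-contained sketch that rebuilds the frame from the question, shuffles the rows on purpose, and applies the same expression. The shuffle (and the names `shuffled` and `result`) are my own additions for illustration. The point is that transform returns values carrying the original index labels, so the assignment lines up by index even though sort_values reordered the rows.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "Person": ["John"] * 12 + ["Peter"] * 12,
    "Animal": (["Dogs"] * 6 + ["Cats"] * 6) * 2,
    "Year": list(range(2000, 2006)) * 4,
    "Number": [2, 2, 2, 2, 2, 2,
               1, nan, nan, 4, 5, 6,
               nan, 1, nan, 5, 5, 5,
               nan, 4, 4, 4, 4, 4],
})

# Shuffle the rows (index labels are kept) to mimic unsorted input.
shuffled = df.sample(frac=1, random_state=0).copy()

# Same expression as above: sort by Year, group, interpolate per group.
shuffled["Number"] = (shuffled.sort_values("Year", ascending=True)
                              .groupby(["Person", "Animal"])["Number"]
                              .transform(lambda x: x.interpolate()))

# Restore the original row order via the preserved index labels.
result = shuffled.sort_index()
print(result)
```

Note that the two Peter rows for 2000 stay NaN: with the default linear method, a NaN that comes before the first valid value in its group is left untouched, which matches the expected output in the question.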

For multiple columns, just use the same operation:

cols = ['Number']  # list of columns
df[cols] = (df.sort_values('Year', ascending=True)
              .groupby(['Person', 'Animal'])[cols]
              .transform(lambda x: x.interpolate()))
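A small runnable sketch of the multi-column case follows. The question only has the Number column, so the second column `Weight` is hypothetical, made up purely to show that each listed column is interpolated independently within each group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John"] * 4 + ["Peter"] * 4,
    "Animal": ["Cats"] * 4 + ["Dogs"] * 4,
    "Year": [2000, 2001, 2002, 2003] * 2,
    "Number": [1, np.nan, np.nan, 4, np.nan, 1, np.nan, 5],
    # Hypothetical extra column, not part of the original question.
    "Weight": [10.0, np.nan, 30.0, 40.0, 2.0, np.nan, np.nan, 8.0],
})

cols = ["Number", "Weight"]  # every column that should be interpolated
df[cols] = (df.sort_values("Year", ascending=True)
              .groupby(["Person", "Animal"])[cols]
              .transform(lambda x: x.interpolate()))
print(df)
```

Only the columns listed in cols are touched; Person, Animal, and Year are left as they were.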