Update row values where a certain condition is met in pandas

Question

Say I have the following dataframe:

What is the most efficient way to update the values of the columns feat and another_feat where stream is number 2?

Is it this?

for index, row in df.iterrows():
    if df.loc[index, 'stream'] == 2:
        pass  # do something

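What if I have more than 100 columns? I don't want to name the columns I want to update explicitly. I want to divide the value of every column by 2 (except the stream column).

So, to be clear, my goal is:

Divide all values by 2 in every row where stream is 2, but leave the stream column unchanged.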
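A minimal sketch of that goal, assuming a small numeric frame with hypothetical values (the question's own dataframe is not shown): build a boolean mask for stream == 2 and let Index.difference pick every other column, so no column names have to be typed out.

```python
import pandas as pd

# Hypothetical sample data; the question's original dataframe is not shown.
df = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4.0, 4.0, 2.0, 1.0], "another_feat": [5.0, 5.0, 9.0, 7.0]},
    index=list("abcd"),
)

mask = df["stream"] == 2                   # rows to update
cols = df.columns.difference(["stream"])   # every column except 'stream'

# Halve only the selected rows and columns; 'stream' is left untouched.
df.loc[mask, cols] = df.loc[mask, cols] / 2
print(df)
```

This works the same with 100+ columns, because only the excluded column is named.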

Answer 1

If you need to set both columns to the same value, I think you can use loc:

df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print(df1)
   stream        feat another_feat
a       1  some_value   some_value
b       2        aaaa         aaaa
c       2        aaaa         aaaa
d       3  some_value   some_value

If you need to update them separately, one option is:

df1.loc[df1['stream'] == 2, 'feat'] = 10
print(df1)
   stream        feat another_feat
a       1  some_value   some_value
b       2          10   some_value
c       2          10   some_value
d       3  some_value   some_value

Another common option is numpy.where:

df1['feat'] = np.where(df1['stream'] == 2, 10,20)
print(df1)
   stream  feat another_feat
a       1    20   some_value
b       2    10   some_value
c       2    10   some_value
d       3    20   some_value
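np.where produces a value for every row, so the 20 above overwrites the rows that do not match. A sketch with hypothetical data of keeping the existing values instead, by passing the column itself as the third argument:

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same shape as the example above.
df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [7, 8, 9, 6]},
    index=list("abcd"),
)

# np.where evaluates every row; using the current column as the "else"
# branch leaves non-matching rows unchanged instead of overwriting them.
df1["feat"] = np.where(df1["stream"] == 2, 10, df1["feat"])
print(df1)
```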

EDIT: If you need to divide all columns except stream in the rows where the condition is True, use:

print(df1)
   stream  feat  another_feat
a       1     4             5
b       2     4             5
c       2     2             9
d       3     1             7

# all columns except stream
cols = [col for col in df1.columns if col != 'stream']
print(cols)
['feat', 'another_feat']

mask = df1['stream'] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2
print(df1)
   stream  feat  another_feat
a       1   4.0           5.0
b       2   2.0           2.5
c       2   1.0           4.5
d       3   1.0           7.0
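Here feat and another_feat start as integer columns, and halving produces values such as 2.5. Recent pandas versions emit a FutureWarning when loc writes values of an incompatible dtype into an existing column, so a sketch (same hypothetical data as above) that converts the target columns to float before dividing:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1], "another_feat": [5, 5, 9, 7]},
    index=list("abcd"),
)
cols = [col for col in df1.columns if col != "stream"]

# Convert the target columns to float first, so writing halves such as 2.5
# does not require pandas to upcast an integer column in place.
df1[cols] = df1[cols].astype(float)

mask = df1["stream"] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2
print(df1)
```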

If there are multiple conditions, use nested numpy.where calls or numpy.select:

import numpy as np
import pandas as pd

df0 = pd.DataFrame({'Col':[5,0,-6]})

df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing', 
                          np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))

df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                            ['Increasing',  'Decreasing'], 
                            default='No Change')

print (df0)
   Col    New Col1    New Col2
0    5  Increasing  Increasing
1    0   No Change   No Change
2   -6  Decreasing  Decreasing

Answer 2

You can do the same with .loc (older answers use .ix, which was removed in pandas 1.0), like this:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.loc[df.a > 0, ['b', 'c']] = 0

In [4]: df
Out[4]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315
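The same .loc pattern accepts combined conditions. A sketch with hypothetical deterministic values in place of np.random.randn, zeroing b and c only where both a and d are positive:

```python
import pandas as pd

# Hypothetical deterministic data instead of np.random.randn.
df = pd.DataFrame(
    {
        "a": [-0.3, 0.9, 0.2, -0.5],
        "b": [0.8, -0.9, 0.1, 0.4],
        "c": [0.1, 0.4, -0.1, 1.8],
        "d": [-1.3, 1.9, -0.7, 0.3],
    }
)

# Each comparison needs its own parentheses because & binds more tightly
# than > and <; use | for "or" and ~ for "not".
condition = (df["a"] > 0) & (df["d"] > 0)
df.loc[condition, ["b", "c"]] = 0
print(df)
```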

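EDIT

Given the extra information, the following returns all columns except a, with their values halved, for the rows that meet the condition (it returns a new DataFrame and does not modify df):

condition = df.a > 0
df.loc[condition, [i for i in df.columns if i != 'a']] / 2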
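To halve those values inside df itself rather than just returning them, one option is to assign through .loc. A sketch with hypothetical data, using Index.drop to exclude column a:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [-1.0, 2.0, 3.0], "b": [4.0, 6.0, 8.0], "c": [10.0, 12.0, 14.0]})

condition = df["a"] > 0
cols = df.columns.drop("a")  # all columns except 'a'

# Assign through .loc so the change lands in df rather than in a copy.
df.loc[condition, cols] = df.loc[condition, cols] / 2
print(df)
```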

Answer 3

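Another vectorized solution is to use the mask() method to halve the values in the rows where stream=2, and join() these columns to a dataframe that contains only the stream column: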

cols = ['feat', 'another_feat']
df[['stream']].join(df[cols].mask(df['stream'] == 2, lambda x: x/2))

Alternatively, you can update() the original dataframe in place:

df.update(df[cols].mask(df['stream'] == 2, lambda x: x/2))

Both snippets above produce the same result.
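A sketch with hypothetical data running both variants side by side: join builds a new frame and leaves df alone, while update writes into df in place (and returns None).

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4.0, 4.0, 2.0, 1.0], "another_feat": [5.0, 5.0, 9.0, 7.0]},
    index=list("abcd"),
)
cols = ["feat", "another_feat"]

# mask() calls the lambda on df[cols] and uses its result where the row condition is True.
halved = df[cols].mask(df["stream"] == 2, lambda x: x / 2)

# Variant 1: a new frame; df itself is not changed by this line.
joined = df[["stream"]].join(halved)

# Variant 2: modify df in place.
df.update(halved)

print(joined)
print(df)
```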

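If the replacement value is a constant (not derived with a function), mask() is even simpler to use; for example, the following code replaces every feat value in the rows where stream equals 1 or 3 with 100.¹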

df[['stream']].join(df.filter(like='feat').mask(df['stream'].isin([1,3]), 100))

¹ Here the feat columns are selected with the filter() method: filter(like='feat') keeps every column whose name contains 'feat'.
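mask() is the inverse of where(): mask(cond, x) replaces values where cond is True, while where(cond, x) replaces them where cond is False. A sketch with hypothetical data showing that the constant replacement above can be written either way:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"stream": [1, 2, 2, 3], "feat": [4, 4, 2, 1], "another_feat": [5, 5, 9, 7]},
    index=list("abcd"),
)
feat_cols = df.filter(like="feat")      # columns whose name contains 'feat'
cond = df["stream"].isin([1, 3])        # rows to overwrite

# mask() replaces values where cond is True ...
via_mask = feat_cols.mask(cond, 100)
# ... where() replaces values where cond is False, so the condition is negated.
via_where = feat_cols.where(~cond, 100)

result = df[["stream"]].join(via_mask)
print(result)
```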
