Is an intermediate result passed with _ in a chained operation available to later functions in the chain?

Question

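I am building a correlation matrix and want to get the largest positive correlation value from it. Applying max() to the result of corr() only returns the 1.0 correlations along the diagonal, which is not what I want, so the goal is to remove every occurrence of 1.0 and then run max(). I wanted to do this in a chained operation, and I can use _ to pipe the intermediate result into a where() operation, which does convert the 1.0 values to NaN. However, applying max() as the next operation in the chain still returns 1.0, as if it ignored the result of where().

Is there something I misunderstand about the _ operator? Or is where() perhaps the wrong function in this case? I have provided the full code below to reproduce the problem.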

# Set up the problem

import pandas as pd
import numpy as np

# raw data

raw_t = [
66.6, 36.4, 47.6, 17.0, 54.6, 21.0, 12.2, 13.6, 20.6, 55.4, 63.4, 69.0,
80.2, 26.2, 42.6, 31.8, 15.6, 27.8, 13.8, 22.0, 14.2, 62.6, 96.4, 113.8,
115.2,82.2, 65.0, 23.2, 24.0, 14.2,  1.4,  3.8, 16.4, 16.4, 67.0, 51.4
]

# raw indexes

yr_mn = (np.full(12, 2000).tolist() + np.full(12, 2001).tolist() + np.full(12, 2002).tolist(),
np.arange(1,13).tolist() + np.arange(1,13).tolist() + np.arange(1,13).tolist() )

# structure multi index

index_base = list(zip(*yr_mn))
index = pd.MultiIndex.from_tuples(index_base, names=["year", "month"])

# create indexed dataset

t_dat = pd.Series(raw_t, index=index)

# example of the correlation matrix we are working with

pd.set_option("format.precision", 2)
t_dat.unstack().corr().style.background_gradient(cmap="YlGnBu")

My attempt:


t_dat.unstack().corr().stack().where(_!=1.0) # does swap out 1.0 for NaN  
t_dat.unstack().corr().stack().where(_!=1.0).max() # still returns 1.0

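Another point is that it sometimes works and sometimes does not, returning

ValueError: Array conditional must be same shape as self

which also makes me suspect I am missing something. The default for pandas' max() is to skip NaN, so that should not be the cause. I also tried where(_!=1.0,0.0) to set 1.0 to 0.0; same result. In addition, I found that I can get past the ValueError if I remove the where() call and rerun, like this: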


t_dat.unstack().corr().stack()#.where(_!=1.0)

This somehow resets it, even though the original DataFrame has not been changed.

Thanks for any insights! David

Answer 1

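Don't use _ for this. In an interactive environment it holds the result of the last command (it may appear to work, but it will eventually break).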
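
To make that concrete, here is a minimal sketch of what is probably happening in the session. It assumes an IPython/Jupyter style shell, where _ is rebound to the previous output after every command; in a plain script _ is not set automatically, so the sketch assigns it by hand. The Series s and the name stale_mask are made-up stand-ins, not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for the stacked correlation matrix.
s = pd.Series([1.0, 0.5, 0.5, 1.0])

# Mimic the interactive shell: pretend the last displayed output was the
# result of the first attempt, i.e. the Series with 1.0 already turned into NaN.
_ = s.where(s != 1.0)

# `_ != 1.0` is evaluated on that OLD object before where() even runs.
# NaN != 1.0 is True, so the mask is True everywhere and nothing is replaced.
stale_mask = _ != 1.0
print(stale_mask.all())             # True
print(s.where(stale_mask).max())    # 1.0

# After a command that returns a scalar (such as .max()), `_` is a float.
# `_ != 1.0` is then a plain Python bool, which has no matching shape.
_ = s.max()
error_message = None
try:
    s.where(_ != 1.0)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

This would also be consistent with the "reset" you observed: rerunning the chain without where() makes _ the full stacked Series again, so the next where(_!=1.0) happens to line up.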

You can do this instead:

# store the result to a variable:
result = t_dat.unstack().corr().stack()

# compute the boolean mask and set the True values to NaN
mask = result == 1.0
result[mask] = np.nan

print(result)

Prints:


...
11     1       -0.148800
       2       -0.561202
       3       -0.595797
       4        0.945831
       5       -0.737437
       6        0.812018
       7        0.516614
       8        0.785324
       9       -0.823919
       10       0.539078
       11            NaN
       12       0.929903
12     1       -0.502081
       2       -0.826288
       3       -0.849431
       4        0.760119
       5       -0.437322
       6        0.969761
       7        0.795323
       8        0.957978
       9       -0.557725
       10       0.811077
       11       0.929903
       12            NaN
dtype: float64

Then you can compute max:

print(result.max())

Prints:

0.9996502197746994
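
If you would rather keep everything in one chain, a possible sketch is to pass a callable to where(): pandas calls it with the object where() is invoked on, so it always sees the current step of the chain instead of whatever _ happens to hold. The second variant masks the diagonal by position rather than by comparing to 1.0. The names max_corr, diagonal and max_corr_by_position are my own, and the index is built with MultiIndex.from_product as a shorter equivalent of the setup in the question.

```python
import numpy as np
import pandas as pd

# Same raw data as in the question.
raw_t = [
    66.6, 36.4, 47.6, 17.0, 54.6, 21.0, 12.2, 13.6, 20.6, 55.4, 63.4, 69.0,
    80.2, 26.2, 42.6, 31.8, 15.6, 27.8, 13.8, 22.0, 14.2, 62.6, 96.4, 113.8,
    115.2, 82.2, 65.0, 23.2, 24.0, 14.2, 1.4, 3.8, 16.4, 16.4, 67.0, 51.4,
]
index = pd.MultiIndex.from_product(
    [[2000, 2001, 2002], range(1, 13)], names=["year", "month"]
)
t_dat = pd.Series(raw_t, index=index)

# Variant 1: a callable condition receives the Series being filtered,
# so no reference to `_` is needed.
max_corr = (
    t_dat.unstack()
    .corr()
    .stack()
    .where(lambda s: s != 1.0)
    .max()
)

# Variant 2: hide the diagonal by position, which does not rely on the
# self-correlations being exactly 1.0.
corr = t_dat.unstack().corr()
diagonal = np.eye(len(corr), dtype=bool)
max_corr_by_position = corr.mask(diagonal).max().max()

print(max_corr, max_corr_by_position)
```

Variant 1 also drops any off-diagonal value that happens to equal 1.0 exactly, while variant 2 keeps it, so pick whichever matches what you mean by "remove the 1.0 entries".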
