pandas column-slices with mypy

Question

I recently ran into an odd situation that I can't resolve on my own.

Consider this MWE:

import pandas
import numpy as np

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

observations = data.loc[:, :"c"]
features = data.loc[:, "c":]

print(data)
print(observations)
print(features)

According to this answer, the slicing itself is done correctly, and it works in the sense that it prints the right results. But when I run mypy on it, I get these errors:

mypy.exe .\t.py
t.py:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:6: error: Slice index must be an integer or None
t.py:7: error: Slice index must be an integer or None
Found 3 errors in 1 file (checked 1 source file)

That is also correct in its own way, since the slice bounds are not integers. How can I satisfy or disable the "Slice index must be an integer or None" error?

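Of course, you could use iloc[:, :3] to get around this, but that feels like bad practice, because with iloc we depend on the order of the columns. (In this example loc also depends on the order, but that is only to keep the MWE short.)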

Answer 1

This is an open issue (#GH2410).
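The question also asks how to satisfy or disable the check. The following is a minimal sketch of two options, not something taken from the issue itself. It assumes that mypy only raises this error for the literal start:stop slice syntax, so a slice built with the built-in slice() constructor passes, and that a per-line suppression is acceptable in your code base.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

# Option 1: build the label slice explicitly. At runtime slice(None, "c")
# is the same object that the literal syntax :"c" produces.
observations = data.loc[:, slice(None, "c")]
features = data.loc[:, slice("c", None)]

# Option 2: keep the literal syntax and silence mypy on this line only.
observations_ignored = data.loc[:, :"c"]  # type: ignore
```

Both options keep label-based slicing, so they do not depend on column positions the way iloc does. If you run mypy with warn_unused_ignores, the ignore comment may itself be reported on versions where the check no longer fires.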

As a workaround, you can try using get_loc:

col_idx = data.columns.get_loc("c")

observations = data.iloc[:, :col_idx+1]
features = data.iloc[:, col_idx:]

Output:

           a         b         c # <- observations
0   0.269605  0.497063  0.676928
1   0.526765  0.204216  0.748203
2   0.919330  0.059722  0.422413
..       ...       ...       ...
7   0.056050  0.521702  0.727323
8   0.635477  0.145401  0.258166
9   0.041886  0.812769  0.839979

[10 rows x 3 columns]

           c         d         e  # <- features
0   0.676928  0.672298  0.177933
1   0.748203  0.995165  0.136659
2   0.422413  0.222377  0.395179
..       ...       ...       ...
7   0.727323  0.291441  0.056998
8   0.258166  0.219025  0.405838
9   0.839979  0.923173  0.431298

[10 rows x 3 columns]
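One caveat with the workaround above: get_loc only returns a plain integer when the column label is unique. For duplicated labels it returns a slice or a boolean mask, which would break the col_idx + 1 arithmetic. A hedged sketch of a small wrapper follows; the helper name column_position is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def column_position(frame: pd.DataFrame, label: str) -> int:
    """Return the integer position of a uniquely named column.

    Hypothetical helper, not part of pandas.
    """
    loc = frame.columns.get_loc(label)
    # get_loc returns a slice or boolean mask when the label is duplicated,
    # so fail loudly instead of passing a non-integer on to iloc.
    if not isinstance(loc, (int, np.integer)):
        raise ValueError(f"column {label!r} is not unique")
    return int(loc)


data = pd.DataFrame(np.random.rand(10, 5), columns=list("abcde"))
col_idx = column_position(data, "c")

observations = data.iloc[:, : col_idx + 1]  # columns up to and including "c"
features = data.iloc[:, col_idx:]  # columns from "c" onwards
```

Because the helper is annotated to return int, the slice bounds passed to iloc are integers as far as mypy is concerned, while the split point is still chosen by column name.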
