I'm using pandas 1.5.1 and am running into some odd dtype behavior with conditional `__setitem__`:
>>> import pandas as pd
>>> b = pd.Series([1.2, 2.2, 3.2], dtype="float64")
>>> a = pd.Series([1, 2, 3], dtype="int32")
>>> a[a < 2] = b
>>> a
0    1.2
1    2.0
2    3.0
dtype: float64
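When both Series use NumPy dtypes, the assignment succeeds and the result is upcast to float64. However, if I replace the NumPy dtypes with the pandas extension dtypes pd.Float64Dtype() and pd.Int32Dtype(), using the same data, the assignment fails: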
>>> b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
>>> a = pd.Series([1,2,3], dtype="Int32")
>>> a[a < 2] = b
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\series.py", line 1162, in __setitem__
    self._where(~key, value, inplace=True)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\generic.py", line 9733, in _where
    new_data = self._mgr.putmask(mask=cond, new=other, align=align)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\managers.py", line 407, in putmask
    return self.apply(
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\managers.py", line 347, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\internals\blocks.py", line 1517, in putmask
    values._putmask(mask, new)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\base.py", line 1523, in _putmask
    self[mask] = val
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\masked.py", line 237, in __setitem__
    value, mask = self._coerce_to_array(value, dtype=self.dtype)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\numeric.py", line 258, in _coerce_to_array
    values, mask, _, _ = _coerce_to_data_and_mask(
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\numeric.py", line 214, in _coerce_to_data_and_mask
    values = dtype_cls._safe_cast(values, dtype, copy=False)
  File "C:\ca2_ps_env_3810\lib\site-packages\pandas\core\arrays\integer.py", line 57, in _safe_cast
    raise TypeError(
TypeError: cannot safely cast non-equivalent object to int32
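Is this error expected for this code? I don't understand why, with the pandas extension dtypes, the Float64Dtype() values end up being treated as object dtype, or why the same data makes the assignment fail once extension dtypes are used. If I use concat instead of setitem, it succeeds.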
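
For comparison, a minimal sketch of one way to sidestep the failing cast, assuming it is acceptable to upcast the target Series to Float64 explicitly before the conditional replacement. This is only an illustration: it is not the concat approach mentioned above, and the name `result` is introduced here for the example.

```python
import pandas as pd

b = pd.Series([1.2, 2.2, 3.2], dtype="Float64")
a = pd.Series([1, 2, 3], dtype="Int32")

# Assumption: upcasting to Float64 first is acceptable, so the target
# can hold fractional values and no int32 "safe" cast is attempted.
# Series.mask replaces the entries where the condition is True with
# the index-aligned values from b and returns a new Series.
result = a.astype("Float64").mask(a < 2, b)

print(result.dtype)
print(result.tolist())
```

Because `mask` returns a new Series, `a` keeps its Int32 dtype; whether that or an in-place upcast is preferable depends on the surrounding code.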