I tried to draw a plot and ran into a strange error:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00', '0 days 18:00:00'],
dtype='timedelta64[ns]', freq='6H')
ts1 = pd.Series(np.array([ 0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)
plt.figure()
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 <= ts2))
plt.show()
This results in:
DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)
Using a different index works:
# ... as above
ts1 = pd.Series(np.array([ 0., 5439.802205, 4506.0691, 640.734375]))
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]))
# as above ...
So far, just the usual trouble. However, if I replace the values instead:
# ... as above
ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)
# as above ...
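It works as well! So neither the original index nor the original values are the sole cause of the problem. The combination of the two is.
Can someone explain this to me?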
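TL;DR
Use pd.TimedeltaIndex.seconds to convert the index to seconds, which gives compatible dtypes (integers and floats), and use plt.xticks to set the correct ticks (seconds) and labels (timedelta strings):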
... # as above, in Q
idx_s = ts1.index.seconds
# or use .astype(np.int64) if you want `ns`, but you don't need it here
plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))
plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
plt.show()
Output: (plot not shown)
The error stems from matplotlib's dependence on numpy. Here is a proper full traceback:
Traceback (most recent call last):
Cell In[33], line 11
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:3315 in fill_between
return gca().fill_between(
File ~\anaconda3\lib\site-packages\matplotlib\__init__.py:1473 in inner
return func(
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5648 in fill_between
return self._fill_between_x_or_y(
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5632 in _fill_between_x_or_y
pts = np.vstack([np.hstack([ind[where, None], dep1[where, None]]),
File ~\anaconda3\lib\site-packages\numpy\core\shape_base.py:359 in hstack
return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)
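The ultimate problem here is that matplotlib uses numpy to build pts, i.e. the area that needs to be filled (see _fill_between_x_or_y in matplotlib/axes/_axes.py, the frame shown in the traceback). As part of that step, it tries to do something like this: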
np.hstack([idx.values[:, None], ts1.values[:, None]])
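This fails with our DTypePromotionError, because idx.values has dtype dtype('<m8[ns]'), while ts1.values has dtype dtype('float64'). The two are incompatible, meaning numpy cannot promote them to a common dtype: timedelta64[ns] cannot hold fractional values. As the error message mentions, the call above can work with dtype=object, but the result brings all sorts of performance and optimization problems, such as losing vectorization. You mentioned that this works with: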
ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)
That makes sense, because both series now have dtype dtype('int64'), which is compatible with dtype('<m8[ns]'):
np.hstack([idx.values[:, None], ts1.values[:, None]])
array([[ 0, 0],
[21600000000000, 1],
[43200000000000, 2],
[64800000000000, 3]], dtype='timedelta64[ns]')
Note that the operation has "converted" the integers to timedelta64[ns]: it treats them as nanoseconds.
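The promotion rules described above can be checked with numpy alone, without matplotlib. The following is a minimal sketch: the names td, ints and floats are made up for illustration, and catching TypeError assumes that DTypePromotionError is a TypeError subclass, as it is in recent numpy releases.

```python
import numpy as np

# Hypothetical sample data mirroring the question: 0h, 6h, 12h, 18h as timedelta64[ns].
td = np.array([0, 6, 12, 18], dtype="timedelta64[h]").astype("timedelta64[ns]")
ints = np.arange(4, dtype=np.int64)
floats = np.array([0.0, 1.5, 2.5, 3.5])

# Integers and timedeltas share a common dtype, so stacking succeeds
# and the integers are reinterpreted as nanoseconds.
combined = np.hstack([td[:, None], ints[:, None]])
print(combined.dtype)

# Floats and timedeltas have no common dtype, so stacking raises.
try:
    np.hstack([td[:, None], floats[:, None]])
    float_failed = False
except TypeError as exc:
    float_failed = True
    print(type(exc).__name__)

# Converting the timedeltas to plain integers first removes the conflict.
as_float = np.hstack([td.astype(np.int64)[:, None], floats[:, None]])
print(as_float.dtype)
```

The last step is the same idea as the fix in this answer: once the x values are ordinary numbers, numpy can promote them together with the float y values.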
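So the solution here is not to use timedelta64[ns], but to convert idx to integers (which are compatible with floats). (Using pd.TimedeltaIndex.seconds gives dtype='int32', which is enough for your example. If you really want nanoseconds, use idx.astype(np.int64).)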
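One caveat worth illustrating: .seconds returns only the seconds component of each timedelta, not the whole duration, so it is sufficient only while the index stays within a single day. The sketch below uses a made-up index (idx_long) that crosses the one-day mark and compares .seconds with total_seconds().

```python
import pandas as pd

# Hypothetical index that spans more than one day.
idx_long = pd.to_timedelta(["0 days 06:00:00", "1 days 06:00:00"])

# Seconds component only: the "1 days" part is dropped.
component = idx_long.seconds
print(list(component))

# Full duration in seconds, returned as floats.
total = idx_long.total_seconds()
print(list(total))
```

For the index in the question both give the same x positions, so either works there; for longer spans total_seconds() is probably the safer choice.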
Then use plt.xticks to set the ticks and labels as needed. Full code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00',
'0 days 18:00:00'], dtype='timedelta64[ns]', freq='6h')
# 'H' is deprecated, use 'h' instead.
ts1 = pd.Series(np.array([ 0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)
plt.figure()
idx_s = ts1.index.seconds
# or use .astype(np.int64) if you want `ns`, but you don't need it here
plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))
plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
plt.show()
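As a variation on the full code above, the tick labels can be generated by a formatter instead of being passed to plt.xticks as a list. This is only a sketch under a few assumptions: format_seconds is a hypothetical helper, the index is rebuilt with pd.to_timedelta rather than the constructor used in the question, and the Agg backend is selected so the code runs without a display (replace fig.canvas.draw() with plt.show() for interactive use).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen here only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

idx = pd.to_timedelta([0, 6, 12, 18], unit="h")
ts1 = pd.Series(np.array([0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)

# Plain float seconds on the x axis, so numpy can promote x and y together.
x = idx.total_seconds().to_numpy()

def format_seconds(value, pos=None):
    """Hypothetical helper: render a tick value in seconds as a timedelta string."""
    return str(pd.Timedelta(seconds=float(value)))

fig, ax = plt.subplots()
ax.fill_between(x, ts1.to_numpy(), ts2.to_numpy(), where=(ts1 > ts2).to_numpy())
ax.fill_between(x, ts1.to_numpy(), ts2.to_numpy(), where=(ts1 <= ts2).to_numpy())
ax.set_xticks(x)
ax.xaxis.set_major_formatter(FuncFormatter(format_seconds))
ax.tick_params(axis="x", rotation=45)
fig.canvas.draw()  # render once; use plt.show() with an interactive backend
```

The formatter keeps the labels correct even if the tick positions change later, whereas an explicit labels list is tied to the exact ticks it was created for.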