fill_between plot fails for a specific combination of index and values in a pandas time series


I tried to draw a plot and ran into a strange error:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00', '0 days 18:00:00'],
                        dtype='timedelta64[ns]', freq='6H')
ts1 = pd.Series(np.array([  0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)

plt.figure()

plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 <= ts2))

plt.show()

This results in

DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)

Using a different index (here, the default integer index) works:

# ... as above
ts1 = pd.Series(np.array([  0., 5439.802205, 4506.0691, 640.734375]))
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]))
# as above ...

So far, just the usual trouble. However, if I keep the original index and replace the values instead

# ... as above
ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)
# as above ...

it works too! So neither the original index nor the original values alone cause the problem. The combination of the two does.

Can someone explain this to me?

pandas numpy matplotlib plot time-series
1 Answer

TL;DR

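Use pd.TimedeltaIndex.seconds to convert the index to seconds, which gives a compatible dtype (integers are compatible with floats), and use plt.xticks to set the correct ticks (seconds) and labels (timedelta strings):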

# ... as above, in Q

idx_s = ts1.index.seconds
# or use .astype(np.int64) if you want `ns`, but you don't need it here

plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))

plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
plt.show()

Output:

[Figure: fill between]


The problem lies in plt.fill_between's dependence on numpy. Here is the full traceback:

Traceback (most recent call last):

  Cell In[33], line 11
    plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))

  File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:3315 in fill_between
    return gca().fill_between(

  File ~\anaconda3\lib\site-packages\matplotlib\__init__.py:1473 in inner
    return func(

  File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5648 in fill_between
    return self._fill_between_x_or_y(

  File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5632 in _fill_between_x_or_y
    pts = np.vstack([np.hstack([ind[where, None], dep1[where, None]]),

  File ~\anaconda3\lib\site-packages\numpy\core\shape_base.py:359 in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)

DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)

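The underlying problem is that matplotlib uses numpy to compute pts, the polygon that needs to be filled. See the source of _fill_between_x_or_y in matplotlib/axes/_axes.py (the frame that appears in the traceback). As part of that computation it tries, for example, to do something like this: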

np.hstack([idx.values[:, None], ts1.values[:, None]])
This fails (with our DTypePromotionError) because idx.values has dtype dtype('<m8[ns]'), while ts1.values has dtype dtype('float64'). The two are incompatible, meaning that numpy cannot promote them to a common dtype: timedelta64[ns] cannot hold fractional values.

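As the error message mentions, the np.hstack call above can be made to work with dtype=object, but the result brings all kinds of performance and optimization problems, such as the loss of vectorization.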
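
The promotion rule behind this error can be checked directly, without matplotlib. A minimal sketch, assuming a helper named can_promote (hypothetical, not part of numpy) that wraps np.promote_types and reports whether a common dtype exists:

```python
import numpy as np


def can_promote(a, b):
    """Return True if numpy can find a common dtype for a and b.

    Hypothetical helper for illustration only; not part of numpy.
    """
    try:
        np.promote_types(a, b)
        return True
    except TypeError:  # DTypePromotionError is a subclass of TypeError
        return False


td = np.dtype('timedelta64[ns]')

print(can_promote(td, np.dtype('float64')))  # the failing combination from the question
print(can_promote(td, np.dtype('int64')))    # the combination that works
print(np.promote_types(td, np.dtype('int64')))
```

This is the same check np.hstack has to pass internally before it can build a single array from both columns.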

You mention that it does work with:

ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)

Now, this makes sense, because both series have dtype dtype('int64'), which is compatible with dtype('<m8[ns]'):

np.hstack([idx.values[:, None], ts1.values[:, None]])

array([[             0,              0],
       [21600000000000,              1],
       [43200000000000,              2],
       [64800000000000,              3]], dtype='timedelta64[ns]')

Note that the operation has "converted" the integers to 'timedelta64[ns]'. It treats them as nanoseconds.
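
To make the nanosecond interpretation concrete, here is a small sketch that uses plain numpy arrays as stand-ins for idx.values and ts1.values (the names td and ints are illustrative only):

```python
import numpy as np

# Stand-ins for idx.values and ts1.values from the integer example
td = np.array([0, 6, 12, 18], dtype='timedelta64[h]').astype('timedelta64[ns]')
ints = np.arange(4, dtype=np.int64)

stacked = np.hstack([td[:, None], ints[:, None]])

print(stacked.dtype)
# The integer 1 in the second column is now a timedelta of one nanosecond
print(stacked[1, 1] == np.timedelta64(1, 'ns'))
```

So the integer case avoids the error, but the y values end up on a timedelta scale inside that intermediate array.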

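So the solution here is not to use 'timedelta64[ns]' for x, but to convert the values of idx to (nano)seconds so that you get integers (which are compatible with floats). (pd.TimedeltaIndex.seconds gives dtype='int32', which is enough for your example. If you really want nanoseconds, use idx.astype(np.int64).)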
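
One caveat worth keeping in mind: .seconds returns only the seconds component of each timedelta (at least 0 and less than one day), so it wraps around for an index that spans more than a day. A short sketch, assuming a hypothetical index longer_idx that is not part of the question, comparing it with .total_seconds():

```python
import pandas as pd

# Hypothetical index that crosses the one-day boundary (not from the question)
longer_idx = pd.to_timedelta(['0 days 06:00:00', '1 days 06:00:00'])

comp = list(longer_idx.seconds)
total = list(longer_idx.total_seconds())

print(comp)   # both entries share the same seconds component
print(total)  # total_seconds() keeps the day offset
```

For the four points in the question, which all lie within one day, both approaches give the same x positions.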
Then use plt.xticks to set the ticks and labels as needed. Full code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00', 
                         '0 days 18:00:00'], dtype='timedelta64[ns]', freq='6h')
# 'H' is deprecated, use 'h' instead.

ts1 = pd.Series(np.array([  0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)

plt.figure()

idx_s = ts1.index.seconds
# or use .astype(np.int64) if you want `ns`, but you don't need it here

plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))

plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
plt.show()

