Say I have the following data:
Date,release,count
2019-03-01,buster,0
2019-03-01,jessie,1
2019-03-01,stretch,74
2019-08-15,buster,25
2019-08-15,jessie,1
2019-08-15,stretch,49
2019-10-07,buster,35
2019-10-07,jessie,1
2019-10-07,stretch,43
2019-10-08,buster,40
2019-10-08,jessie,1
2019-10-08,stretch,38
2019-10-09,buster,46
2019-10-09,jessie,1
2019-10-09,stretch,33
2019-10-23,buster,46
2019-10-23,jessie,1
2019-10-23,stretch,31
2019-11-25,buster,46
2019-11-25,jessie,1
2019-11-25,stretch,29
2020-01-13,buster,48
2020-01-13,jessie,1
2020-01-13,stretch,28
2020-01-29,buster,50
2020-01-29,jessie,1
2020-01-29,stretch,26
2020-03-10,buster,54
2020-03-10,jessie,1
2020-03-10,stretch,22
2020-04-14,buster,55
2020-04-14,jessie,0
2020-04-14,stretch,21
2020-05-11,buster,57
2020-05-11,jessie,0
2020-05-11,stretch,17
2020-05-25,buster,61
2020-05-25,jessie,0
2020-05-25,stretch,14
2020-06-10,buster,62
2020-06-10,stretch,12
2020-07-01,buster,69
2020-07-01,stretch,3
2020-10-30,buster,74
2020-10-30,stretch,2
2020-11-18,buster,76
2020-11-18,stretch,2
2021-08-26,bullseye,1
2021-08-26,buster,86
2021-08-26,stretch,1
2021-10-08,bullseye,4
2021-10-08,buster,86
2021-10-08,stretch,1
2021-11-11,bullseye,4
2021-11-11,buster,84
2021-11-11,stretch,1
2021-11-17,bullseye,4
2021-11-17,buster,85
2021-11-17,stretch,0
And the following code:
import pandas as pd
import matplotlib.pyplot as plt
# Load the data
df = pd.read_csv('subset.csv')
# Pivot the data to a suitable format for plotting
df = df.pivot_table(index="Date", columns='release', values='count', aggfunc='sum')
# Convert the index to datetime (pivot_table has already sorted it)
df.index = pd.to_datetime(df.index)
print(df)
# Plotting the data with filled areas
fig, ax = plt.subplots(figsize=(12, 6))
df.plot(ax=ax, kind="area", stacked=True)
plt.show()
This generates the following graph:
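In the above graph, the jessie line should stop after 2020-05-25, in the middle of the graph. Instead it just keeps going, a little energizer bunny of a line, all the way to the right edge of the graph, even though its values there are actually NaN. In the print(df) output, we can see the underlying dataframe after the pivot: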
release     bullseye  buster  jessie  stretch
Date
2019-03-01       NaN     0.0     1.0     74.0
2019-08-15       NaN    25.0     1.0     49.0
2019-10-07       NaN    35.0     1.0     43.0
2019-10-08       NaN    40.0     1.0     38.0
2019-10-09       NaN    46.0     1.0     33.0
2019-10-23       NaN    46.0     1.0     31.0
2019-11-25       NaN    46.0     1.0     29.0
2020-01-13       NaN    48.0     1.0     28.0
2020-01-29       NaN    50.0     1.0     26.0
2020-03-10       NaN    54.0     1.0     22.0
2020-04-14       NaN    55.0     0.0     21.0
2020-05-11       NaN    57.0     0.0     17.0
2020-05-25       NaN    61.0     0.0     14.0
2020-06-10       NaN    62.0     NaN     12.0
2020-07-01       NaN    69.0     NaN      3.0
2020-10-30       NaN    74.0     NaN      2.0
2020-11-18       NaN    76.0     NaN      2.0
2021-08-26       1.0    86.0     NaN      1.0
2021-10-08       4.0    86.0     NaN      1.0
2021-11-11       4.0    84.0     NaN      1.0
2021-11-17       4.0    85.0     NaN      0.0
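Interestingly, if you look closely, you can also see that the bullseye (blue) line is actually present from the very start of the graph, even though its values are NaN until 2021-08-26.
So, what's going on here? Is matplotlib, or pandas, or something else plotting NaN as "zero" instead of "not in this graph"?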
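One way to narrow this down is to look at the y-values that actually end up on the axes. The sketch below uses a small hand-made frame (three rows taken from the data above, with the hypothetical names small and drawn) and inspects the lines the area plot creates. As far as I know, pandas' area plot replaces missing values with 0 before stacking, which would explain the behaviour, but treat that as an assumption to verify against your own pandas version.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Three rows from the question's data; jessie is NaN in the last two.
small = pd.DataFrame(
    {"buster": [54.0, 62.0, 69.0], "jessie": [1.0, np.nan, np.nan]},
    index=pd.to_datetime(["2020-03-10", "2020-06-10", "2020-07-01"]),
)

fig, ax = plt.subplots()
small.plot(ax=ax, kind="area", stacked=True)

# One line per column, in column order, holding the stacked y-values
# that were handed to matplotlib.
drawn = [np.asarray(line.get_ydata(), dtype=float) for line in ax.get_lines()]
nan_reached_plot = any(np.isnan(y).any() for y in drawn)

for name, y in zip(small.columns, drawn):
    print(name, y)
print("NaN reached matplotlib:", nan_reached_plot)
```

If no NaN shows up in the printed arrays, the missing cells were replaced before matplotlib ever saw them, so the culprit would be the pandas plotting layer rather than matplotlib itself.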
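And dropna is not the answer here: it removes an entire row or column, whereas I would need to remove individual cells, which doesn't make sense in a dataframe.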
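To illustrate the point about dropna, here is a minimal sketch on three rows from the data above (the variable names are made up for the example). It only ever discards whole rows or whole columns, so it cannot express "this one cell is absent":

```python
import numpy as np
import pandas as pd

# Three rows from the pivoted frame; jessie is NaN in the last two.
df = pd.DataFrame(
    {
        "buster": [61.0, 62.0, 69.0],
        "jessie": [0.0, np.nan, np.nan],
        "stretch": [14.0, 12.0, 3.0],
    },
    index=pd.to_datetime(["2020-05-25", "2020-06-10", "2020-07-01"]),
)

# axis=0 drops every row that contains a NaN: two of the three dates vanish.
rows_dropped = df.dropna(axis=0)
# axis=1 drops every column that contains a NaN: jessie vanishes entirely.
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape)          # (1, 3)
print(list(cols_dropped.columns))  # ['buster', 'stretch']
```

Either way, data for buster and stretch or the early jessie values would be lost, which is not what the graph needs.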
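Note that my previous iteration of this graph, which used a bar chart, did not have this problem:
Simply replace area with bar in the code above to reproduce it. The problem with the bar chart is that it does not respect the scale of the X axis (time): the bars are evenly spaced regardless of how far apart the dates are.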
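One possible workaround, sketched here under assumptions rather than offered as an established fix, is to bypass the pandas area plot and stack the bands manually with matplotlib's fill_between, using its where argument to skip the dates on which a release has no value. The helper name plot_stacked_with_gaps is hypothetical, and the sample frame reuses four rows from the data above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_stacked_with_gaps(df, ax):
    """Stack columns by hand; draw each band only where its own cell is not NaN."""
    x = df.index.to_numpy()
    baseline = np.zeros(len(df))
    bands = []
    for name in df.columns:
        present = df[name].notna().to_numpy()
        # NaN contributes 0 to the stack so the bands above stay in place.
        top = baseline + df[name].fillna(0).to_numpy()
        bands.append(ax.fill_between(x, baseline, top, where=present, label=name))
        baseline = top
    ax.legend()
    return bands


df = pd.DataFrame(
    {
        "buster": [50.0, 54.0, 62.0, 69.0],
        "jessie": [1.0, 1.0, np.nan, np.nan],
        "stretch": [26.0, 22.0, 12.0, 3.0],
    },
    index=pd.to_datetime(["2020-01-29", "2020-03-10", "2020-06-10", "2020-07-01"]),
)

fig, ax = plt.subplots(figsize=(12, 6))
bands = plot_stacked_with_gaps(df, ax)
```

Call plt.show() afterwards to display the figure. Because the x values are real datetimes, the time axis keeps its scale, and the jessie band ends at its last non-NaN date. One caveat: a band needs at least two consecutive non-NaN dates to be visible, so an isolated single value would not be drawn.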