Pandas can't hide NaN entries in a stacked line graph


Suppose I have the following data:

Date,release,count
2019-03-01,buster,0
2019-03-01,jessie,1
2019-03-01,stretch,74
2019-08-15,buster,25
2019-08-15,jessie,1
2019-08-15,stretch,49
2019-10-07,buster,35
2019-10-07,jessie,1
2019-10-07,stretch,43
2019-10-08,buster,40
2019-10-08,jessie,1
2019-10-08,stretch,38
2019-10-09,buster,46
2019-10-09,jessie,1
2019-10-09,stretch,33
2019-10-23,buster,46
2019-10-23,jessie,1
2019-10-23,stretch,31
2019-11-25,buster,46
2019-11-25,jessie,1
2019-11-25,stretch,29
2020-01-13,buster,48
2020-01-13,jessie,1
2020-01-13,stretch,28
2020-01-29,buster,50
2020-01-29,jessie,1
2020-01-29,stretch,26
2020-03-10,buster,54
2020-03-10,jessie,1
2020-03-10,stretch,22
2020-04-14,buster,55
2020-04-14,jessie,0
2020-04-14,stretch,21
2020-05-11,buster,57
2020-05-11,jessie,0
2020-05-11,stretch,17
2020-05-25,buster,61
2020-05-25,jessie,0
2020-05-25,stretch,14
2020-06-10,buster,62
2020-06-10,stretch,12
2020-07-01,buster,69
2020-07-01,stretch,3
2020-10-30,buster,74
2020-10-30,stretch,2
2020-11-18,buster,76
2020-11-18,stretch,2
2021-08-26,bullseye,1
2021-08-26,buster,86
2021-08-26,stretch,1
2021-10-08,bullseye,4
2021-10-08,buster,86
2021-10-08,stretch,1
2021-11-11,bullseye,4
2021-11-11,buster,84
2021-11-11,stretch,1
2021-11-17,bullseye,4
2021-11-17,buster,85
2021-11-17,stretch,0
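To reproduce this without a subset.csv file, here is a minimal sketch that reads an inline excerpt of the CSV above and applies the same pivot as the code in the question. The excerpt keeps only the rows for three dates, which is an arbitrary choice, and CSV_EXCERPT is a hypothetical name. It shows that release/date combinations missing from the CSV, such as jessie from 2020-06-10 on, come out of pivot_table as NaN rather than 0.

```python
import io

import pandas as pd

# Excerpt of the CSV above (three dates only), inlined so no subset.csv is needed.
CSV_EXCERPT = """\
Date,release,count
2020-05-25,buster,61
2020-05-25,jessie,0
2020-05-25,stretch,14
2020-06-10,buster,62
2020-06-10,stretch,12
2021-08-26,bullseye,1
2021-08-26,buster,86
2021-08-26,stretch,1
"""

df = pd.read_csv(io.StringIO(CSV_EXCERPT))

# One row per date, one column per release; combinations absent from the CSV become NaN.
df = df.pivot_table(index="Date", columns="release", values="count", aggfunc="sum")
df.index = pd.to_datetime(df.index)
df = df.sort_index()

print(df)
```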

And the following code:

import pandas as pd
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv('subset.csv')

# Pivot the data to a suitable format for plotting
df = df.pivot_table(index="Date", columns='release', values='count', aggfunc='sum')

# Convert the index to datetime and sort it
df.index = pd.to_datetime(df.index)
df = df.sort_index()

print(df)

# Plotting the data with filled areas
fig, ax = plt.subplots(figsize=(12, 6))
df.plot(ax=ax, kind="area", stacked=True)

plt.show()

It produces the following chart:

[Image: the resulting stacked area chart]

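In the chart above, the jessie line should stop after 2020-05-25, in the middle of the chart. Instead it just keeps going, a thin little Energizer Bunny of a line, all the way to the right edge of the chart, even though its values there are actually NaN. The print(df) output shows the underlying DataFrame after the pivot: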

release     bullseye  buster  jessie  stretch
Date                                         
2019-03-01       NaN     0.0     1.0     74.0
2019-08-15       NaN    25.0     1.0     49.0
2019-10-07       NaN    35.0     1.0     43.0
2019-10-08       NaN    40.0     1.0     38.0
2019-10-09       NaN    46.0     1.0     33.0
2019-10-23       NaN    46.0     1.0     31.0
2019-11-25       NaN    46.0     1.0     29.0
2020-01-13       NaN    48.0     1.0     28.0
2020-01-29       NaN    50.0     1.0     26.0
2020-03-10       NaN    54.0     1.0     22.0
2020-04-14       NaN    55.0     0.0     21.0
2020-05-11       NaN    57.0     0.0     17.0
2020-05-25       NaN    61.0     0.0     14.0
2020-06-10       NaN    62.0     NaN     12.0
2020-07-01       NaN    69.0     NaN      3.0
2020-10-30       NaN    74.0     NaN      2.0
2020-11-18       NaN    76.0     NaN      2.0
2021-08-26       1.0    86.0     NaN      1.0
2021-10-08       4.0    86.0     NaN      1.0
2021-11-11       4.0    84.0     NaN      1.0
2021-11-17       4.0    85.0     NaN      0.0

Interestingly, if you look closely, you can also see that the bullseye (blue) line is actually present from the very start of the chart.
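A quick way to confirm where each release really starts and stops is Series.first_valid_index and Series.last_valid_index, which skip NaN cells. A sketch on a five-row excerpt copied from the table above (the excerpt is an assumption; on the full table jessie's first valid date would be 2019-03-01, and spans is a hypothetical name):

```python
import numpy as np
import pandas as pd

nan = np.nan
# Five rows copied from the pivoted table above.
df = pd.DataFrame(
    {
        "bullseye": [nan, nan, nan, 1.0, 4.0],
        "buster": [57.0, 61.0, 62.0, 86.0, 85.0],
        "jessie": [0.0, 0.0, nan, nan, nan],
        "stretch": [17.0, 14.0, 12.0, 1.0, 0.0],
    },
    index=pd.to_datetime(
        ["2020-05-11", "2020-05-25", "2020-06-10", "2021-08-26", "2021-11-17"]
    ),
)

# Date range over which each release actually has data (NaN cells are skipped).
spans = {
    col: (df[col].first_valid_index(), df[col].last_valid_index())
    for col in df.columns
}
for col, (first, last) in spans.items():
    print(f"{col:>8}: {first.date()} .. {last.date()}")
```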

So what is going on here? Is matplotlib, pandas, or something else plotting NaN as zero instead of as "not in this plot"?
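As far as I know, the pandas visualization docs state that for area plots NaN values are automatically filled with 0 (and suggest calling dropna() or fillna() beforehand if you want something else). A minimal sketch of what that implies for the stacked edges, on a three-row excerpt of the table above; tops and jessie_missing are hypothetical names:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Three rows copied from the pivoted table above.
df = pd.DataFrame(
    {
        "bullseye": [nan, nan, 1.0],
        "buster": [61.0, 62.0, 86.0],
        "jessie": [0.0, nan, nan],
        "stretch": [14.0, 12.0, 1.0],
    },
    index=pd.to_datetime(["2020-05-25", "2020-06-10", "2021-08-26"]),
)

# Roughly what the stacked area plot draws: NaN replaced by 0, then a running
# total across the columns (in column order) gives the top edge of each band.
tops = df.fillna(0).cumsum(axis=1)
print(tops)

# Where jessie is NaN its band has zero height, so its top edge lies exactly on
# buster's top edge and the jessie outline is drawn along it.
jessie_missing = df["jessie"].isna()
print((tops.loc[jessie_missing, "jessie"] == tops.loc[jessie_missing, "buster"]).all())

# Likewise bullseye's top edge is 0 wherever it is NaN: a line along the bottom.
print(tops.loc[df["bullseye"].isna(), "bullseye"].tolist())
```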

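And dropna is not the answer here: it removes whole rows or columns, whereas I would need to remove individual cells, which doesn't make sense here.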
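One way to get per-cell behaviour is to skip pandas' area plot and draw the bands with matplotlib directly: fill each band with Axes.fill_between, and draw its top edge with Axes.plot after masking the cells that were NaN, since matplotlib leaves a gap in a line wherever a y value is NaN. This is a swapped-in technique, not something from the question or the answer. stacked_edges and plot_stacked_area_masked are hypothetical helpers, and alpha=0.5 plus the C0, C1, ... colours are assumptions meant to resemble pandas' defaults; the data is a five-row excerpt of the table above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stacked_edges(df):
    """Lower and upper edge of each stacked band, with NaN treated as 0."""
    filled = df.fillna(0)
    tops = filled.cumsum(axis=1)
    bottoms = tops - filled
    return bottoms, tops


def plot_stacked_area_masked(df, ax):
    """Stacked area plot whose outlines are blanked wherever a cell is NaN."""
    bottoms, tops = stacked_edges(df)
    for i, col in enumerate(df.columns):
        color = f"C{i}"  # default colour cycle
        ax.fill_between(df.index, bottoms[col], tops[col],
                        color=color, alpha=0.5, linewidth=0, label=col)
        # Per-cell mask: NaN in the y data makes matplotlib break the line there.
        ax.plot(df.index, tops[col].where(df[col].notna()), color=color)
    ax.legend()
    return ax


nan = np.nan
df = pd.DataFrame(
    {
        "bullseye": [nan, nan, nan, 1.0, 4.0],
        "buster": [57.0, 61.0, 62.0, 86.0, 85.0],
        "jessie": [0.0, 0.0, nan, nan, nan],
        "stretch": [17.0, 14.0, 12.0, 1.0, 0.0],
    },
    index=pd.to_datetime(
        ["2020-05-11", "2020-05-25", "2020-06-10", "2021-08-26", "2021-11-17"]
    ),
)

fig, ax = plt.subplots(figsize=(12, 6))
plot_stacked_area_masked(df, ax)
plt.show()
```

The fills still treat NaN as 0 (a zero-height band has no visible area), so only the outlines need masking.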

Note that an earlier iteration of this plot, which used a bar chart, did not have this problem:

[Image: the earlier stacked bar chart version]

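To reproduce it, just replace "area" with "bar" in the code above. The problem with the bar chart is that it does not respect the scale of the X axis (time).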
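The axis problem with kind="bar" can be checked directly: pandas places the bars at integer positions and only uses the dates as tick labels. A sketch on a five-row excerpt of the table above (assumed sample data):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    {
        "bullseye": [nan, nan, nan, 1.0, 4.0],
        "buster": [57.0, 61.0, 62.0, 86.0, 85.0],
        "jessie": [0.0, 0.0, nan, nan, nan],
        "stretch": [17.0, 14.0, 12.0, 1.0, 0.0],
    },
    index=pd.to_datetime(
        ["2020-05-11", "2020-05-25", "2020-06-10", "2021-08-26", "2021-11-17"]
    ),
)

fig, ax = plt.subplots(figsize=(12, 6))
df.plot(ax=ax, kind="bar", stacked=True)

# pandas draws one bar per row at x = 0, 1, 2, ... and merely labels the ticks
# with the dates, so uneven gaps between dates are not shown.
print(ax.get_xticks())
# The actual gaps between consecutive dates, in days.
print(df.index.to_series().diff().dt.days.tolist())
plt.show()
```

Here the rows are 14, 16, 442 and 83 days apart, yet the bars are evenly spaced.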

Tags: python pandas matplotlib nan

Answer (0 votes):

You should set the line width to zero:

ax = plt.subplot()
df.plot(ax=ax, kind='area', lw=0, stacked=True)
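A self-contained version of this fix, using a five-row excerpt of the pivoted table as assumed sample data. The bands created from NaN have zero height, so with lw=0 (no outlines) nothing visible is left for them:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    {
        "bullseye": [nan, nan, nan, 1.0, 4.0],
        "buster": [57.0, 61.0, 62.0, 86.0, 85.0],
        "jessie": [0.0, 0.0, nan, nan, nan],
        "stretch": [17.0, 14.0, 12.0, 1.0, 0.0],
    },
    index=pd.to_datetime(
        ["2020-05-11", "2020-05-25", "2020-06-10", "2021-08-26", "2021-11-17"]
    ),
)

fig, ax = plt.subplots(figsize=(12, 6))
# lw=0: the outline drawn along the top of each band gets zero width, so the
# zero-height bands that came from NaN leave no visible line.
df.plot(ax=ax, kind="area", stacked=True, lw=0)
plt.show()
```

The trade-off is that the outlines also disappear for the stretches where a release does have data; only the filled areas remain.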

Output:

[Image: stacked area plot with NaN]
