I have the following DataFrame named df, which contains 2 columns:
In [4]: df.head(20)
Out[4]:
age age_band
0 NaN NaN
1 61.0 55-64
2 NaN NaN
3 55.0 55-64
4 NaN NaN
5 67.0 65+
6 NaN NaN
7 20.0 18-24
8 53.0 45-54
9 NaN NaN
10 NaN NaN
11 23.0 18-24
12 60.0 55-64
13 NaN NaN
14 54.0 45-54
15 NaN NaN
16 67.0 65+
17 NaN NaN
18 50.0 45-54
19 70.0 65+
In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107632 entries, 0 to 107631
Data columns (total 2 columns):
age 73289 non-null float64
age_band 73289 non-null object
dtypes: float64(1), object(1)
memory usage: 1.6+ MB
In [7]: df["age_band"].value_counts()
Out[7]:
45-54 22461
55-64 17048
35-44 14582
65+ 12990
25-34 4078
18-24 2130
Name: age_band, dtype: int64
In [8]: df["age"].min()
Out[8]: 19.0
In [9]: df["age"].max()
Out[9]: 74.0
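AIM: I want to plot a histogram of df["age"] using hvplot. In this plot, I want the ages placed into bins that correspond to the values in the df["age_band"] column. The following plot illustrates this: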
In [10]: df.hvplot.hist("age",bins=[18,25,35,45,55,65,74],xticks=[18,25,35,45,55,65,74],hover_cols
...: =["age_band"],line_width=4,line_color="w")
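When you hover over each bin, the count for each age_band is correctly displayed as Count, but instead of the age band value, the tooltip appears to show the mean or median age of each bin.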
On further investigation, setting hover_cols="age_band" has no effect on the plot at all (you get the same plot if it is omitted).
I then tried using HoverTool:
In [11]: from bokeh.models import HoverTool
...:
...: hover = HoverTool(tooltips=df["age_band"].dropna())
...:
...: df.hvplot.hist("age",bins=[18,25,35,45,55,65,74],xticks=[18,25,35,45,55,65,74],line_width
...: =4,line_color="w").opts(tools=[hover])
But I got the following error:
ValueError: expected an element of either String or List(Tuple(String, String)), got 1 55-64
So I tried:
In [12]: from bokeh.models import HoverTool
...:
...: hover = HoverTool(tooltips="age_band")
...:
...: df.hvplot.hist("age",bins=[18,25,35,45,55,65,74],xticks=[18,25,35,45,55,65,74],line_wi
...: dth=4,line_color="w").opts(tools=[hover])
which resulted in:
So I also tried:
In [13]: hover = HoverTool(tooltips=[("18-24","2130"),("25-34","4078"),("35-44","14582"),("45-54",
...: "22461"),("55-64","17048"),("65+","12990")])
...:
...: df.hvplot.hist("age",bins=[18,25,35,45,55,65,74],xticks=[18,25,35,45,55,65,74],line_width
...: =4,line_color="w").opts(tools=[hover])
which resulted in the following:
Is there a way to produce a histogram of df["age"] using hvplot.hist so that, when you hover over a bin, you see the corresponding age_band and its Count?
Thanks
Setting by=['age_band'] should work, and should show that column on hover:
df.hvplot.hist(
y='age',
by=['age_band'],
legend=False,
color='lightblue',
bins=[18,25,35,45,55,65,74],
xticks=[18,25,35,45,55,65,74],
)
Although in the case you describe, you could also opt to create a bar chart from value_counts:
age_band_counts = df['age_band'].value_counts().sort_index()
age_band_counts.hvplot.bar(bar_width=1.0)
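
Both approaches rely on the bin edges [18,25,35,45,55,65,74] lining up with the age_band labels. As a rough sanity check, the sketch below maps ages to band labels using only the standard library. It assumes the histogram follows NumPy's convention of half-open bins [left, right) with the last bin closed on the right; band_for, EDGES, LABELS and sample_ages are hypothetical names introduced for illustration, and the sample is just the non-missing ages visible in df.head(20).

```python
from bisect import bisect_right
from collections import Counter

# Bin edges from the question and the labels they are meant to represent.
EDGES = [18, 25, 35, 45, 55, 65, 74]
LABELS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def band_for(age):
    """Return the age_band label for an age, or None if missing or out of range."""
    if age is None or age != age:  # None or NaN (NaN is never equal to itself)
        return None
    if age < EDGES[0] or age > EDGES[-1]:
        return None
    # Assumed convention: bins are half-open [left, right),
    # except the last bin, which also includes its right edge (74).
    index = min(bisect_right(EDGES, age) - 1, len(LABELS) - 1)
    return LABELS[index]


# Hypothetical sample: the non-missing ages shown in df.head(20).
sample_ages = [61.0, 55.0, 67.0, 20.0, 53.0, 23.0, 60.0, 54.0, 67.0, 50.0, 70.0]
counts = Counter(band_for(age) for age in sample_ages)
for label in LABELS:
    print(label, counts.get(label, 0))
```

If the labels produced this way agree with the existing age_band column, the per-bin Count in the histogram and the value_counts bar chart should describe the same groups.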