Python: line plot of values grouped by multiple columns

Question (votes: 2, answers: 3)

I have a DataFrame with two columns: genre and release_year. Each year has multiple genres. The format is as follows:

genre   release_year
Action  2015
Action  2015
Adventure   2015
Action  2015
Action  2015

I need to plot how the count of every genre changes over time, using pandas / Python.

import pandas as pd
df = pd.read_csv('genres.csv')

df.shape
(53975, 2)


df_new = df.groupby(['release_year', 'genre'])['genre'].count()

This results in the following grouping.

release_year  genre          
1960      Action               8
          Adventure            5
          Comedy               8
          Crime                2
          Drama               13
          Family               3
          Fantasy              2
          Foreign              1
          History              5
          Horror               7
          Music                1
          Romance              6
          Science Fiction      3
          Thriller             6
          War                  2
          Western              6
1961      Action               7
          Adventure            6
          Animation            1
          Comedy              10
          Crime                2
          Drama               16
          Family               5
          Fantasy              2
          Foreign              1
          History              3
          Horror               3
          Music                2
          Mystery              1
          Romance              7
                            ... 

I need to draw a line plot of how the genre counts change over the years, i.e. I need a loop that plots each genre across all years. For example,

df_action = df.query('genre == "Action"')
result_plot = df_action.groupby(['release_year','genre'])['genre'].count()
result_plot.plot(figsize=(10,10));

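shows the plot for the "Action" genre. Rather than plotting each genre separately like this, I need a loop that does the same for all of them.

How can I do this? Can someone help me with this?

I tried the following, but it does not work.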

genres = ["Action", "Adventure", "Western", "Science Fiction", "Drama",
   "Family", "Comedy", "Crime", "Romance", "War", "Mystery",
   "Thriller", "Fantasy", "History", "Animation", "Horror", "Music",
   "Documentary", "TV Movie", "Foreign"]

for g in genres:
    #df_new = df.query('genre == "g"')
    result_plot = df.groupby(['release_year','genre'])['genre'].count()
    result_plot.plot(figsize=(10,10));
Tags: python pandas matplotlib plot
3 answers
2
votes

How about unstacking your Series and plotting everything in one command:

In [36]: s
Out[36]:
release_year  genre
1960.0        Action        8
              Adventure     5
              Comedy        8
              Crime         2
              Drama        13
              Family        3
              Fantasy       2
              Foreign       1
              History       5
              Horror        7
                           ..
1961.0        Crime         2
              Drama        16
              Family        5
              Fantasy       2
              Foreign       1
              History       3
              Horror        3
              Music         2
              Mystery       1
              Romance       7
Name: count, Length: 30, dtype: int64

In [37]: s.unstack()
Out[37]:
genre         Action  Adventure  Animation  Comedy  Crime  Drama  Family  Fantasy  Foreign  History  Horror  Music  Mystery  Romance  \
release_year
1960.0           8.0        5.0        NaN     8.0    2.0   13.0     3.0      2.0      1.0      5.0     7.0    1.0      NaN      6.0
1961.0           7.0        6.0        1.0    10.0    2.0   16.0     5.0      2.0      1.0      3.0     3.0    2.0      1.0      7.0

genre         Science Fiction  Thriller  War  Western
release_year
1960.0                    3.0       6.0  2.0      6.0
1961.0                    NaN       NaN  NaN      NaN

Plotting:

s.unstack().plot()
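
Since the question asked for a loop, here is a minimal sketch of the same unstack idea with an explicit loop over the genre columns. The small DataFrame is hypothetical sample data standing in for genres.csv, and `fill_value=0` is an assumption on my part: it replaces the NaN that `unstack()` leaves for year/genre pairs with no rows, so every line is drawn across all years.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for genres.csv
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# Count rows per (year, genre), then pivot the genres into columns.
# fill_value=0 puts a zero where a genre has no rows in a given year.
counts = df.groupby(["release_year", "genre"]).size().unstack(fill_value=0)

# One line per genre, all drawn on the same Axes.
fig, ax = plt.subplots(figsize=(10, 10))
for genre in counts.columns:
    ax.plot(counts.index, counts[genre], label=genre)
ax.set_xlabel("release_year")
ax.set_ylabel("count")
ax.legend(title="genre")
```

The loop is only there for illustration; `counts.plot()` draws the same lines in one call.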

2
votes
df_new.unstack().T.plot(kind='bar')

I chose a bar chart; you can change it to whatever kind you need.

PS: You could consider crosstab instead of groupby

pd.crosstab(df.genre,df.release_year).plot(kind='bar')
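
For the line plot the question asks about, the crosstab arguments can be swapped so that the years become the index (x axis) and each genre becomes one line. This is a sketch on hypothetical sample data, not the original genres.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Hypothetical sample data standing in for genres.csv
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# Rows are years, columns are genres; missing combinations are counted as 0.
table = pd.crosstab(df["release_year"], df["genre"])

# DataFrame.plot draws one line per column against the index.
ax = table.plot(kind="line", figsize=(10, 10))
ax.set_ylabel("count")
```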



0
votes

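I suggest using seaborn, which helps avoid manipulating the DataFrame before plotting. You can install it by running pip install seaborn. It has a simple API for standard kinds of plots: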

release_year vs genre

import seaborn as sns
sns.countplot(x='release_year', hue='genre', data=df)


genre vs release_year

import seaborn as sns
sns.countplot(x='genre', hue='release_year', data=df)

