I have a DataFrame with 2 columns: genre and release_year. Each year contains several genres. The data looks like this:
genre release_year
Action 2015
Action 2015
Adventure 2015
Action 2015
Action 2015
I need to use Pandas/Python to plot how every genre changes over time.
import pandas as pd

df = pd.read_csv('genres.csv')
df.shape
(53975, 2)
df_new = df.groupby(['release_year', 'genre'])['genre'].count()
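To reproduce this grouping without genres.csv, here is a minimal sketch on a small made-up DataFrame (the rows are illustrative only, not the real data). It also shows that size(), which simply counts rows per (release_year, genre) pair, gives the same numbers as ['genre'].count() here, because the genre column has no missing values.

```python
import pandas as pd

# Small made-up sample standing in for genres.csv
df = pd.DataFrame({
    "genre": ["Action", "Action", "Adventure", "Action", "Comedy", "Action"],
    "release_year": [2015, 2015, 2015, 2015, 2016, 2016],
})

# Count non-null genre values per (release_year, genre) pair:
# the result is a Series with a two-level MultiIndex
df_new = df.groupby(["release_year", "genre"])["genre"].count()

# size() counts rows per group; identical here because genre has no NaN
df_size = df.groupby(["release_year", "genre"]).size()

print(df_new)
```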
This produces the following grouping:
release_year  genre
1960          Action              8
              Adventure           5
              Comedy              8
              Crime               2
              Drama              13
              Family              3
              Fantasy             2
              Foreign             1
              History             5
              Horror              7
              Music               1
              Romance             6
              Science Fiction     3
              Thriller            6
              War                 2
              Western             6
1961          Action              7
              Adventure           6
              Animation           1
              Comedy             10
              Crime               2
              Drama              16
              Family              5
              Fantasy             2
              Foreign             1
              History             3
              Horror              3
              Music               2
              Mystery             1
              Romance             7
...
I need a line chart showing how the genres change over the years. In other words, I need a loop that plots each genre across the years. For example,
df_action = df.query('genre == "Action"')
result_plot = df_action.groupby(['release_year','genre'])['genre'].count()
result_plot.plot(figsize=(10,10));
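This plots the "Action" genre. Instead of writing this separately for every genre, I need a loop that does the same thing for all of them.
How can I do this? Can anyone help me with this?
I tried the following, but it doesn't work.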
genres = ["Action", "Adventure", "Western", "Science Fiction", "Drama",
          "Family", "Comedy", "Crime", "Romance", "War", "Mystery",
          "Thriller", "Fantasy", "History", "Animation", "Horror", "Music",
          "Documentary", "TV Movie", "Foreign"]
for g in genres:
    #df_new = df.query('genre == "g"')
    result_plot = df.groupby(['release_year','genre'])['genre'].count()
    result_plot.plot(figsize=(10,10));
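For reference, the attempt above plots the same full grouping on every pass: the filter line is commented out, and even if it were active, 'genre == "g"' compares against the literal string "g" rather than the loop variable. A minimal sketch of a working loop, on made-up data and a shortened genre list: filter with "@g" (DataFrame.query's syntax for referring to a local Python variable), count per release_year only so the x-axis is the year, and draw every genre on one shared Axes with a label for the legend. The per_genre dict is a hypothetical helper added only to keep the per-genre counts around.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data in the same two-column shape as genres.csv
df = pd.DataFrame({
    "genre": ["Action", "Action", "Adventure", "Action",
              "Comedy", "Adventure", "Comedy", "Action"],
    "release_year": [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017],
})
genres = ["Action", "Adventure", "Comedy"]

fig, ax = plt.subplots(figsize=(10, 10))
per_genre = {}  # hypothetical helper: counts per year for each genre
for g in genres:
    # @g refers to the Python variable g, not the string "g"
    subset = df.query("genre == @g")
    # Group by year only, so the index (x-axis) is release_year
    counts = subset.groupby("release_year").size()
    per_genre[g] = counts
    # Draw onto the shared Axes; label feeds the legend
    counts.plot(ax=ax, label=g)
ax.legend()
```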
How about unstacking your Series and plotting everything with a single command:
In [36]: s
Out[36]:
release_year  genre
1960.0        Action        8
              Adventure     5
              Comedy        8
              Crime         2
              Drama        13
              Family        3
              Fantasy       2
              Foreign       1
              History       5
              Horror        7
                           ..
1961.0        Crime         2
              Drama        16
              Family        5
              Fantasy       2
              Foreign       1
              History       3
              Horror        3
              Music         2
              Mystery       1
              Romance       7
Name: count, Length: 30, dtype: int64

In [37]: s.unstack()
Out[37]:
genre         Action  Adventure  Animation  Comedy  Crime  Drama  Family  Fantasy  Foreign  History  Horror  Music  Mystery  Romance  \
release_year
1960.0           8.0        5.0        NaN     8.0    2.0   13.0     3.0      2.0      1.0      5.0     7.0    1.0      NaN      6.0
1961.0           7.0        6.0        1.0    10.0    2.0   16.0     5.0      2.0      1.0      3.0     3.0    2.0      1.0      7.0

genre         Science Fiction  Thriller  War  Western
release_year
1960.0                    3.0       6.0  2.0      6.0
1961.0                    NaN       NaN  NaN      NaN
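The NaN cells in the unstacked frame are year/genre pairs with no films. If zeros are preferable (for example, so that plotted lines don't break at missing years), unstack accepts a fill_value argument. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({
    "genre": ["Action", "Action", "Adventure", "Comedy", "Action"],
    "release_year": [1960, 1960, 1960, 1961, 1961],
})
s = df.groupby(["release_year", "genre"]).size()

# Default unstack: missing (year, genre) pairs become NaN
with_nan = s.unstack()

# Move the inner index level (genre) into columns; missing pairs become 0
wide = s.unstack(fill_value=0)
print(wide)
```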
Plotting:
s.unstack().plot()
df_new.unstack().T.plot(kind='bar')
I chose a bar chart; you can change it to whatever kind of plot you need.
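Since the question asks for a line chart over the years, it may help to note that unstack() without .T already puts release_year on the index and one column per genre, so a plain .plot() draws one line per genre. A minimal sketch on made-up data, using the Agg backend so it runs without a display; the title and axis label are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs
import pandas as pd

# Made-up sample data
df = pd.DataFrame({
    "genre": ["Action", "Action", "Adventure", "Action",
              "Comedy", "Adventure", "Comedy", "Action"],
    "release_year": [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017],
})
df_new = df.groupby(["release_year", "genre"])["genre"].count()

# Rows: release_year (x-axis); columns: genre (one line each); 0 where no films
wide = df_new.unstack(fill_value=0)
ax = wide.plot(figsize=(10, 10), title="Films per genre by release year")
ax.set_ylabel("number of films")
```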
PS: you could consider using crosstab instead of groupby:
pd.crosstab(df.genre,df.release_year).plot(kind='bar')
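A minimal sketch, on made-up data, checking that crosstab produces the same counts as the groupby/unstack route. Here release_year is passed as the index (the reverse of the one-liner above), which puts the years on the x-axis and suits a line chart of changes over time.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({
    "genre": ["Action", "Action", "Adventure", "Action",
              "Comedy", "Adventure", "Comedy", "Action"],
    "release_year": [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017],
})

# Rows: release_year, columns: genre, cells: number of films (0 if none)
ct = pd.crosstab(df.release_year, df.genre)

# Same table via groupby + unstack, filling missing pairs with 0
gb = df.groupby(["release_year", "genre"]).size().unstack(fill_value=0)

print(ct)
```

With years on the index, ct.plot() gives one line per genre.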