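I am working on an Applied Data Science assignment.
Question: Cut % Renewable into 5 bins. Group Top15 by Continent as well as by these new % Renewable bins. How many countries are in each of these groups? This function should return a Series with a MultiIndex of Continent, then the bins for % Renewable. Do not include groups with no countries.
Here is my code: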
def answer_twelve():
    Top15 = answer_one()
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Continent'] = Top15.index.to_series().map(ContinentDict)
    Top15['bins'] = pd.cut(Top15['% Renewable'],5)
    return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size())#,apply(lambda x:s if x['Rank']==0 continue))
answer_twelve()
This is the output of the code above:
Continent      bins
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    0
Australia      (2.212, 15.753]     1
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    0
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
               (42.701, 56.174]    0
               (56.174, 69.648]    0
North America  (2.212, 15.753]     1
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    1
South America  (2.212, 15.753]     0
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    1
dtype: int64
The desired output is:
Continent      bins
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: Countries, dtype: int64
I want to get rid of the zeros, so I tried
pd.Series(Top15.groupby(by = ['Continent', 'bins']).size().apply(lambda x:s if x['Rank']==0 continue))
but I keep getting the following error:
File "<ipython-input-317-14bc05bb2137>", line 20
return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size().apply(lambda x:s if x['Rank']==0 continue))
^
SyntaxError: invalid syntax
I cannot figure out my mistake. Please help!
With pandas, you can drop the rows where a column is zero by filtering with a boolean mask.
If column_name is your column:
df = df[df.column_name != 0]
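The same kind of boolean mask also works on a Series, which is what groupby(...).size() returns. A minimal sketch, assuming a small made-up Series (the labels and counts are illustrative, not the assignment data):

```python
import pandas as pd

# Hypothetical counts standing in for the result of groupby(...).size().
counts = pd.Series([4, 0, 1, 0], index=["a", "b", "c", "d"])

# Build a boolean mask and keep only the entries whose value is not zero.
nonzero = counts[counts != 0]
print(nonzero)
```

The mask leaves the integer dtype untouched, so no conversion is needed afterwards.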
lambda x:s if x['Rank']==0 continue
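This is not valid Python: continue is a statement that is only meaningful inside a loop, and a conditional expression must have an else branch.
Note that the expression has to produce a value in every case. Return an empty value instead: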
lambda x:"" if x['Rank']==0 else s
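To make the point about conditional expressions concrete, here is a small sketch on made-up data. Note that Series.apply passes each value to the function as a plain scalar, so this sketch compares x directly rather than indexing x['Rank']; the Series itself is an assumption for illustration.

```python
import pandas as pd

# Hypothetical counts standing in for the result of groupby(...).size().
counts = pd.Series([4, 0, 1])

# A conditional expression always has both branches:
#   value_if_true if condition else value_if_false
blanked = counts.apply(lambda x: "" if x == 0 else x)
print(blanked)
```

This only blanks the zero entries; the rows are still present, so a boolean mask is the better fit when the rows should disappear entirely.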
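You can iterate over the values with a for loop and use replace() to turn each 0 into NaN; you can then remove them with dropna().
I tried using drop() or droplevel() instead of replacing the values, but that did not work. Here is my code: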
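import numpy as np

# pd_series is the Series returned by groupby(...).size()
for k, i in pd_series.items():
    if i == 0:
        # Replace every zero count with NaN so dropna() can remove it.
        pd_series.replace(to_replace=i, value=np.nan, inplace=True)
pd_series.dropna(axis=0, inplace=True)
print(pd_series)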
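Replacing 0 with NaN converts the Series to float64, so you may need to cast the result back to an integer type, for example with astype(int). The output is: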
Continent      bins
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
dtype: int64
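The replace() and dropna() approach does not strictly need an explicit loop, since both methods operate on the whole Series at once. A hedged sketch of the same idea, using a made-up Series in place of the real groupby result:

```python
import numpy as np
import pandas as pd

# Hypothetical counts standing in for the result of groupby(...).size().
pd_series = pd.Series([4, 1, 0, 0, 1], index=list("abcde"))

# Replace zeros with NaN, drop the NaN entries, then cast back to int,
# because introducing NaN turns the values into floats.
cleaned = pd_series.replace(0, np.nan).dropna().astype(int)
print(cleaned)
```

The final astype(int) is what restores an integer dtype in the printed result.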
Since your final result is a Series, you can filter it directly with a boolean mask. Replace
return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size())
with
temp_df = Top15.groupby(by = ['Continent', 'bins']).size()
return temp_df[temp_df != 0]
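For a self-contained illustration of this filtering, the sketch below uses a small made-up DataFrame instead of the assignment's Top15 (the index labels and percentages are hypothetical). It also shows an alternative that is not part of the answers above: passing observed=True to groupby, which skips unobserved categorical bins. The default of observed differs between pandas versions, so the sketch passes it explicitly.

```python
import pandas as pd

# Hypothetical toy data; labels and percentages are made up for illustration.
df = pd.DataFrame(
    {
        "Continent": ["Asia", "Asia", "Europe", "Europe", "South America"],
        "% Renewable": [5.0, 20.0, 10.0, 35.0, 69.0],
    },
    index=["A", "B", "C", "D", "E"],
)
df["bins"] = pd.cut(df["% Renewable"], 5)

# observed=False keeps every (Continent, bin) pair, including empty ones.
sizes = df.groupby(["Continent", "bins"], observed=False).size()

# Filter out the empty groups with a boolean mask.
filtered = sizes[sizes != 0]

# Alternative: let groupby skip the unobserved bins in the first place.
observed_only = df.groupby(["Continent", "bins"], observed=True).size()

print(filtered)
```

Both filtered and observed_only keep only the groups that contain at least one row.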