I get an error when I try to regroup the count thresholds. Can you help me?
dfff["employee_count"].value_counts()
employee_count
1-10 56091
11-50 55892
51-100 14377
101-250 12217
251-500 6187
501-1000 4384
1001-5000 4080
unknown 3335
10000+ 1941
5001-10000 1362
Name: count, dtype: int64
employee_count_unknown = dfff[(dfff['employee_count'] == 'unknown')].index
dfff.drop(employee_count_unknown, inplace=True)
dfff.loc[df['employee_count'].isin( ["251-500", "501-1000"]),
["employee_count"]] = "251-1000"
dfff.loc[df['employee_count'].isin( ["1001-5000", "5001-10000", "10000+"]),
["employee_count"]] = "1001+"
TypeError: unhashable type: 'Series'
Your code runs fine for me:
import pandas as pd
data = ["1-10","11-50","51-100","101-250","251-500","501-1000","1001-5000","unknown","10000+","5001-10000"]*10
df = pd.DataFrame(data=data, columns=["employee_count"])
employee_count_unknown = df[(df['employee_count'] == 'unknown')].index
df.drop(employee_count_unknown, inplace=True)
df.loc[df['employee_count'].isin( ["251-500", "501-1000"]), ["employee_count"]] = "251-1000"
df.loc[df['employee_count'].isin( ["1001-5000", "5001-10000", "10000+"]), ["employee_count"]] = "1001+"
print(df["employee_count"].value_counts())
Output:
employee_count | count |
---|---|
1001+ | 30 |
251-1000 | 20 |
1-10 | 10 |
11-50 | 10 |
51-100 | 10 |
101-250 | 10 |
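
One difference worth checking: the snippet in the question builds its boolean mask from `df` but assigns into `dfff`, while the version above uses a single DataFrame throughout. Whether that mismatch is what raises the TypeError cannot be confirmed from the post alone. As a minimal sketch of an alternative that avoids boolean-mask assignment altogether, the same regrouping can be written with `Series.replace` and a dictionary. The `mapping` name and the sample data are assumptions made for illustration:

```python
import pandas as pd

# Sample data mirroring the labels in the question (10 rows per label).
labels = ["1-10", "11-50", "51-100", "101-250", "251-500", "501-1000",
          "1001-5000", "unknown", "10000+", "5001-10000"]
df = pd.DataFrame({"employee_count": labels * 10})

# Drop the 'unknown' rows by keeping everything else.
df = df[df["employee_count"] != "unknown"].copy()

# Hypothetical mapping from old brackets to the merged brackets.
mapping = {
    "251-500": "251-1000",
    "501-1000": "251-1000",
    "1001-5000": "1001+",
    "5001-10000": "1001+",
    "10000+": "1001+",
}

# Replace whole values; labels not in the mapping are left unchanged.
df["employee_count"] = df["employee_count"].replace(mapping)

counts = df["employee_count"].value_counts()
print(counts)
```

Because every operation refers to the same `df`, there is no risk of the mask and the target frame having different indexes.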