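I'm working through this practice exam, and when I submitted it I found that a few of the checks failed. I can't figure out why.
This is Task 1:
My implementation is: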
import pandas as pd
clean_data = pd.read_csv('ebike_data.csv')
clean_data['bike_type'] = clean_data['bike_type'].fillna('standard')
clean_data['frame_material'] = clean_data['frame_material'].fillna('unknown')
clean_data['frame_material'] = clean_data['frame_material'].str.lower()
clean_data['production_cost'] = clean_data['production_cost'].fillna(clean_data['production_cost'].median()).astype(float)
clean_data['assembly_time'] = clean_data['assembly_time'].fillna(clean_data['assembly_time'].mean()).astype(int)
clean_data['top_speed'] = clean_data['top_speed'].fillna(clean_data['top_speed'].mean()).astype(int)
clean_data['battery_type'] = clean_data['battery_type'].fillna('other')
clean_data['battery_type'] = clean_data['battery_type'].replace({'-':'other', 'liotherion': 'li-ion'})
clean_data['customer_score'] = clean_data['customer_score'].fillna(clean_data['customer_score'].mean()).clip(lower=1, upper=10).astype(int)
clean_data['motor_power'] = clean_data['motor_power'].str.replace('W','').astype(float)
clean_data['motor_power'] = clean_data['motor_power'].fillna(clean_data['motor_power'].median()).astype(int)
print(clean_data.info())
print(clean_data.isna().sum())
Output:
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bike_type 2000 non-null object
1 frame_material 2000 non-null object
2 production_cost 2000 non-null float64
3 assembly_time 2000 non-null int64
4 top_speed 2000 non-null int64
5 battery_type 2000 non-null object
6 motor_power 2000 non-null int64
7 customer_score 2000 non-null int64
dtypes: float64(1), int64(4), object(3)
memory usage: 125.1+ KB
None
bike_type 0
frame_material 0
production_cost 0
assembly_time 0
top_speed 0
battery_type 0
motor_power 0
customer_score 0
dtype: int64
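When I submitted the project, this is the feedback I got:
All required data has been created and has the required columns - check
Task 1: Identify and replace missing values - uncheck
Task 1: Convert values between data types - check
Task 1: Clean categorical and text data by manipulating strings - check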
I solved this "puzzle" today.
It helps to export the DataFrame to a CSV file so you can inspect the result, for example with clean_data.to_csv("result.csv")
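A minimal sketch of that check, using a small made-up frame rather than the real ebike_data.csv (the sample values and the temporary output path are assumptions for illustration). Alongside the export, listing the distinct values of each text column tends to surface placeholders such as "-" or mixed-case spellings, which isna() does not count as missing.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for the cleaned exam data.
sample = pd.DataFrame({
    "battery_type": ["li-ion", "-", "lead acid", None],
    "frame_material": ["steel", "STEel", "aluminum", "carbon"],
})

# Export so the result can be opened in a spreadsheet or text editor.
out_path = os.path.join(tempfile.mkdtemp(), "result.csv")
sample.to_csv(out_path, index=False)

# Distinct non-null values per column, to spot odd entries by eye.
distinct = {
    col: sorted(sample[col].dropna().unique().tolist())
    for col in sample.columns
}
# isna() only counts real NaN/None, not placeholder strings like "-".
missing_counts = sample.isna().sum().to_dict()

print(distinct)
print(missing_counts)
```

Here the "-" entry shows up in the distinct values but not in the missing count, which is the kind of gap a passing isna().sum() can hide.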
The tricky parts are:
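Before doing anything else with this column, replace "STEel" with "steel": prod_df["frame_material"] = prod_df["frame_material"].replace("STEel", "steel")
For all missing string values, use a negated isin, for example: prod_df.loc[~prod_df["bike_type"].isin(['standard', 'folding', 'mountain', 'road']), "bike_type"] = "standard"
For the top_speed column, the mean has to be rounded to 2 decimal places: prod_df["top_speed"] = prod_df["top_speed"].fillna(round(prod_df["top_speed"].mean(), 2))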
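The three steps above can be sketched together on a tiny made-up frame. The sample rows are assumptions for illustration only; prod_df and the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical sample; the real data comes from ebike_data.csv.
prod_df = pd.DataFrame({
    "frame_material": ["STEel", "aluminum", "steel", "carbon"],
    "bike_type": ["road", "-", None, "mountain"],
    "top_speed": [25.0, None, 30.5, 27.123],
})

# 1. Fix the mixed-case spelling. replace() returns a new Series,
#    so the result has to be assigned back to the column.
prod_df["frame_material"] = prod_df["frame_material"].replace("STEel", "steel")

# 2. Anything outside the allowed categories (NaN, "-", typos)
#    is set to "standard".
allowed = ["standard", "folding", "mountain", "road"]
prod_df.loc[~prod_df["bike_type"].isin(allowed), "bike_type"] = "standard"

# 3. Fill missing speeds with the column mean rounded to 2 decimals.
mean_speed = round(prod_df["top_speed"].mean(), 2)
prod_df["top_speed"] = prod_df["top_speed"].fillna(mean_speed)

print(prod_df)
```

The negated isin mask catches real NaN values and placeholder strings in one pass, which a plain fillna would not do.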
I hope this helps!