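I am using pandas to read data from an Excel sheet, then iterating over it with df.iterrows() and processing each row further to automate a workflow with Python/Selenium.
Each row of my spreadsheet belongs to one job applicant, and their attributes are captured in separate columns. Because a person can hold more than one degree, their education details are recorded in columns such as degree1, specialisation1, college1, degree2, specialisation2, college2, and so on. Up to 5 qualifications can be filled in. During the iteration, I want to break out each person's education records and loop through them one at a time (degree, specialisation, college).
How can I accomplish this?
I have attached a GitHub link to sample data for reference; it is essentially the same as the data pasted below.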
sr_no old_emp_id name address mobile degree1 specialisation1 college1 degree2 specialisation2 college2 degree3 specialisation3 college3 emp_status
0 1 24 Amit ABC Road 356363474 Computer Science Robotics IIT Delhi MSC ML MIT PHD AI Harvard full-time
1 2 34 Samit Xyz Road 367474748 Bachelor of Arts Economics Delih Univ Masters of Eco Internatioal Relation Delhi Univ PHD Foreign Trade Delhi Univ part-time
2 3 56 Richard PTC Street 363637677 Bsc Biology Mumbai Univ Masters of Science Microbiology Mumbai Univ PHD Communicable disease Mumbai Univ part-time
I tried grouping the data with the custom function below, but it does not give the expected result:
def group_attributes_diff(df):
    new_data =[]
    for i in range(0, len(df),35):
        candidate_info={}
        for j in range(i,i+7+1):
            row = df.iloc[j]
            degree_name = row['degree_name' + str(int(j-i)//7+1)]
            specialisation = row["specialisation" + str(int(j - i)//7 + 1)] if "specialisation" + str(int(j - i)//7 + 1) in row else None
            course_start_date = row["course_start_date" + str(int(j - i)//7 + 1)] if "course_start_date" + str(int(j - i)//7 + 1) in row else None
            course_end_date = row["course_end_date" + str(int(j - i)//7 + 1)] if "course_end_date" + str(int(j - i)//7 + 1) in row else None
            marks_grades = row["marks_grades" + str(int(j - i)//7 + 1)] if "marks_grades" + str(int(j - i)//7 + 1) in row else None
            university = row["university" + str(int(j - i)//7 + 1)] if "university" + str(int(j - i)//7 + 1) in row else None
            course_type = row["course_type" + str(int(j - i)//7 + 1)] if "course_type" + str(int(j - i)//7 + 1) in row else None
            education={
                "degree_name":degree_name,
                "specialisation":specialisation,
                "course_start_date":course_start_date,
                "course_end_date":course_end_date,
                "marks_grades":marks_grades,
                "university":university,
                "course_type":course_type
            }
            candidate_info["Education "+str(int(j-i)//7+1)] = education
        new_data.append(candidate_info)
    return pd.DataFrame(new_data)

test_df = group_attributes_diff(df.copy())
print(test_df.to_excel('education.xlsx'))
Thanks in advance for your input! Harish
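I would suggest learning about data structures and how to pivot and unpivot data. Here is an example of how I would do it.
After reading the Excel sheet with pandas: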
import pandas as pd

df = pd.read_excel(file_path)  # file_path: path to the Excel workbook
Then unpivot the data:
df_unpivoted = pd.melt(df,
                       id_vars=['sr_no', 'old_emp_id', 'name', 'address', 'mobile', 'emp_status'],
                       var_name='attribute',
                       value_name='value')
You will get a result structured like this:
sr_no old_emp_id name address mobile emp_status attribute value
0 1 24 Amit ABC Road 356363474 full-time degree1 Computer Science
1 2 34 Samit Xyz Road 367474748 part-time degree1 Bachelor of Arts
2 3 56 Richard PTC Street 363637677 part-time degree1 Bsc Biology
3 1 24 Amit ABC Road 356363474 full-time specialisation1 Robotics
4 2 34 Samit Xyz Road 367474748 part-time specialisation1 Economics
5 3 56 Richard PTC Street 363637677 part-time specialisation1 None
6 1 24 Amit ABC Road 356363474 full-time college1 IIT Delhi
7 2 34 Samit Xyz Road 367474748 part-time college1 Delih Univ
8 3 56 Richard PTC Street 363637677 part-time college1 Mumbai Univ
9 1 24 Amit ABC Road 356363474 full-time degree2 MSC ML
10 2 34 Samit Xyz Road 367474748 part-time degree2 Masters of Eco
11 3 56 Richard PTC Street 363637677 part-time degree2 Masters of Science
12 1 24 Amit ABC Road 356363474 full-time specialisation2 MIT
13 2 34 Samit Xyz Road 367474748 part-time specialisation2 Internatioal Relation
14 3 56 Richard PTC Street 363637677 part-time specialisation2 Microbiology
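The attribute column in this output still fuses the field name with the qualification number (degree1, college2, and so on). One possible next step, sketched here on a small made-up frame, is to split the two apart with str.extract and pivot back so that each row holds a single qualification. The sample values and the names field, n, and tidy are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Tiny made-up frame in the same wide layout as the question (values are illustrative).
df = pd.DataFrame({
    "sr_no": [1, 2],
    "name": ["Amit", "Samit"],
    "degree1": ["BSc", "BA"],
    "college1": ["IIT Delhi", "Delhi Univ"],
    "degree2": ["MSc", None],
    "college2": ["MIT", None],
})

id_cols = ["sr_no", "name"]
long_df = pd.melt(df, id_vars=id_cols, var_name="attribute", value_name="value")

# Split e.g. 'degree1' into field='degree' and n=1.
parts = long_df["attribute"].str.extract(r"^(?P<field>[a-z_]+?)(?P<n>\d+)$")
long_df = long_df.join(parts)
long_df["n"] = long_df["n"].astype(int)

# One row per (person, qualification number), one column per field.
tidy = (
    long_df.pivot(index=id_cols + ["n"], columns="field", values="value")
    .dropna(how="all")  # drop qualification slots that were left empty
    .reset_index()
)
tidy.columns.name = None
print(tidy)
```

With the data in this shape, tidy.groupby("sr_no") yields each applicant's qualifications as a small frame that can be looped over.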
Now you can build your logic around the unique attributes.
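For the original goal of visiting each applicant's qualifications inside a df.iterrows() loop, reshaping is not strictly required. The following is a minimal sketch, assuming the columns follow the degreeN / specialisationN / collegeN naming from the question and that up to 5 slots may exist. The helper educations_for is a hypothetical name, and the sample frame is made up for illustration.

```python
import pandas as pd

FIELDS = ("degree", "specialisation", "college")
MAX_SLOTS = 5  # the question says up to 5 qualifications may be filled in


def educations_for(row, fields=FIELDS, max_slots=MAX_SLOTS):
    """Return one dict per filled-in qualification slot of a single row."""
    result = []
    for n in range(1, max_slots + 1):
        # row.get returns None when the column (e.g. degree4) is absent.
        record = {f: row.get(f"{f}{n}") for f in fields}
        # Skip slots where every field is missing or NaN.
        if all(pd.isna(v) for v in record.values()):
            continue
        result.append(record)
    return result


# Made-up sample in the question's layout.
df = pd.DataFrame({
    "name": ["Amit", "Samit"],
    "degree1": ["BSc", "BA"],
    "specialisation1": ["Robotics", "Economics"],
    "college1": ["IIT Delhi", "Delhi Univ"],
    "degree2": ["MSc", None],
    "specialisation2": ["ML", None],
    "college2": ["MIT", None],
})

for _, row in df.iterrows():
    for edu in educations_for(row):
        # Replace this print with the per-qualification Selenium steps.
        print(row["name"], edu["degree"], edu["specialisation"], edu["college"])
```

The inner loop runs once per filled-in qualification, so an applicant with two degrees is processed twice and empty slots are skipped.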