Iterating over the rows of a pandas DataFrame

Question

I'm using pandas to read data from an Excel sheet. I then iterate over it with df.iterrows() and process each row further to automate a workflow with Python/Selenium.

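Each row of my spreadsheet belongs to one job applicant, and their attributes are stored in separate columns. However, because a person can hold several degrees, their education details are recorded in the columns degree1, specialisation1, college1, degree2, specialisation2, college2, and so on. Up to 5 qualifications can be filled in. While iterating over the rows, I want to break out and loop over that person's education entries (degree, specialisation, college).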

How can I do this?

I have linked sample data on GitHub for reference. It is essentially the same as the data pasted below:

|   | sr_no | old_emp_id | name | address | mobile | degree1 | specialisation1 | college1 | degree2 | specialisation2 | college2 | degree3 | specialisation3 | college3 | emp_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 24 | Amit | ABC Road | 356363474 | Computer Science | Robotics | IIT Delhi | MSC | ML | MIT | PHD | AI | Harvard | full-time |
| 1 | 2 | 34 | Samit | Xyz Road | 367474748 | Bachelor of Arts | Economics | Delih Univ | Masters of Eco | Internatioal Relation | Delhi Univ | PHD | Foreign Trade | Delhi Univ | part-time |
| 2 | 3 | 56 | Richard | PTC Street | 363637677 | Bsc | Biology | Mumbai Univ | Masters of Science | Microbiology | Mumbai Univ | PHD | Communicable disease | Mumbai Univ | part-time |

I tried grouping the data with the custom function below, but it does not give the expected result:

import pandas as pd


def group_attributes_diff(df):
    new_data = []
    # Outer loop: step through the rows in blocks of 35
    for i in range(0, len(df), 35):
        candidate_info = {}
        # Inner loop: rows i .. i+7 (8 rows per block)
        for j in range(i, i + 7 + 1):
            row = df.iloc[j]
            # Column suffix derived from the row offset: "1" for j-i in 0..6, "2" for j-i == 7
            n = str((j - i) // 7 + 1)

            # degree_name<n> is read directly; the other columns fall back to None
            # when that label is not in the row's index
            degree_name = row["degree_name" + n]
            specialisation = row["specialisation" + n] if "specialisation" + n in row else None
            course_start_date = row["course_start_date" + n] if "course_start_date" + n in row else None
            course_end_date = row["course_end_date" + n] if "course_end_date" + n in row else None
            marks_grades = row["marks_grades" + n] if "marks_grades" + n in row else None
            university = row["university" + n] if "university" + n in row else None
            course_type = row["course_type" + n] if "course_type" + n in row else None

            education = {
                "degree_name": degree_name,
                "specialisation": specialisation,
                "course_start_date": course_start_date,
                "course_end_date": course_end_date,
                "marks_grades": marks_grades,
                "university": university,
                "course_type": course_type,
            }
            candidate_info["Education " + n] = education
        new_data.append(candidate_info)
    return pd.DataFrame(new_data)


test_df = group_attributes_diff(df.copy())
# to_excel writes the file and returns None, so there is nothing to print
test_df.to_excel("education.xlsx")
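For comparison, here is a minimal sketch of the nested loop the question describes: one pass per candidate row, then an inner pass over qualification numbers 1 to 5. It assumes the columns follow the degreeN/specialisationN/collegeN pattern of the sample table (the function above reads degree_nameN, course_start_dateN and so on, which do not appear in the sample). Unlike the function above, the suffix comes from a loop over 1..5 inside each row rather than from row offsets. The names MAX_QUALIFICATIONS, FIELDS and group_education are hypothetical, and the small df uses values from the question's table.

```python
import pandas as pd

# Small sample in the question's wide layout (values from the question's table)
df = pd.DataFrame({
    "name": ["Amit", "Samit"],
    "degree1": ["Computer Science", "Bachelor of Arts"],
    "specialisation1": ["Robotics", "Economics"],
    "college1": ["IIT Delhi", "Delih Univ"],
    "degree2": ["MSC", "Masters of Eco"],
    "specialisation2": ["ML", "Internatioal Relation"],
    "college2": ["MIT", "Delhi Univ"],
})

MAX_QUALIFICATIONS = 5  # hypothetical constant: the question allows up to 5 qualifications
FIELDS = ("degree", "specialisation", "college")  # hypothetical: column stems per qualification


def group_education(df):
    """Return one dict per candidate holding a list of education entries."""
    result = []
    for _, row in df.iterrows():  # outer loop: one candidate per row
        educations = []
        for n in range(1, MAX_QUALIFICATIONS + 1):  # inner loop: qualification 1..5
            # Series.get returns None when a column such as "degree4" does not exist
            entry = {field: row.get(f"{field}{n}") for field in FIELDS}
            # Skip qualification slots that are missing or entirely empty
            if all(pd.isna(value) for value in entry.values()):
                continue
            educations.append(entry)
        result.append({"name": row["name"], "education": educations})
    return result


for candidate in group_education(df):
    print(candidate["name"], len(candidate["education"]))
```

The inner loop is where per-qualification work (for example, filling one education form in Selenium) would go.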

Thanks in advance for your input! Harish

Answer 1

I'd suggest reading up on data structures and on how to pivot and unpivot data. Here's an example of how I'd do it:

After reading the Excel sheet with pandas:

import pandas as pd

df = pd.read_excel(file_path)

Then unpivot (melt) the data:

df_unpivoted = pd.melt(df, 
                       id_vars=['sr_no', 'old_emp_id', 'name', 'address', 'mobile', 'emp_status'],
                       var_name='attribute', 
                       value_name='value')

You'll get a result structured like this:

|    | sr_no | old_emp_id | name | address | mobile | emp_status | attribute | value |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 24 | Amit | ABC Road | 356363474 | full-time | degree1 | Computer Science |
| 1 | 2 | 34 | Samit | Xyz Road | 367474748 | part-time | degree1 | Bachelor of Arts |
| 2 | 3 | 56 | Richard | PTC Street | 363637677 | part-time | degree1 | Bsc Biology |
| 3 | 1 | 24 | Amit | ABC Road | 356363474 | full-time | specialisation1 | Robotics |
| 4 | 2 | 34 | Samit | Xyz Road | 367474748 | part-time | specialisation1 | Economics |
| 5 | 3 | 56 | Richard | PTC Street | 363637677 | part-time | specialisation1 | None |
| 6 | 1 | 24 | Amit | ABC Road | 356363474 | full-time | college1 | IIT Delhi |
| 7 | 2 | 34 | Samit | Xyz Road | 367474748 | part-time | college1 | Delih Univ |
| 8 | 3 | 56 | Richard | PTC Street | 363637677 | part-time | college1 | Mumbai Univ |
| 9 | 1 | 24 | Amit | ABC Road | 356363474 | full-time | degree2 | MSC ML |
| 10 | 2 | 34 | Samit | Xyz Road | 367474748 | part-time | degree2 | Masters of Eco |
| 11 | 3 | 56 | Richard | PTC Street | 363637677 | part-time | degree2 | Masters of Science |
| 12 | 1 | 24 | Amit | ABC Road | 356363474 | full-time | specialisation2 | MIT |
| 13 | 2 | 34 | Samit | Xyz Road | 367474748 | part-time | specialisation2 | Internatioal Relation |
| 14 | 3 | 56 | Richard | PTC Street | 363637677 | part-time | specialisation2 | Microbiology |
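In this long format the attribute column still combines a field name and a qualification number. A minimal sketch of splitting them apart with a regular expression and reshaping back to one row per candidate and qualification, assuming every education column ends in a numeric suffix as in the sample. The small df (values from the question's table), the field/qual_no column names and the regex are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "sr_no": [1, 2],
    "name": ["Amit", "Samit"],
    "degree1": ["Computer Science", "Bachelor of Arts"],
    "specialisation1": ["Robotics", "Economics"],
    "college1": ["IIT Delhi", "Delih Univ"],
    "degree2": ["MSC", "Masters of Eco"],
    "specialisation2": ["ML", "Internatioal Relation"],
    "college2": ["MIT", "Delhi Univ"],
})

long_df = pd.melt(df, id_vars=["sr_no", "name"], var_name="attribute", value_name="value")

# Split e.g. "specialisation2" into field="specialisation" and qual_no="2"
parts = long_df["attribute"].str.extract(r"^(?P<field>\D+)(?P<qual_no>\d+)$")
long_df = long_df.join(parts)
long_df["qual_no"] = long_df["qual_no"].astype(int)

# Reshape: one row per (candidate, qualification), one column per field
education = (
    long_df.set_index(["sr_no", "name", "qual_no", "field"])["value"]
    .unstack("field")
    .rename_axis(columns=None)
    .reset_index()
)
education = education[["sr_no", "name", "qual_no", "degree", "specialisation", "college"]]
print(education)
```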

Now you can build your logic around the distinct attribute values.
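As an alternative to melt, pandas provides pd.wide_to_long, which unpivots stub-named columns such as degree1, degree2 and records the numeric suffix in one step. This is a different function from the one used above. The sketch assumes sr_no uniquely identifies each candidate; the small df (values from the question's table) and the qual_no column name are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "sr_no": [1, 2],
    "name": ["Amit", "Samit"],
    "degree1": ["Computer Science", "Bachelor of Arts"],
    "specialisation1": ["Robotics", "Economics"],
    "college1": ["IIT Delhi", "Delih Univ"],
    "degree2": ["MSC", "Masters of Eco"],
    "specialisation2": ["ML", "Internatioal Relation"],
    "college2": ["MIT", "Delhi Univ"],
})

# Collapse degree1, degree2, ... into a single "degree" column (same for the
# other stubs) and store the numeric suffix in a new "qual_no" column
education = pd.wide_to_long(
    df,
    stubnames=["degree", "specialisation", "college"],
    i="sr_no",
    j="qual_no",
).reset_index()

# Drop qualification slots that were left empty in the sheet
education = education.dropna(subset=["degree", "specialisation", "college"], how="all")

# Loop over each candidate, then over that candidate's education rows
for sr_no, group in education.groupby("sr_no"):
    for rec in group.sort_values("qual_no").itertuples(index=False):
        print(sr_no, rec.qual_no, rec.degree, rec.specialisation, rec.college)
```

The nested loop at the end gives the per-person iteration over education entries that the question asks for; the print call is where the Selenium steps would go.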
