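I'm trying to reindex my DataFrame so that each row gets a value based on the class type it belongs to. Basically I'm categorizing the rows with numbers so that they'll be easier for me to access later.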
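import pandas as pd

pd.set_option('display.max_columns', None)
# read_html returns a list of DataFrames, one per <table> on the page
d = pd.read_html("https://www.bu.edu/phpbin/course-search/section/?t=casma124")
# concat keeps each table's own 0-based index, so index labels repeat
d = pd.concat(d)
number_of_rows = 1 #number of rows in dataframe
index_range = list(range(number_of_rows))
d = d.loc[:, ["Section", "Type", "Schedule", "Location"]]
print(d)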
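The repeated index values in the output below (0 to 12, then 0 to 1, then 0 to 1, then 0 to 11) come from `pd.concat`, which by default keeps the index of every table it joins. A minimal sketch with two made-up stand-in tables (not the real BU data) shows the difference `ignore_index=True` makes:

```python
import pandas as pd

# Hypothetical stand-ins for two of the tables pd.read_html returns;
# each one has its own index starting at 0.
tables = [
    pd.DataFrame({"Section": ["A1", "A1"], "Type": ["LEC", None]}),
    pd.DataFrame({"Section": ["SA1", "SA2"], "Type": ["IND", "IND"]}),
]

kept = pd.concat(tables)                      # original labels are kept
fresh = pd.concat(tables, ignore_index=True)  # labels renumbered 0..n-1

print(kept.index.tolist())   # [0, 1, 0, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3]
```

A unique running index is usually the safer starting point before assigning any custom labels.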
The output of this code is:
Section Type Schedule Location
0 A1 LEC MWF 1:25 pm-2:15 pm STO B50
1 A1 NaN R 6:30 pm-8:30 pm ROOM
2 A2 LEC MWF 12:20 pm-1:10 pm STO B50
3 A2 NaN R 6:30 pm-8:30 pm ROOM
4 A3 LEC TR 12:30 pm-1:45 pm STO B50
5 A3 NaN R 6:30 pm-8:30 pm ROOM
6 B1 DIS T 2:00 pm-3:15 pm EPC 207
7 B2 DIS T 3:30 pm-4:45 pm EPC 207
8 B3 DIS T 5:00 pm-6:15 pm EPC 207
9 B4 DIS R 2:00 pm-3:15 pm EPC 207
10 B5 DIS M 2:30 pm-3:45 pm CAS 324
11 B6 DIS W 2:30 pm-3:45 pm CAS 324
12 B7 DIS R 3:30 pm-4:45 pm EPC 207
0 SA1 IND MTWR 1:00 pm-3:00 pm MCS B29
1 SA2 IND MTR 6:00 pm-8:30 pm COM 217
0 SB1 IND MTWR 11:00 am-1:00 pm PSY B51
1 SB2 IND MTR 6:00 pm-8:30 pm PSY B37
0 A1 LEC MWF 11:15 am-12:05 pm STO
1 A1 NaN R 6:30 pm-8:30 pm NaN
2 A2 LEC MWF 2:30 pm-3:20 pm STO
3 A2 NaN R 6:30 pm-8:30 pm NaN
4 A3 LEC TR 8:00 am-9:15 am STO
5 A3 NaN R 6:30 pm-8:30 pm NaN
6 B1 DIS M 4:30 pm-5:45 pm NaN
7 B2 DIS T 12:30 pm-1:45 pm NaN
8 B3 DIS T 3:30 pm-4:45 pm NaN
9 B4 DIS W 8:30 am-9:45 am CAS
10 B5 DIS W 4:30 pm-5:45 pm NaN
11 B6 DIS R 12:30 pm-1:45 pm NaN
I want it to look like this:
Section Type Schedule Location
1 A1 LEC MWF 1:25 pm-2:15 pm STO B50
2 A1 NaN R 6:30 pm-8:30 pm ROOM
1 A2 LEC MWF 12:20 pm-1:10 pm STO B50
2 A2 NaN R 6:30 pm-8:30 pm ROOM
1 A3 LEC TR 12:30 pm-1:45 pm STO B50
2 A3 NaN R 6:30 pm-8:30 pm ROOM
3 B1 DIS T 2:00 pm-3:15 pm EPC 207
3 B2 DIS T 3:30 pm-4:45 pm EPC 207
3 B3 DIS T 5:00 pm-6:15 pm EPC 207
3 B4 DIS R 2:00 pm-3:15 pm EPC 207
3 B5 DIS M 2:30 pm-3:45 pm CAS 324
3 B6 DIS W 2:30 pm-3:45 pm CAS 324
3 B7 DIS R 3:30 pm-4:45 pm EPC 207
9 SA1 IND MTWR 1:00 pm-3:00 pm MCS B29
9 SA2 IND MTR 6:00 pm-8:30 pm COM 217
9 SB1 IND MTWR 11:00 am-1:00 pm PSY B51
9 SB2 IND MTR 6:00 pm-8:30 pm PSY B37
1 A1 LEC MWF 11:15 am-12:05 pm STO
2 A1 NaN R 6:30 pm-8:30 pm NaN
1 A2 LEC MWF 2:30 pm-3:20 pm STO
2 A2 NaN R 6:30 pm-8:30 pm NaN
1 A3 LEC TR 8:00 am-9:15 am STO
2 A3 NaN R 6:30 pm-8:30 pm NaN
3 B1 DIS M 4:30 pm-5:45 pm NaN
3 B2 DIS T 12:30 pm-1:45 pm NaN
3 B3 DIS T 3:30 pm-4:45 pm NaN
3 B4 DIS W 8:30 am-9:45 am CAS
3 B5 DIS W 4:30 pm-5:45 pm NaN
3 B6 DIS R 12:30 pm-1:45 pm NaN
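That way, whenever the Type is LEC (lecture), the index will be 1; when the Type is NaN, the index will be 2; DIS will be 3, and so on (the IND rows get 9).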
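One way to get this is to map each Type value to its code and use the result as the index. The sketch below assumes the codes are fixed by hand from the desired output (LEC = 1, NaN = 2, DIS = 3, IND = 9); `TYPE_CODES`, `MISSING_TYPE_CODE` and `index_by_type` are hypothetical names, and the sample rows are copied from the output above rather than fetched from the page. It works on positions (NumPy arrays), so it does not care that the concatenated index contains duplicate labels.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table read off the desired output:
# LEC -> 1, missing Type (NaN) -> 2, DIS -> 3, IND -> 9.
TYPE_CODES = {"LEC": 1, "DIS": 3, "IND": 9}
MISSING_TYPE_CODE = 2


def index_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is the numeric code of each row's Type."""
    type_col = df["Type"]
    missing = type_col.isna().to_numpy()                   # True where Type is NaN
    codes = type_col.map(TYPE_CODES).to_numpy(dtype=float)  # NaN for missing/unlisted Type
    codes[missing] = MISSING_TYPE_CODE
    unknown = ~missing & np.isnan(codes)                    # Type present but not in TYPE_CODES
    if unknown.any():
        raise ValueError(f"no code defined for Type values: {sorted(set(type_col[unknown]))}")
    out = df.copy()
    out.index = codes.astype(int)                           # plain integer index, labels may repeat
    return out


d = pd.DataFrame({
    "Section": ["A1", "A1", "B1", "SA1"],
    "Type": ["LEC", np.nan, "DIS", "IND"],
    "Schedule": ["MWF 1:25 pm-2:15 pm", "R 6:30 pm-8:30 pm",
                 "T 2:00 pm-3:15 pm", "MTWR 1:00 pm-3:00 pm"],
})
d = index_by_type(d)
print(d.index.tolist())  # [1, 2, 3, 9]
```

Because the new labels repeat, `d.loc[1]` returns every lecture row and `d.loc[3]` every discussion row. Raising on unlisted Type values is a deliberate choice so that a new class type on the page is not silently given a wrong code. An alternative design is to store the code in an ordinary column (for example `d["code"] = ...`) and filter on it, which keeps a unique index.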
I've tried reindexing like this, but I get an error:
d.reset_index()
d.reindex(index= range(len(d)))
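Both attempts run into how pandas treats the index. After `pd.concat` the labels repeat, so `reindex()` cannot map each requested label to a single existing row and raises a `ValueError` (the exact message differs between pandas versions). `reset_index()` does not modify `d` in place; it returns a new DataFrame, so calling it without assigning the result has no effect. A small sketch with hypothetical stand-in data:

```python
import pandas as pd

# Hypothetical stand-in tables; like read_html's output, each starts at index 0.
t1 = pd.DataFrame({"Section": ["A1", "A1", "B1"], "Type": ["LEC", None, "DIS"]})
t2 = pd.DataFrame({"Section": ["SA1", "SA2"], "Type": ["IND", "IND"]})
d = pd.concat([t1, t2])

print(d.index.tolist())   # [0, 1, 2, 0, 1]
print(d.index.is_unique)  # False

# reindex() needs each old label to identify one row, which fails with duplicates.
reindex_failed = False
try:
    d.reindex(index=range(len(d)))
except ValueError as exc:
    reindex_failed = True
    print("reindex failed:", exc)

# reset_index() returns a new DataFrame, so assign it back.
# drop=True discards the old labels instead of keeping them as an "index" column.
d = d.reset_index(drop=True)
print(d.index.tolist())   # [0, 1, 2, 3, 4]
```

Note that neither call produces the per-type numbers on its own; a unique 0..n-1 index only makes `reindex` work again, and the Type-to-number mapping still has to be built separately.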
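I asked a similar question about accessing rows based on a condition in a specific column and storing them in separate sheets. You may find Manish Chaudhary's answer to my question there helpful: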
Using Openpyxl to create multiple custom spreadsheets
In the end, I gave up on Openpyxl and did the task with pandas instead.
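The linked answer is not reproduced here. As a rough pandas-only sketch of the same idea, rows can be grouped by their Type value and each group written to its own sheet. `split_by_type` and `write_sheets` are hypothetical helper names, the sample rows are taken from the output above, and actually writing the workbook assumes an Excel engine such as openpyxl is installed.

```python
import numpy as np
import pandas as pd


def split_by_type(df: pd.DataFrame) -> dict:
    """Hypothetical helper: one DataFrame per Type value, keyed by a sheet-safe name."""
    groups = {}
    # dropna=False keeps the rows whose Type is NaN as their own group
    for key, group in df.groupby("Type", dropna=False, sort=False):
        name = "NaN" if pd.isna(key) else str(key)  # sheet names must be strings
        groups[name] = group
    return groups


def write_sheets(groups: dict, path: str) -> None:
    """Write each group to its own sheet; requires an Excel engine (e.g. openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for name, group in groups.items():
            group.to_excel(writer, sheet_name=name[:31])  # Excel limits sheet names to 31 chars


d = pd.DataFrame({
    "Section": ["A1", "A1", "A2", "A2", "B1", "SA1"],
    "Type": ["LEC", np.nan, "LEC", np.nan, "DIS", "IND"],
})
sheets = split_by_type(d)
print(sorted(sheets))        # ['DIS', 'IND', 'LEC', 'NaN']
print(len(sheets["NaN"]))    # 2
# write_sheets(sheets, "sections.xlsx")  # hypothetical output path
```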