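I have an Excel file in which some cells are merged and some cells contain strikethrough text, mixed in with cells whose text is not struck through.
I want to write the records without strikethrough to one Excel file, and the records that contain a struck-through cell to another Excel file.
My output 1: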
col1 | col2 | col3 | col4 | col5 |
Sampletext1 | Combines TextC2 | Sample3 | text4 | text5 |
Sampletext2 | Combines TextC2 | Sample3_1 | text4_1 | text5 |
Sampletext2 | Combines TextC2 | Sample3_1 | text4_2 | text5 |
My output 2:
col1 | col2 | col3 | col4 | col5 |
Sampletext2 | Combines TextC2 | Sample3_1 | text4_3 | text5 |
`text4_3` should also appear struck through in the output file.
I tried the following in Python using `openpyxl`:
from openpyxl import load_workbook
from openpyxl import Workbook
import pandas as pd

input_file = 'myexecel.xlsx'
Workbook = load_workbook(input_file)
for i in Workbook.worksheets:
    if i.sheet_state == "visible":
        sheetname = str(i).replace('<Worksheet "', '').replace('">', '').strip()
        ws = Workbook[sheetname]
        #unmerging the cells
        for merged_cell in ws.merged_cells:
            min_row, min_col, max_row, max_col = merged_cell.min_row, merged_cell.min_col, merged_cell.max_row, merged_cell.max_col
            data = ws.cell(row=min_row, column=min_col).value
            ws.unmerge_cells(start_row=min_row, start_column=min_col, end_row=max_row, end_column=max_col)
            for row in ws.iter_rows(min_row=min_row, min_col=min_col, max_row=max_row, max_col=max_col):
                for cell in row:
                    cell.value = data
        data_all = [[cell for cell in row] for row in ws.iter_rows(values_only=True)]
        df_raw = pd.DataFrame(data_all[1:], columns=headercols)
        df_raw["strike_flag"] = [any(cell.font.strikethrough for cell in row) for row in ws.iter_rows(min_row=2)]
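With the code above I can detect whether a cell has strikethrough formatting, but I don't know how to separate the strikethrough records from the non-strikethrough records.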
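Since you are not using `openpyxl.Workbook`, there is no need to import it. Variable names are best kept lowercase, and even though it may not cause a problem here, don't reuse an imported name as a variable. Modifying the merged cells while looping over the collection of merged cells will most likely raise an error. Get the list of merged ranges first, then loop over that list, as shown in the code below.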
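To illustrate the point about not modifying the merged ranges while iterating over them, here is a minimal sketch on an in-memory workbook. The cell contents and merge ranges are made up for the demonstration; only the snapshot-then-loop pattern matters.

```python
from openpyxl import Workbook

# Build a small in-memory sheet with two merged ranges (illustrative data).
wb = Workbook()
ws = wb.active
ws["A1"] = "Combines TextC2"
ws["C1"] = "text5"
ws.merge_cells("A1:A3")
ws.merge_cells("C1:C2")

# Take a snapshot with list() first: unmerge_cells() changes
# ws.merged_cells.ranges, so we must not iterate over it directly.
for merged in list(ws.merged_cells.ranges):
    value = ws.cell(row=merged.min_row, column=merged.min_col).value
    ws.unmerge_cells(merged.coord)
    # Copy the top-left value into every cell of the former merged range.
    for row in ws.iter_rows(min_row=merged.min_row, max_row=merged.max_row,
                            min_col=merged.min_col, max_col=merged.max_col):
        for cell in row:
            cell.value = value

filled = [ws.cell(row=r, column=1).value for r in range(1, 4)]
remaining = len(ws.merged_cells.ranges)
```

Note that this copies only the value. As far as I can tell, the cells recreated after unmerging get default formatting, so if a merged cell itself is struck through you may also need to copy its font.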
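The list `workbook.worksheets` contains all the worksheets in the workbook, so in the line `for i in Workbook.worksheets:`, `i` is already a worksheet object. If necessary, you can check whether it is visible and then simply use `i` directly; there is no need to extract the sheet name and use it to look the sheet up again under a new variable.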
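As a small sketch of using the worksheet object directly, the following builds an in-memory workbook with one visible and one hidden sheet. The sheet names are hypothetical; if you do need the name, `ws.title` gives it without parsing `str(ws)`.

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = "Data"             # hypothetical sheet name
hidden = wb.create_sheet("Scratch")  # hypothetical sheet name
hidden.sheet_state = "hidden"

# Each item in wb.worksheets is already a Worksheet, so filter on it directly.
visible_titles = [ws.title for ws in wb.worksheets if ws.sheet_state == "visible"]
```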
Finally, you can create two DataFrames, one with the rows where `strike_flag` is True and one with the rows where it is False.
from openpyxl import load_workbook
import pandas as pd
from openpyxl.utils import range_boundaries

headercols = ['col1', 'col2', 'col3', 'col4', 'col5']
input_file = 'myexecel.xlsx'
workbook = load_workbook(input_file)

for ws in workbook.worksheets:
    # Skip hidden sheets; ws is already the worksheet object
    if ws.sheet_state != "visible":
        continue

    # Copy the merged ranges into a list before unmerging anything
    merge_list = list(ws.merged_cells.ranges)
    for merged_cell in merge_list:
        min_col, min_row, max_col, max_row = range_boundaries(merged_cell.coord)
        data = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(start_row=min_row, start_column=min_col, end_row=max_row, end_column=max_col)
        # Fill every cell of the former merged range with the top-left value
        for row in ws.iter_rows(min_row=min_row, min_col=min_col, max_row=max_row, max_col=max_col):
            for cell in row:
                cell.value = data

    # The first row is the header; the rest is data
    data_all = [[cell for cell in row] for row in ws.iter_rows(values_only=True)]
    df_raw = pd.DataFrame(data_all[1:], columns=headercols)
    # True if any cell in the row has strikethrough formatting
    df_raw["strike_flag"] = [any(cell.font.strikethrough for cell in row) for row in ws.iter_rows(min_row=2)]

    # Split on the flag (these are reassigned for every visible sheet)
    df_strikethrough = df_raw[df_raw.strike_flag]
    df_no_strikethrough = df_raw[~df_raw.strike_flag]
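The DataFrames hold only values, so the strikethrough formatting on a cell such as `text4_3` is lost once the data leaves openpyxl. If the struck text must stay struck in the output, one possible approach is to copy the rows with openpyxl itself. The following is a rough sketch under that assumption, not part of the code above: `split_rows_by_strike` is a hypothetical helper, and the sample sheet is built in memory for illustration.

```python
from openpyxl import Workbook
from openpyxl.styles import Font


def split_rows_by_strike(ws, header_rows=1):
    """Copy rows into two new workbooks: rows with a struck cell, and the rest.

    Hypothetical helper. Only values and the strike flag are copied;
    other formatting is ignored.
    """
    wb_strike, wb_plain = Workbook(), Workbook()
    out_strike, out_plain = wb_strike.active, wb_plain.active

    # Both outputs get the header row(s).
    for row in ws.iter_rows(min_row=1, max_row=header_rows, values_only=True):
        out_strike.append(row)
        out_plain.append(row)

    for row in ws.iter_rows(min_row=header_rows + 1):
        has_strike = any(cell.font.strike for cell in row)
        target = out_strike if has_strike else out_plain
        target.append([cell.value for cell in row])
        if has_strike:
            out_row = target.max_row  # the row that was just appended
            for cell in row:
                if cell.font.strike:
                    target.cell(row=out_row, column=cell.column).font = Font(strike=True)
    return wb_strike, wb_plain


# Illustrative source sheet: one plain row and one row with a struck cell.
src = Workbook()
sheet = src.active
sheet.append(["col1", "col4"])
sheet.append(["Sampletext1", "text4"])
sheet.append(["Sampletext2", "text4_3"])
sheet["B3"].font = Font(strike=True)

wb_strike, wb_plain = split_rows_by_strike(sheet)
```

Each returned workbook can then be saved with `wb_strike.save(...)` and `wb_plain.save(...)` under whatever file names you choose.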