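I am reading an Excel worksheet with Python and trying to read only the visible rows (those that are not hidden or collapsed). I went through the openpyxl documentation and found that it has hidden and collapsed attributes. However, once I read the Excel sheet, hidden or collapsed is not always True when a column is hidden. My code is below: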
import pandas as pd

def read_visible_data_from_sheet(sheet):
    data = []
    # Iterate through rows
    for row in sheet.iter_rows():
        row_num = row[0].row
        # Check if the row is hidden or has height set to 0
        row_hidden = sheet.row_dimensions[row_num].hidden
        row_height = sheet.row_dimensions[row_num].height
        row_level = sheet.row_dimensions[row_num].outlineLevel
        if row_hidden or (row_height is not None and row_height == 0):
            continue  # Skip hidden rows
        # Check if any parent row is collapsed
        is_collapsed = False
        for parent_row_num in range(1, row_num):
            if sheet.row_dimensions[parent_row_num].outlineLevel < row_level and sheet.row_dimensions[parent_row_num].hidden == True:
                is_collapsed = True
                break
        if is_collapsed:
            continue  # Skip collapsed rows
        visible_row = []
        # Iterate through columns in the row
        for cell in row:
            col_letter = cell.column_letter
            col_dim = sheet.column_dimensions.get(col_letter)
            if col_dim:
                col_hidden = col_dim.hidden
                col_width = col_dim.width
            else:
                continue
            # Check if the column is hidden or has width set to 0
            if col_hidden or (col_width is not None and col_width == 0):
                continue  # Skip hidden columns
            visible_row.append(cell.value)
        # Append the visible row to the data list
        if visible_row:  # Avoid adding empty rows
            data.append(visible_row)
    # Convert to a DataFrame
    df = pd.DataFrame(data)
    return df, sheet
In some cases sheet.column_dimensions contains the visible columns, and in other cases it does not.
Is there a better way to handle this kind of situation? I am open to exploring other libraries if necessary.
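Openpyxl may be somewhat inconsistent in how it handles row and column information, but there are a few things to keep in mind when looking up row or column details.
As in your example, the column state does not change from row to row, so re-checking it for every row you copy is wasteful.
Check the columns once, build a list, and then use that list while looping through the rows.
The following example does this: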
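The function find_visible checks the columns for visibility in row 1 only. It uses the column_groups list to get the multi-column ranges and whether each one is hidden, and from that builds the list of visible columns.
The next function, read_visible_data_from_sheet, then uses this list and, after checking that a row is visible, copies the values from the columns named in the visible-column list. A row is treated as visible based on its hidden state in row_dimensions.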
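The row check described above relies on the hidden flag alone. As a minimal sketch, and only under the assumption that a stored height of exactly 0 should also count as hidden (as the question's own code assumes), the test could be factored into a small helper. The names row_is_visible and visible_row_numbers are hypothetical, and the SimpleNamespace objects are stand-ins for openpyxl row dimensions so that the snippet runs without a workbook.

```python
from types import SimpleNamespace


def row_is_visible(row_dim):
    """Return True when a row dimension looks visible.

    row_dim is expected to expose .hidden and .height, as openpyxl row
    dimensions do. None means no dimension is recorded for that row.
    """
    if row_dim is None:
        return True  # nothing stored for this row, so treat it as a default visible row
    if row_dim.hidden:
        return False
    # Assumption: a height of exactly 0 is also treated as hidden
    return not (row_dim.height is not None and row_dim.height == 0)


def visible_row_numbers(row_dimensions, max_row):
    """List the 1-based row numbers in 1..max_row that look visible."""
    return [n for n in range(1, max_row + 1)
            if row_is_visible(row_dimensions.get(n))]


# Stand-in data: row 2 is hidden, row 4 has height 0, row 5 is a normal row
demo_dims = {
    2: SimpleNamespace(hidden=True, height=None),
    4: SimpleNamespace(hidden=False, height=0),
    5: SimpleNamespace(hidden=False, height=15),
}
print(visible_row_numbers(demo_dims, 6))  # [1, 3, 5, 6]
```

With a real worksheet this would presumably be called as visible_row_numbers(ws.row_dimensions, ws.max_row). Looking entries up with .get instead of indexing should avoid creating a new dimension entry for every row that is merely inspected.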
The worksheet has the following grouped and hidden rows and columns:
"""
Hidden Rows & Columns
Columns;
    C, width was set to 0
    E & G, Hidden
    J - L, Grouped, Collapsed
    N & O, Grouped, Not collapsed
Rows;
    7, Height set to 0
    15 - 19, Grouped, Collapsed
    31 & 32, Hidden
    35 - 37, Grouped, Not collapsed
"""
Code sample
import pandas as pd
import openpyxl
from openpyxl.utils.cell import range_boundaries as rb


def find_visible(sheet):
    vis_cols = []  # List of columns that are visible
    hidden_grouped_cols = []  # Column indexes that sit inside a hidden multi-column range
    for range_string in sheet.column_groups:
        # The dimension is stored under the first column of the range, e.g. 'J' for 'J:L'.
        # Split on ':' rather than slicing [:1] so multi-letter columns such as 'AA' also work.
        first_col = range_string.split(':')[0]
        group_collapsed = sheet.column_dimensions[first_col].hidden
        rng_boundaries = rb(range_string)  # (min_col, min_row, max_col, max_row)
        hidden_grouped_cols += [x for x in range(rng_boundaries[0], rng_boundaries[2]+1) if group_collapsed]

    # Column state is the same for every row, so checking row 1 is enough
    for row in sheet.iter_rows(max_row=1):
        for col in row:
            current_column = col.column_letter
            col_info = sheet.column_dimensions[current_column]
            if not col_info.hidden and col.col_idx not in hidden_grouped_cols:
                vis_cols.append(col.column_letter)

    return vis_cols


def read_visible_data_from_sheet(sheet, columns):
    data = []
    # Iterate through rows
    for row in sheet.iter_rows():
        # Check if the row is hidden
        if not sheet.row_dimensions[row[0].row].hidden:
            visible_row = [cell.value for cell in row if cell.column_letter in columns]
            if visible_row:  # Avoid adding empty rows
                data.append(visible_row)

    dataframe = pd.DataFrame(data)
    return dataframe


excelfile = 'foo.xlsx'
wb = openpyxl.load_workbook(excelfile)
ws = wb['Sheet1']

visible_columns = find_visible(ws)
print(visible_columns)

df = read_visible_data_from_sheet(ws, visible_columns)
print(df)
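On the observation in the question that sheet.column_dimensions sometimes lacks an entry for a column: as far as I can tell, openpyxl can store a single dimension for a run of adjacent columns that share the same settings, keyed by the first column letter, with min and max giving the range. That is presumably why the code above expands the ranges from column_groups. A rough alternative sketch that expands every dimension directly is shown here. The names hidden_column_indexes and column_letter_to_index are hypothetical helpers, treating a width of 0 as hidden is an assumption carried over from the question, and the SimpleNamespace objects stand in for real column dimensions so that the snippet runs on its own.

```python
from types import SimpleNamespace


def column_letter_to_index(letter):
    """Convert a column letter to its 1-based index: 'A' -> 1, 'AA' -> 27."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def hidden_column_indexes(column_dimensions):
    """Collect the 1-based indexes of every column covered by a hidden dimension.

    Each value is expected to expose .hidden, .width, .min and .max.
    """
    hidden = set()
    for letter, dim in column_dimensions.items():
        # Assumption: a width of exactly 0 is treated the same as hidden
        zero_width = dim.width is not None and dim.width == 0
        if not (dim.hidden or zero_width):
            continue
        # Fall back to the dictionary key when min/max are not set
        first = dim.min or column_letter_to_index(letter)
        last = dim.max or first
        hidden.update(range(first, last + 1))
    return hidden


# Stand-in data: C has width 0, E covers E:G and is hidden, H is visible,
# AA is hidden but has no min/max recorded
demo = {
    "C": SimpleNamespace(hidden=False, width=0, min=3, max=3),
    "E": SimpleNamespace(hidden=True, width=9, min=5, max=7),
    "H": SimpleNamespace(hidden=False, width=12, min=8, max=8),
    "AA": SimpleNamespace(hidden=True, width=None, min=None, max=None),
}
print(sorted(hidden_column_indexes(demo)))  # [3, 5, 6, 7, 27]
```

With a real worksheet the call would presumably be hidden_column_indexes(ws.column_dimensions), after which a cell can be skipped when its column index is in the returned set. Any column with no entry at all is simply absent from the set and is therefore treated as visible.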