Split a large text file into smaller text files of about 5 lines each, with every file ending on an "END" line


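I am trying to split a large text file into smaller text files. Ideally each smaller output file should have 5 lines, but if the 5th line does not contain the "END" keyword, the split should continue to the following lines until the "END" keyword is found, and then write those lines out as one smaller file.

Below is the data I am working with: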

CONTINGENCY 'P11:-12.47:DEI:PURDUE CHP GEN'
SET BUS 249831 GENERATION TO 20 MW
END
CONTINGENCY 'P11:-12.47:DEI:PURDUE TG1-2 GENS'
SET BUS 249831 GENERATION TO 15.5 MW
END
CONTINGENCY 'P11:-13.2:DEI:TATE-LYLE BTM GENS'
OPEN BUS 249936
END
CONTINGENCY 'P11:0.342:DEI:08CR_SOL_GEN:1'
REMOVE MACHINE 1 FROM BUS 251904
END

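In this example, line 5 does not contain the "END" keyword, so the split should move on to line 6, which does, and create a small text file of 6 lines. The next file should then start at line 7 and follow the same process.

Currently I am using the following code: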

import glob
import pandas as pd
import math
import os

if __name__ == "__main__":
    file_dir = os.path.dirname(__file__)
    if file_dir != "":
       os.getcwd()

read_file = glob.glob("*.con")

with open("combined.con", "wb") as outfile:
    for f in read_file:
        with open (f, "rb") as infile:
            outfile.write(infile.read())
        
df0 = pd.read_csv (file_dir + '/combined.con')
count = len(df0)
row_range = 5
block = count // row_range
for line in df0:
    for i in range(block):   
        if not line.startswith("END"):
            start = i * row_range
            stop = (i+1) * row_range
            while True:
                row_range = row_range + 1
                df2 = df0.iloc[start:stop]
                df2.to_csv(f"Contingency_{i}.con", index=False)
                break    
The code creates smaller text files, but they do **not** end with the "END" keyword as intended.

The expected output is two smaller text files with the following data:
    CONTINGENCY 'P11:-12.47:DEI:PURDUE CHP GEN'
    SET BUS 249831 GENERATION TO 20 MW
    END
    CONTINGENCY 'P11:-12.47:DEI:PURDUE TG1-2 GENS'
    SET BUS 249831 GENERATION TO 15.5 MW
    END

    CONTINGENCY 'P11:-13.2:DEI:TATE-LYLE BTM GENS'
    OPEN BUS 249936
    END
    CONTINGENCY 'P11:0.342:DEI:08CR_SOL_GEN:1'
    REMOVE MACHINE 1 FROM BUS 251904
    END
python pandas
1 Answer

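I saw this question earlier, but it was closed before I had finished writing code for it.

I don't use pandas and I don't calculate blocks. Instead I read the file line by line and append each line to a separate list, then check whether the list has 5 or more lines and whether the last line starts with END. When both are true, I write the list to its own file and start a new list.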

other_list = []  # lines collected for the current output file
index = 0        # number used in the output file name

with open("combined.con") as infile:
    for line in infile:
        other_list.append(line)
        # close the chunk once it has at least 5 lines and ends on an END line
        if len(other_list) >= 5 and line.startswith('END'):
            with open(f"Contingency_{index}.con", 'w') as outfile:
                outfile.writelines(other_list)
            other_list = []
            index += 1

# write any leftover lines that did not reach 5 lines ending on END
if other_list:
    with open(f"Contingency_{index}.con", 'w') as outfile:
        outfile.writelines(other_list)
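
If you want to check the splitting rule without touching any files, the same idea can be sketched as a small generator. This is only an illustration: the name `split_on_end` and the `min_lines` parameter are made up for this example, and the sample text is the data from the question.

```python
from typing import Iterable, Iterator, List


def split_on_end(lines: Iterable[str], min_lines: int = 5) -> Iterator[List[str]]:
    """Yield chunks of at least `min_lines` lines, each closed by an END line.

    `split_on_end` and `min_lines` are hypothetical names used for this sketch.
    """
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        # A chunk is complete once it is long enough AND the current line is END.
        if len(chunk) >= min_lines and line.startswith("END"):
            yield chunk
            chunk = []
    # Leftover lines that never reached the threshold still form a final chunk.
    if chunk:
        yield chunk


# Sample data copied from the question, kept with its line endings.
sample = """CONTINGENCY 'P11:-12.47:DEI:PURDUE CHP GEN'
SET BUS 249831 GENERATION TO 20 MW
END
CONTINGENCY 'P11:-12.47:DEI:PURDUE TG1-2 GENS'
SET BUS 249831 GENERATION TO 15.5 MW
END
CONTINGENCY 'P11:-13.2:DEI:TATE-LYLE BTM GENS'
OPEN BUS 249936
END
CONTINGENCY 'P11:0.342:DEI:08CR_SOL_GEN:1'
REMOVE MACHINE 1 FROM BUS 251904
END
""".splitlines(keepends=True)

chunks = list(split_on_end(sample))
for number, chunk in enumerate(chunks):
    # Print the chunk number, its length, and its last line.
    print(number, len(chunk), chunk[-1].strip())
```

With the sample above this gives two chunks of 6 lines each, both ending on END. To write them out you could loop over `enumerate(split_on_end(infile))` and call `outfile.writelines(chunk)` for each chunk, as in the code above.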