Split a large text file into smaller text files of about 5 lines each, ending each file on a line with the "END" keyword

Question

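I am trying to split a large text file into smaller text files. Ideally each smaller output file should have 5 lines, but if the 5th line does not contain the "END" keyword, it should move on to the next line until the "END" keyword is found, and then write those lines out as one smaller output file.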

Here is the data I am working with:

CONTINGENCY 'P11:-12.47:DEI:PURDUE CHP GEN'
SET BUS 249831 GENERATION TO 20 MW
END
CONTINGENCY 'P11:-12.47:DEI:PURDUE TG1-2 GENS'
SET BUS 249831 GENERATION TO 15.5 MW
END
CONTINGENCY 'P11:-13.2:DEI:TATE-LYLE BTM GENS'
OPEN BUS 249936
END
CONTINGENCY 'P11:0.342:DEI:08CR_SOL_GEN:1'
REMOVE MACHINE 1 FROM BUS 251904
END

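In this example, line 5 does not have the "END" keyword, so it should move on to line 6, which does have the "END" keyword, and create a small text file of 6 lines. The next file should start from line 7 and follow the same process.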

Currently I am using the following code:

import glob
import pandas as pd
import math
import os

if __name__ == "__main__":
    file_dir = os.path.dirname(__file__)
    if file_dir != "":
       os.getcwd()

read_file = glob.glob("*.con")

with open("combined.con", "wb") as outfile:
    for f in read_file:
        with open (f, "rb") as infile:
            outfile.write(infile.read())
        
df0 = pd.read_csv(file_dir + '/combined.con')
count = len(df0)
row_range = 5
block = count // row_range
for line in df0:
    for i in range(block):   
        if not line.startswith("END"):
            start = i * row_range
            stop = (i+1) * row_range
            while True:
                row_range = row_range + 1
                df2 = df0.iloc[start:stop]
                df2.to_csv(f"Contingency_{i}.con", index=False)
                break

The code creates smaller text files, but they do **not** end with the "END" keyword as intended.

The expected output is two smaller text files with the following data:

    CONTINGENCY 'P11:-12.47:DEI:PURDUE CHP GEN'
    SET BUS 249831 GENERATION TO 20 MW
    END
    CONTINGENCY 'P11:-12.47:DEI:PURDUE TG1-2 GENS'
    SET BUS 249831 GENERATION TO 15.5 MW
    END

    CONTINGENCY 'P11:-13.2:DEI:TATE-LYLE BTM GENS'
    OPEN BUS 249936
    END
    CONTINGENCY 'P11:0.342:DEI:08CR_SOL_GEN:1'
    REMOVE MACHINE 1 FROM BUS 251904
    END

Answer 1

I saw this question earlier, but it was closed before I had finished writing the code.

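I don't use pandas and I don't compute blocks. Instead I read the file line by line and append each line to a separate list, then check whether the list has 5 or more lines and whether the last line starts with `END`. When both are true, the list is written to its own file and reset.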

other_list = []
index = 0

with open("combined.con") as infile:
    for line in infile:
        other_list.append(line)
        # flush once there are at least 5 lines and the current line is END
        if len(other_list) >= 5 and line.startswith('END'):
            with open(f"Contingency_{index}.con", 'w') as outfile:
                for item in other_list:
                    outfile.write(item)
            other_list = []
            index += 1

# write any leftover lines that did not complete a full block
if len(other_list) > 0:
    with open(f"Contingency_{index}.con", 'w') as outfile:
        for item in other_list:
            outfile.write(item)
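
The same read-and-accumulate idea can also be wrapped in a generator, which makes it easy to check the grouping before writing any files. This is only a minimal sketch: the function name `split_blocks` and its parameters `min_lines` and `end_marker` are my own, not part of the question, and the sample lines are copied from the data in the question.

```python
from typing import Iterable, Iterator, List


def split_blocks(lines: Iterable[str], min_lines: int = 5,
                 end_marker: str = "END") -> Iterator[List[str]]:
    """Yield lists of at least `min_lines` lines, each closed by an END line.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    block: List[str] = []
    for line in lines:
        block.append(line)
        # Close the block only when it is long enough AND the line is END.
        if len(block) >= min_lines and line.strip().startswith(end_marker):
            yield block
            block = []
    # Leftover lines that never reached a closing END still get emitted.
    if block:
        yield block


sample = [
    "CONTINGENCY 'P11:-12.47:DEI:PURDUE CHP GEN'\n",
    "SET BUS 249831 GENERATION TO 20 MW\n",
    "END\n",
    "CONTINGENCY 'P11:-12.47:DEI:PURDUE TG1-2 GENS'\n",
    "SET BUS 249831 GENERATION TO 15.5 MW\n",
    "END\n",
    "CONTINGENCY 'P11:-13.2:DEI:TATE-LYLE BTM GENS'\n",
    "OPEN BUS 249936\n",
    "END\n",
    "CONTINGENCY 'P11:0.342:DEI:08CR_SOL_GEN:1'\n",
    "REMOVE MACHINE 1 FROM BUS 251904\n",
    "END\n",
]

blocks = list(split_blocks(sample))
print([len(b) for b in blocks])  # [6, 6]
```

To write the files, loop over `enumerate(split_blocks(infile))` with `infile` opened on `combined.con` and call `outfile.writelines(block)` for each block. Using `line.strip()` before the check is a design choice so that an indented `END` line is still recognised; drop it if `END` must start in column 0.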
