Python - repeated text blocks

Question description Votes: 0 Answers: 4

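I am new to Python, but I have been using Perl for a while. In Perl, to restrict a search of a file to a particular block of text, I would write something like this: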

if (/start_line/ ... /end_line/) {
   do something here
}

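Once the /start_line/ regex matches, the condition /start_line/ ... /end_line/ becomes true, and it stays true until after the /end_line/ regex has matched. In a loop that reads the input line by line, this executes the if block for every line between the start line and the end line, endpoints included.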

How can I express the same condition in Python?

python perl
4 answers
2
votes

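You are referring to Perl's flip-flop operator (..). In essence, it sets a boolean flag to true when the first condition is met and back to false once the second condition has been met, so both the start line and the end line are included. Seen this way, it is fairly simple to implement.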

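import re

filename = 'file.txt'  # placeholder: path of the file to scan

flip = False
with open(filename) as f:
    for line in f:
        # re.search matches anywhere in the line, like Perl's /.../
        if not flip and re.search('start-text', line):
            flip = True
        if flip:
            print(line, end='')  # line already ends with '\n'
            if re.search('end-text', line):
                flip = False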
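
One detail worth noting: the question uses the three-dot form (...), whereas the loop above behaves like the two-dot form (..), which tests the end pattern on the same line that matched the start pattern. With three dots, Perl waits until the next line before testing the end pattern. Below is a minimal sketch of that variant. It assumes the lines come from any iterable (an open file works too), and the helper name flip_flop3 is hypothetical, not part of any library.

```python
import re


def flip_flop3(lines, start, end):
    """Yield lines from a start match through an end match, inclusive.

    Mimics Perl's three-dot form: the end pattern is NOT tested on the
    line that matched the start pattern. (Hypothetical helper name.)
    """
    inside = False
    for line in lines:
        if not inside:
            if re.search(start, line):
                inside = True
                yield line      # start line: skip the end test here
        else:
            yield line
            if re.search(end, line):
                inside = False  # end line was yielded, block is closed


sample = ["a", "BEGIN END", "b", "END", "c"]
print(list(flip_flop3(sample, r"BEGIN", r"END")))
# ['BEGIN END', 'b', 'END']
```

With the two-dot behavior the block would already close on "BEGIN END", so only that single line would be produced.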

2
votes

What about trying something like this?

start_line = "line 1"
end_line = "line 2"
in_block = False
line_block = []

with open("file.txt") as search:
    for line in search:
        line = line.rstrip()  # strip trailing whitespace, including '\n'
        if line == start_line:
            in_block = True
        elif line == end_line and in_block:
            line_block.append(line)  # include the end line itself
            in_block = False

        if in_block:
            line_block.append(line)
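
If the file contains several such blocks, the loop above merges them all into one list. Here is a minimal sketch that keeps each block separate, under the same exact-match assumption. The function name collect_blocks is hypothetical, and the input is taken as any iterable of lines so it can be tried without a file.

```python
def collect_blocks(lines, start_line, end_line):
    """Return a list of blocks; each block is the list of lines from
    start_line through end_line, inclusive. (Hypothetical helper.)"""
    blocks = []
    current = None  # None means "not inside a block"
    for line in lines:
        line = line.rstrip("\n")
        if current is None and line == start_line:
            current = []
        if current is not None:
            current.append(line)
            if line == end_line:
                blocks.append(current)
                current = None
    if current is not None:  # unterminated block at end of input
        blocks.append(current)
    return blocks


text = ["x\n", "line 1\n", "a\n", "line 2\n", "y\n", "line 1\n", "b\n", "line 2\n"]
print(collect_blocks(text, "line 1", "line 2"))
# [['line 1', 'a', 'line 2'], ['line 1', 'b', 'line 2']]
```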

0
votes

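The Perl solution relies on the flip-flop operator, which keeps state between successive loop iterations. The other solutions achieve this by updating a flag variable. It is also possible to write a class so that the state of the previous match is kept in an instance variable. An example is given below. This is the closest I could get to Perl's elegant syntax.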

# Fileblock.py
import re


class Block_Extract:
    """Flip-flop: remembers between calls whether we are inside a block."""

    def __init__(self):
        self.state = False

    def test(self, line, start, end):
        if not self.state:
            # Outside a block, only a start match switches the state on.
            if not re.search(start, line):
                return False
            self.state = True
        # Inside a block: the end line is still part of the block.
        if re.search(end, line):
            self.state = False
        return True


start = "line3"
end = "line7"
fileblock = Block_Extract()
with open("Block_Test") as fp:
    for lines in fp:
        lines = lines.rstrip()
        if fileblock.test(lines, start, end):
            print(lines)

$ cat Block_Test 
This is line1
This is line2
This is line3
This is line4
This is line5
This is line6
This is line7
This is line8
This is line9
This is line10
$ python Fileblock.py 
This is line3
This is line4
This is line5
This is line6
This is line7

0
votes

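I originally wrote a generator that yielded lines only from the first matching block (see the edit history), but an edit made me reconsider my first proposal, since the expected behavior is to execute the if body on every matching block of text.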

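My new proposal is again a generator. By default it yields lines from matching text blocks "forever" (that is, for every block in the file), but with an optional keyword argument, count, it can be limited to at most that many matching blocks.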

A proof of concept follows.

def from_beg_to_end(filename, beg, end, count=0):
    '''Yields the lines from `filename` like in `sed -n /beg/,/end/p`

    By default (for `count=0`) if the file contains multiple blocks
    all the blocks are output, for `count` greater than zero the number
    of blocks whose lines are returned is _at most_ `count`.

    Example of use:

    for line in from_beg_to_end(filename, 'a', 'b'):
        ...
    '''
    inside = False
    with open(filename) as f:
        for line in f:
            if not inside and beg in line:
                inside = True
            if inside:
                yield line
                if end in line:
                    count -= 1
                    if count == 0:
                        return
                    inside = False

This uses simple substring matching. It should be easy to adapt the code above to support regular expressions.
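
As an illustration of that adaptation, here is a minimal sketch that swaps the substring tests for re.search on precompiled patterns. The name from_beg_to_end_re is hypothetical, and the sketch takes any iterable of lines rather than a filename (an assumption made so it can be tried in memory; an open file object can be passed in the same way).

```python
import re


def from_beg_to_end_re(lines, beg, end, count=0):
    """Regex variant: yield lines from a `beg` match through an `end`
    match, inclusive. With count > 0, stop after at most `count` blocks.
    (Hypothetical name; `lines` is any iterable of strings.)"""
    beg_re, end_re = re.compile(beg), re.compile(end)
    inside = False
    for line in lines:
        if not inside and beg_re.search(line):
            inside = True
        if inside:
            yield line
            if end_re.search(line):
                count -= 1
                if count == 0:
                    return
                inside = False


sample = ["x", "beg 1", "a", "end 1", "y", "beg 2", "b", "end 2"]
print(list(from_beg_to_end_re(sample, r"^beg \d", r"^end \d")))
# ['beg 1', 'a', 'end 1', 'beg 2', 'b', 'end 2']
print(list(from_beg_to_end_re(sample, r"^beg \d", r"^end \d", count=1)))
# ['beg 1', 'a', 'end 1']
```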
