使用正则表达式处理python RE包中的标签

问题描述 投票:0回答:1

输入文本文件包含:

<html>
<header>
<title>This is a title</title>
</header>
<body>
        <div>This is a div <div>This is a nested div</div></div>
</body>
</html>

并且我想将以下内容输出到另一个文本文件:

<l>
<r>
<e>This is a title</e>
</r>
<y>
        <v>This is a div <v>This is a nested div</v></v>
</y>
</l>

在python中使用Regex,我该怎么做?更新!!!!我已经尝试过这样的<>:

import re
def run():
    with open('input.txt') as f:
        fout  = open('output.txt', 'w')
        count = 0
        for line in f:
            if not line:
                continue
            pat = re.findall('<[a-zA-Z]+>',line)
            for l in pat:
                y = re.sub('<[a-zA-Z]+>', '<{}>'.format(l[-2]), line, count=0, flags=0)
                fout.write(y)
regex python-3.x tags
1个回答
0
投票

我希望现在提供解决方案为时不晚。这是我的代码:

© www.soinside.com 2019 - 2024. All rights reserved.