Parsing C# code in Python (separating statements and/or blocks) - regex or state machine

Question

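I need to parse C# code. I only need to split it into separate statements, taking line breaks into account. Comments, multi-line comments, verbatim strings, and multi-line verbatim strings must be ignored.

What I tried: I read the file line by line (because I need the original line numbers), prefix each line with a marker that carries its line number, join everything back together, split the string on the characters ;, { and }, and then remove the unwanted markers from each piece (keeping only the first one).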

   
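import re

with open("./program.cs", "r") as f:
    # Prefix every line with a marker that carries its line number.
    prg = []
    for number, line in enumerate(f):
        prg.append(f"<#<{number}>#>{line}")

# Split on the statement/block delimiters, then tidy up each piece.
dotnet_lines = re.split(r'[;{}]', "".join(prg))
for i in range(len(dotnet_lines)):
    dotnet_lines[i] = dotnet_lines[i].replace("\n", "")
    # Remove every marker that follows another character (keeps the first one).
    dotnet_lines[i] = re.sub(r'(.)(<#<[0-9]+>#>)', r'\1', dotnet_lines[i])

# Result: the leading marker gives the line number, the rest is the statement.
for ln in dotnet_lines:
    ocorrencia = ln.find('>#>') + 3
    line = ln[ocorrencia:]
    number = re.sub('[<#>]', '', ln[:ocorrencia])
    print(f"Ln Nr: {number}   {line}")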

This is a basic solution, but it does not handle comments or strings: a ;, { or } inside either of them still splits the text.
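One way to cover comments and strings with a regex alone is to let a single alternation consume comments, string literals and char literals as whole units, and to split only where the delimiter alternative itself matched. The following is a minimal sketch under that assumption, not a full C# lexer. The names `TOKEN` and `split_statements` are made up for illustration. Delimiters are kept at the end of each piece, and interpolation holes that contain quotes, C# 11 raw string literals and the semicolons inside a `for (...)` header are not treated specially.

```python
import re

# Anything that may legally contain ';', '{' or '}' is listed before the
# delimiter alternative, so it is consumed as one unit.
TOKEN = re.compile(
    r'''
      (?P<comment>  //[^\n]* | /\*.*?\*/ )       # line comment or block comment
    | (?P<verbatim> \$?@\$?"(?:""|[^"])*" )      # verbatim string, doubled quote is an escape
    | (?P<string>   \$?"(?:\\.|[^"\\\n])*" )     # regular string with backslash escapes
    | (?P<char>     '(?:\\.|[^'\\\n])+' )        # char literal
    | (?P<delim>    [;{}] )                      # delimiters that end a statement
    ''',
    re.VERBOSE | re.DOTALL,
)


def split_statements(source):
    """Split C# source on ';', '{' and '}' that occur in real code.

    Returns (line, text) pairs; line is the 1-based line on which the piece
    starts. Comments are dropped, string and char literals are kept as-is.
    """
    statements = []
    pieces = []  # [offset, text, is_literal] fragments of the current piece

    def keep(offset, text, is_literal=False):
        if pieces and not is_literal and not pieces[-1][2]:
            pieces[-1][1] += text  # merge adjacent plain code
        else:
            pieces.append([offset, text, is_literal])

    def flush():
        # Collapse whitespace in plain code only; literals stay untouched.
        text = "".join(
            t if lit else re.sub(r"\s+", " ", t) for _, t, lit in pieces
        ).strip()
        if text:
            # The first non-blank character decides the reported line number.
            first = next(o + len(t) - len(t.lstrip())
                         for o, t, _ in pieces if t.strip())
            statements.append((source.count("\n", 0, first) + 1, text))
        pieces.clear()

    pos = 0
    for m in TOKEN.finditer(source):
        keep(pos, source[pos:m.start()])
        if m.lastgroup == "comment":
            # Blank the comment out with same-length whitespace so that the
            # offsets of the surrounding code stay valid.
            keep(m.start(), re.sub(r"[^\n]", " ", m.group()))
        elif m.lastgroup == "delim":
            keep(m.start(), m.group())
            flush()
        else:
            keep(m.start(), m.group(), is_literal=True)
        pos = m.end()
    keep(pos, source[pos:])
    flush()
    return statements
```

Passing the contents of `./program.cs` to `split_statements` and printing each pair would give output comparable to the loop above, except that delimiters hidden in comments or literals no longer cause a split.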

Using pygments also works... but I only want to separate statements and blocks...

from pygments.lexers.dotnet import CSharpLexer
from pygments.token import Token

def tokenize_dotnet_file(file_path):
    with open(file_path, 'r') as file:
        code = file.read()
    
    lexer = CSharpLexer()
    tokens = lexer.get_tokens(code)

    for token in tokens:
        token_type = token[0]
        token_value = token[1]
        print(f"Type: {token_type}, Value: {token_value}")

if __name__ == "__main__":
    file_path = "./program.cs"  
    tokenize_dotnet_file(file_path)

This is better, but I need statements rather than tokens.

Answer 1

How about removing all the comments first, and then doing what you did in your first solution?

import re

with open("./program.cs", "r") as f:
    no_comments = []
    prg = []
    for line in f:
        # str.replace() does not understand regular expressions, so use re.sub().
        # This only blanks lines that start with //; the line break is kept so
        # that the numbering below still matches the file.
        no_comments.append(re.sub(r'^\s*//.*', '', line))
    for number, line in enumerate(no_comments):
        prg.append(f"<#<{number}>#>{line}")
    dotnet_lines = re.split(r'[;{}]', "".join(prg))
    for i in range(len(dotnet_lines)):
        dotnet_lines[i] = dotnet_lines[i].replace("\n", "")
        dotnet_lines[i] = re.sub(r'(.)(<#<[0-9]+>#>)', r'\1', dotnet_lines[i])
    # Result
    for ln in dotnet_lines:
        ocorrencia = ln.find('>#>') + 3
        line = ln[ocorrencia:]
        number = re.sub('[<#>]', '', ln[:ocorrencia])
        print(f"Ln Nr: {number}   {line}")

This is untested and not efficient, but I am including the code above to clarify what I mean.
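If the comments are removed first, the removal step has to know about strings, otherwise a `//` or `/*` inside a literal gets cut as well. A small character-by-character state machine is one way to do that. The sketch below is only an illustration under stated assumptions: `strip_comments` is a hypothetical helper, it replaces comment characters with spaces and keeps every newline (so the result has the same length and the same line numbers as the input), and it does not handle the `@$"` prefix order, quotes inside interpolation holes, or C# 11 raw string literals.

```python
def strip_comments(source):
    """Blank out // and /* */ comments while leaving string literals alone."""
    CODE, LINE, BLOCK, STRING, VERBATIM, CHAR = range(6)
    state, out, i = CODE, [], 0
    while i < len(source):
        ch, pair = source[i], source[i:i + 2]
        step = 1       # number of characters consumed in this iteration
        keep = True    # whether they are copied to the output unchanged
        if state == CODE:
            if pair == "//":
                state, step, keep = LINE, 2, False
            elif pair == "/*":
                state, step, keep = BLOCK, 2, False
            elif pair == '@"':
                state, step = VERBATIM, 2
            elif ch == '"':
                state = STRING
            elif ch == "'":
                state = CHAR
        elif state == LINE:
            if ch == "\n":
                state = CODE          # the newline itself is kept
            else:
                keep = False
        elif state == BLOCK:
            if pair == "*/":
                state, step = CODE, 2
            keep = False
        elif state == VERBATIM:
            if pair == '""':
                step = 2              # a doubled quote is an escaped quote
            elif ch == '"':
                state = CODE
        else:                         # STRING or CHAR
            if ch == "\\":
                step = 2              # skip the escaped character
            elif ch == ('"' if state == STRING else "'") or ch == "\n":
                state = CODE          # closing quote, or unterminated literal
        chunk = source[i:i + step]
        if not keep:
            # Preserve newlines so that line numbers do not shift.
            chunk = "".join("\n" if c == "\n" else " " for c in chunk)
        out.append(chunk)
        i += step
    return "".join(out)
```

The returned text could then be fed line by line into the marker-and-split code above in place of `no_comments`. Delimiters inside string literals would still split the text there, so this covers only the comment half of the problem.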
