How can I extract inline mentions of figures from an article XML file?


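Is it possible to extract the inline mentions of figures from an article's .nxml file? I can already extract the figures and their captions with pubmed_parser. Is there a library that can extract the inline mentions of those figures?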

For example, Figure 6 is mentioned 3 times in the body of the article:

In further experiments, we tested the hypothesis that fibrinogen leakage was specifically related to changes in pericytes. Using WM tissues from a previously described baboon model of cerebral hypoperfusion, we found that pericytes numbers determined using the same methods above were reduced at 14 days after 3 vessels occlusion (3VO) (Figure 6B), whereas mean vascular density as assessed by COL4 immunoreactivities were not altered over the survival period (Figure 6A). This was concomitant with peak of fibrinogen reactivity in the WM tissues (Figure 6B). All raw data presented in this article are available for review.

python xml parsing xml-parsing nsxmlparser
1 Answer

You can use a plain text search:

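def find_Fig(text):
    """Return parenthesised figure mentions such as 'Figure 6B'."""
    all_pattern = []
    current = None  # tokens of the parenthesis currently being read

    for x in text.split():
        # A token starting with '(' opens a new candidate mention.
        if x.startswith('('):
            current = []
        if current is None:
            continue
        current.append(x)
        # A token containing ')' closes the candidate.
        if ')' in x:
            # Drop punctuation that trails the closing parenthesis.
            p = ' '.join(current).rstrip('.,;:')
            if p.startswith('(Fig') and p.endswith(')'):
                all_pattern.append(p[1:-1])
            current = None

    return all_pattern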


#######################
txt ="""
Using WM tissues from a previously described baboon model of cerebral hypoperfusion, we found that pericytes numbers determined using the same methods above were reduced at 14 days after 3 vessels occlusion (3VO) (Figure 6B), whereas mean vascular density as assessed by COL4 immunoreactivities were not altered over the survival period (Figure 6A). This was concomitant with peak of fibrinogen reactivity in the WM tissues (Figure 6B)
"""
f = find_Fig(txt)
print(f) # ['Figure 6B', 'Figure 6A', 'Figure 6B']
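
If the .nxml file follows the JATS/NLM tag set used by PubMed Central, inline figure mentions are usually marked up as xref elements with ref-type="fig", so they can be read from the XML itself rather than from flattened text. Below is a minimal sketch using only the standard library, assuming that markup is present. The sample XML, the rid value "F6", and the helper name find_fig_xrefs are illustrative and not taken from the asker's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS-style snippet standing in for a real .nxml file.
SAMPLE = """<article><body><sec>
<p>Pericyte numbers were reduced at 14 days after 3VO (<xref ref-type="fig" rid="F6">Figure 6B</xref>), whereas vascular density was not altered (<xref ref-type="fig" rid="F6">Figure 6A</xref>).</p>
<p>This was concomitant with peak fibrinogen reactivity (<xref ref-type="fig" rid="F6">Figure 6B</xref>); see also <xref ref-type="bibr" rid="B12">12</xref>.</p>
</sec></body></article>"""


def find_fig_xrefs(root):
    """Return (rid, label, paragraph text) for each figure cross-reference."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    mentions = []
    for xref in root.iter("xref"):
        # Assumption: figure mentions carry ref-type="fig" (JATS convention).
        if xref.get("ref-type") != "fig":
            continue
        # Walk up to the enclosing <p> to recover the surrounding sentence(s).
        node = xref
        while node is not None and node.tag != "p":
            node = parent.get(node)
        context = ""
        if node is not None:
            context = " ".join("".join(node.itertext()).split())
        label = "".join(xref.itertext()).strip()
        mentions.append((xref.get("rid"), label, context))
    return mentions


root = ET.fromstring(SAMPLE)
mentions = find_fig_xrefs(root)
for rid, label, context in mentions:
    print(rid, label)
```

For a real article, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(), where path points to your own .nxml file. Grouping the results by rid gives the number of mentions per figure, and the third tuple element keeps the paragraph in which each mention occurs.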