Getting the content of a specific tag from XML using ElementTree

Question (votes: 0, answers: 1)

Here is my XML data:

<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
  <PMID Version="1">1883738</PMID>
  <DateCompleted>
    <Year>1991</Year>
    <Month>10</Month>
    <Day>07</Day>
  </DateCompleted>
  <DateRevised>
    <Year>2013</Year>
    <Month>11</Month>
    <Day>21</Day>
  </DateRevised>
  <Article PubModel="Print">
    <Journal>
      <ISSN IssnType="Print">0959-9673</ISSN>
      <JournalIssue CitedMedium="Print">
        <Volume>72</Volume>
        <Issue>4</Issue>
        <PubDate>
          <Year>1991</Year>
          <Month>Aug</Month>
        </PubDate>
      </JournalIssue>
      <Title>International journal of experimental pathology</Title>
      <ISOAbbreviation>Int J Exp Pathol</ISOAbbreviation>
    </Journal>
    <ArticleTitle>The effect of HeNe laser radiation on the thyroid gland of the rat.</ArticleTitle>
    <Pagination>
      <MedlinePgn>379-85</MedlinePgn>
    </Pagination>
    <Abstract>
      <AbstractText>Although laser irradiation is becoming common practice in medicine, there is not always a clear understanding of the possible side-effects. The present report is a light and electron microscopic study of the effects of fixed low intensity doses of soft HeNe laser on the thyroid of Wistar rats. The immediate effects are mild multifocal degenerative changes; these lesions recover in less than 3 months. Long-term lesions are identified only by electron microscopy; they consist of an increased number of peroxisomes and free or intramitochondrial crystalline structures. We discuss the laser's hypothetical functions.</AbstractText>
    </Abstract>
    <AuthorList CompleteYN="Y">
      <Author ValidYN="Y">
        <LastName>Lerma</LastName>
        <ForeName>E</ForeName>
        <Initials>E</Initials>
        <AffiliationInfo>
          <Affiliation>Department of Pathology and Radiology, Hospital Universitario Virgen Macarena, University of Seville, Spain.</Affiliation>
        </AffiliationInfo>
      </Author>
      <Author ValidYN="Y">
        <LastName>Hevia</LastName>
        <ForeName>A</ForeName>
        <Initials>A</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Rodrigo</LastName>
        <ForeName>P</ForeName>
        <Initials>P</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Gonzalez-Campora</LastName>
        <ForeName>R</ForeName>
        <Initials>R</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Armas</LastName>
        <ForeName>J R</ForeName>
        <Initials>JR</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Galera</LastName>
        <ForeName>H</ForeName>
        <Initials>H</Initials>
      </Author>
    </AuthorList>
    <Language>eng</Language>
    <PublicationTypeList>
      <PublicationType UI="D016428">Journal Article</PublicationType>
    </PublicationTypeList>
  </Article>
  <MedlineJournalInfo>
    <Country>England</Country>
    <MedlineTA>Int J Exp Pathol</MedlineTA>
    <NlmUniqueID>9014042</NlmUniqueID>
    <ISSNLinking>0959-9673</ISSNLinking>
  </MedlineJournalInfo>
  <ChemicalList>
    <Chemical>
      <RegistryNumber>06LU7C9H1V</RegistryNumber>
      <NameOfSubstance UI="D014284">Triiodothyronine</NameOfSubstance>
    </Chemical>
    <Chemical>
      <RegistryNumber>Q51BO43MG4</RegistryNumber>
      <NameOfSubstance UI="D013974">Thyroxine</NameOfSubstance>
    </Chemical>
  </ChemicalList>
  <CitationSubset>IM</CitationSubset>
  <CommentsCorrectionsList>
    <CommentsCorrections RefType="Cites">
      <RefSource>J Histochem Cytochem. 1969 Oct;17(10):675-80</RefSource>
      <PMID Version="1">4194356</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Acta Anat (Basel). 1986;125(1):10-3</RefSource>
      <PMID Version="1">3953239</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Anat Anz. 1977;142(3):209-12</RefSource>
      <PMID Version="1">603070</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>J Cell Biol. 1964 Nov;23:383-5</RefSource>
      <PMID Version="1">14222822</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>J Cell Biol. 1967 Jun;33(3):605-23</RefSource>
      <PMID Version="1">6036524</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Am J Med. 1983 May;74(5):852-62</RefSource>
      <PMID Version="1">6837608</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Exp Eye Res. 1977 Jan;24(1):45-56</RefSource>
      <PMID Version="1">402283</PMID>
    </CommentsCorrections>
  </CommentsCorrectionsList>
  <MeshHeadingList>
    <MeshHeading>
      <DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D007834" MajorTopicYN="N">Lasers</DescriptorName>
      <QualifierName UI="Q000009" MajorTopicYN="Y">adverse effects</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D008830" MajorTopicYN="N">Microbodies</DescriptorName>
      <QualifierName UI="Q000528" MajorTopicYN="N">radiation effects</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D008854" MajorTopicYN="N">Microscopy, Electron</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D051381" MajorTopicYN="N">Rats</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D011919" MajorTopicYN="N">Rats, Inbred Strains</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D013961" MajorTopicYN="N">Thyroid Gland</DescriptorName>
      <QualifierName UI="Q000528" MajorTopicYN="Y">radiation effects</QualifierName>
      <QualifierName UI="Q000648" MajorTopicYN="N">ultrastructure</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D013974" MajorTopicYN="N">Thyroxine</DescriptorName>
      <QualifierName UI="Q000097" MajorTopicYN="N">blood</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D014284" MajorTopicYN="N">Triiodothyronine</DescriptorName>
      <QualifierName UI="Q000097" MajorTopicYN="N">blood</QualifierName>
    </MeshHeading>
  </MeshHeadingList>
  <OtherID Source="NLM">PMC2001961</OtherID>
</MedlineCitation>
<PubmedData>

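I need to extract every author's LastName from the document. However, there are multiple files like this, each with different author names. How can I parse such a file and extract only the authors' LastName values into a list, so that I can build a database?

I used ElementTree to parse the document. Here is my code: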

tree = ET.parse("file path"+file)
doc = tree.getroot()
for LastName in doc.iter('LastName'):
    file1 = (ET.tostring(LastName, encoding='utf8').decode('utf8'))
    file2 = file1[48:(len(file1))]
    author_name_lastname = file2.split("<")[0]
    print(author_name_lastname)

With this, I can only print the first author's name, "Lerma".

python python-3.x xml-parsing elementtree
1 answer
0
votes
import os
from lxml import etree as ET

# Placeholder directory; forward slashes avoid the invalid "\y" escape sequence.
DIR = "D:/yourfilesdirectory/"

for filename in os.listdir(DIR):
    if filename.endswith(".xml"):
        # Let lxml open the file itself so it honours the XML encoding declaration.
        _root = ET.parse(os.path.join(DIR, filename)).getroot()
        # Select every <LastName> element at any depth below the root.
        _all_metadata_tags = _root.xpath('.//LastName')
        for i in _all_metadata_tags:
            print(i.text)
    else:
        print(f"skipping {filename}")
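
The question asks for the names in a list rather than printed output. Below is a minimal sketch of one way to do that using only the standard library's xml.etree.ElementTree, which is what the question started with. The function names last_names_from_xml and last_names_from_dir, and the trimmed SAMPLE record, are made up for illustration and are not part of the original post. The path ".//Author/LastName" assumes that only names nested directly under an Author element are wanted.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def last_names_from_xml(xml_text):
    """Return the text of every <LastName> directly under an <Author>."""
    root = ET.fromstring(xml_text)
    # ".//Author/LastName" matches at any depth below the root element.
    return [el.text for el in root.findall(".//Author/LastName") if el.text]


def last_names_from_dir(directory):
    """Map each .xml file name in `directory` to its list of last names."""
    return {
        # Passing bytes lets the parser honour the file's own encoding declaration.
        path.name: last_names_from_xml(path.read_bytes())
        for path in sorted(Path(directory).glob("*.xml"))
    }


# A trimmed, well-formed stand-in for the record shown in the question.
SAMPLE = """<PubmedArticle>
  <MedlineCitation>
    <Article>
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y"><LastName>Lerma</LastName><ForeName>E</ForeName></Author>
        <Author ValidYN="Y"><LastName>Hevia</LastName><ForeName>A</ForeName></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</PubmedArticle>"""

print(last_names_from_xml(SAMPLE))  # prints ['Lerma', 'Hevia']
```

Reading element.text directly avoids serializing each element and slicing the resulting string at a fixed offset, which depends on the exact length of the XML declaration. Note that each file must be well-formed XML for either parser to accept it.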