Retrieving the parent's attribute for each child in XML using xpathSApply


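I am trying to retrieve two vectors of the same length: one holding the attributes of the children, the other holding the attributes of the corresponding parents. Example file: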

countries.xml <- "<country>
              <city id='1'>
                <place id='1.1'> xxx </place>
                <place id='1.2'> xxx </place>
                <place id='1.3'> xxx </place>
              </city>
              <city id='2'>
                <place id='2.1'> xxx </place>
                <place id='2.2'> xxx </place>
                <place id='2.3'> xxx </place>
              </city>
           </country>"

My code so far:

library("XML")
doc = xmlTreeParse(countries.xml, useInternalNodes = T)
xpathSApply(doc, path = "//city/place/@id")
xpathSApply(doc, path = "//city/place/parent::*/@id")

I would like to end up with vectors like these (names dropped):

"1.1" "1.2" "1.3" "2.1" "2.2" "2.3"
"1" "1" "1" "2" "2" "2"

But the second path yields only

"1" "2" 

I got what I wanted with a loop:

library(glue)
place_id <- unname(xpathSApply(doc, path = "//city/place/@id"))
city_id <- vector()
for(i in place_id){
  city_id <- c(city_id,unname(xpathSApply(doc, path = glue("//city/place[@id={i}]/parent::*/@id"))))
}
city_id
"1" "1" "1" "2" "2" "2"

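But it is inefficient, and I have large XML files to process. I'm sure there is a way to get what I need with the right path in xpathSApply, but I can't find it, so could someone please enlighten me :)?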

r xml-parsing xpathsapply
1 Answer

A solution with tidyverse and xml2:

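library(xml2)
library(tidyverse)

cntry <- read_xml(countries.xml)

# Build one data frame per <city> and row-bind them.
# Note: the `country` column holds the id of each <city> (the parent).
pmap_df(list(
  # id of every <city>, as one single-row tibble per city
  xml_children(cntry) %>% map(xml_attr, 'id') %>%
    map(~as_tibble(.) %>% select(country = value)),
  # ids of the <place> children of each <city>
  xml_children(cntry) %>% map(xml_children) %>%
    map(xml_attr, 'id') %>%
    map(~as_tibble(.) %>% select(place = value))
  ), cbind)  # cbind recycles the single parent id across that city's places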
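
For reference, the second path in the question returns only "1" "2" because an XPath expression evaluates to a node-set, and each parent node appears in that set once regardless of how many children lead to it. Below is a minimal sketch of an alternative that stays within the XML package: select the place nodes with a single path, then let an R function look up the parent of each node. It assumes the same countries.xml string as in the question; the variable names place_id and city_id are borrowed from the question's loop, and this is one possible approach rather than the only one.

```r
library(XML)

# Same sample document as in the question
countries.xml <- "<country>
  <city id='1'>
    <place id='1.1'> xxx </place>
    <place id='1.2'> xxx </place>
    <place id='1.3'> xxx </place>
  </city>
  <city id='2'>
    <place id='2.1'> xxx </place>
    <place id='2.2'> xxx </place>
    <place id='2.3'> xxx </place>
  </city>
</country>"

# Parse the string into an internal (C-level) document
doc <- xmlParse(countries.xml, asText = TRUE)

# One pass over the <place> nodes: read each node's own id
place_id <- xpathSApply(doc, "//city/place", xmlGetAttr, "id")

# Same node selection, but the function is called once per <place>,
# so the parent id is repeated for every child
city_id <- xpathSApply(doc, "//city/place",
                       function(node) xmlGetAttr(xmlParent(node), "id"))

place_id
city_id
```

Because the function runs once per matched place node, both vectors have the same length and line up element by element, and the document is searched only twice instead of once per place id.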