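I am trying to retrieve two vectors of the same length: one holding an attribute of each child node, the other holding the same attribute of the corresponding parent node. Example file: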
countries.xml <- "<country>
<city id='1'>
<place id='1.1'> xxx </place>
<place id='1.2'> xxx </place>
<place id='1.3'> xxx </place>
</city>
<city id='2'>
<place id='2.1'> xxx </place>
<place id='2.2'> xxx </place>
<place id='2.3'> xxx </place>
</city>
</country>"
My code so far:
library("XML")
doc <- xmlTreeParse(countries.xml, useInternalNodes = TRUE)
xpathSApply(doc, path = "//city/place/@id")
xpathSApply(doc, path = "//city/place/parent::*/@id")
I would like to end up with vectors like these (shown without their names):
"1.1" "1.2" "1.3" "2.1" "2.2" "2.3"
"1" "1" "1" "2" "2" "2"
But the second path returns
"1" "2"
I got what I wanted with a loop:
library(glue)
place_id <- unname(xpathSApply(doc, path = "//city/place/@id"))
city_id <- vector()
for (i in place_id) {
  # Quote the id so it is compared as a string, not as a number
  city_id <- c(city_id, unname(xpathSApply(doc, path = glue("//city/place[@id='{i}']/parent::*/@id"))))
}
city_id
"1" "1" "1" "2" "2" "2"
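But this is inefficient, and the XML file I actually have to process is large. I am sure there is a way to get what I need with the right path in xpathSApply, but I cannot find it, so could someone please enlighten me :)?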
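The second path returns only "1" "2" because an XPath 1.0 expression yields a node-set, and each city's id attribute appears in that set once no matter how many places point to it. A possible way around this, sketched here rather than offered as the definitive fix, is to select the place nodes once and look up each node's parent in R. The sketch assumes the XML package's xpathSApply, xmlGetAttr and xmlParent, and uses xmlParse with asText = TRUE in place of xmlTreeParse:

```r
library(XML)

countries.xml <- "<country>
<city id='1'>
<place id='1.1'> xxx </place>
<place id='1.2'> xxx </place>
<place id='1.3'> xxx </place>
</city>
<city id='2'>
<place id='2.1'> xxx </place>
<place id='2.2'> xxx </place>
<place id='2.3'> xxx </place>
</city>
</country>"

doc <- xmlParse(countries.xml, asText = TRUE)

# One pass over the place nodes: read each node's own id ...
place_id <- xpathSApply(doc, "//city/place", xmlGetAttr, "id")

# ... and, for the same nodes in the same order, the id of the parent city.
# Because the function runs once per place, parents are not deduplicated.
city_id <- xpathSApply(doc, "//city/place",
                       function(node) xmlGetAttr(xmlParent(node), "id"))

place_id
city_id
```

Both calls walk the same node-set in document order, so the two vectors line up element by element, and the document is queried twice in total instead of once per place.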
A solution with tidyverse and xml2:
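library(xml2)
library(tidyverse)
cntry <- read_xml(countries.xml)
# Build one tibble of city ids and one of place ids per city, then bind them
# pairwise; cbind recycles the single city id across that city's places.
pmap_df(list(
  xml_children(cntry) %>% map(xml_attr, 'id') %>%
    map(~as_tibble(.) %>% select(city = value)),
  xml_children(cntry) %>% map(xml_children) %>%
    map(xml_attr, 'id') %>%
    map(~as_tibble(.) %>% select(place = value))
), cbind)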
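The same pairing can probably be done with xml2 alone. This is a minimal sketch, assuming every place sits directly under a city as in the example file. It selects the place nodes once and asks each node for its parent. xml_parent is applied to one node at a time inside vapply, so each place gets its own entry even when several places share a city:

```r
library(xml2)

countries.xml <- "<country>
<city id='1'>
<place id='1.1'> xxx </place>
<place id='1.2'> xxx </place>
<place id='1.3'> xxx </place>
</city>
<city id='2'>
<place id='2.1'> xxx </place>
<place id='2.2'> xxx </place>
<place id='2.3'> xxx </place>
</city>
</country>"

cntry <- read_xml(countries.xml)

# All place nodes, in document order
places <- xml_find_all(cntry, "//city/place")

# The id of each place
place_id <- xml_attr(places, "id")

# The id of each place's parent city, looked up node by node
city_id <- vapply(places,
                  function(p) xml_attr(xml_parent(p), "id"),
                  character(1))

# Two parallel columns, one row per place
data.frame(city = city_id, place = place_id)
```

Since both vectors come from the same node-set, row i of the resulting data frame always pairs a place with its own city.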