How do I extract numeric values contained in the comments of an XML file?


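I want to extract two lists of numbers contained in the comments of an .xml (SVG) file. The second list has a variable position in the XML file (here it is r1[[2]][[6]]), so I would like to locate it by the comment text rather than by a hard-coded index.

This is my current result. It works, but I found the indices by hand:

r1[[2]][[2]]

r1[[2]][[6]]

The code below works, but I have to find index 6 in r1[[2]][[6]] manually.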

library(XML)

filepath1 <- paste("MyPath\\", sep="")
filename.cbv <- paste(filepath1,"CBV_Simpler.svg",sep="")

xml.cbv <- xmlTreeParse(filename.cbv,asText = FALSE)

r1 <- xmlRoot(xml.cbv) ##get the top node of the document
xmlSize(r1) # 2 nodes only, our data are under node 2

r1[[2]][[2]]
r1[[2]][[6]]


leftHemishpval1 <- xmlValue(r1[[2]][[2]]) 
righttHemishpval1 <- xmlValue(r1[[2]][[6]]) 

Here is my sample SVG file:

<svg xmlns:svg="http://www.w3.org/2000/svg"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     version="1.1"
     width="1070"
     height="1200"
     zoomAndPan="magnify">
             <text x="520" y="20" text-anchor="middle" font-size="18"  font-family="Arial">Test Image</text>
    <g transform="translate(20, 50) scale(1, 1)" font-family="Arial">
        <g transform="translate(0, 250) scale(1, -1)">
            <polygon points="0,0 1000,0 1000,200 0,200" style="stroke-width:1;stroke:#000000;fill:#ffffff" />
        </g>
<!--
Left Hemisphere
0.025, 0
0.0753676, 0
0.125735, 0
0.176103, 0
0.226471, 0
0.276838, 883
0.327206, 1833
0.377574, 4319
0.427941, 4905
0.478309, 7499
0.528676, 10246
0.579044, 18289
0.629412, 21230
0.679779, 20371
0.730147, 27955
-->
        <g transform="translate(0, 250) scale(1, -1)">
             <path d="M 0,0 L 0,0 L 7.35294,0 L 14.7059,0 L 22.0588,0 L 29.4118,0 L 36.7647,3.01731 L 44.1176,6.26356 L 51.4706,14.7585 L 58.8235,16.7609 L 66.1765,25.6249 L 73.5294,35.0117 L 80.8824,62.4955 L 88.2353,72.5452 L 95.5882,69.6099 L 102.941,95.5253 L 110.294,133.366 L 117.647,126.888 L 125,103.197 L 132.353,127.034 L 139.706,122.104 L 147.059,135.892 L 154.412,88.7526 L 161.765,112.184 L 169.118,140.166 L 176.471,99.0039 L 183.824,150.537 L 191.176,101.854 L 198.529,127.971 L 205.882,159.019 L 213.235,108.667 L 220.588,170.521 L 227.941,149.529 L 235.294,123.204 L 242.647,190.203 L 250,130.824 L 257.353,164.756 L 264.706,199.484 L 272.059,131.73 L 279.412,200 L 286.765,134.648 L 294.118,165.077 L 301.471,196.497 L 308.824,160.283 L 316.176,125.548 L 323.529,150.923 L 330.882,146.403 L 338.235,168.73 L 345.588,136.373 L 352.941,105.305 L 360.294,126.888 L 367.647,122.674 L 375,141.082 L 382.353,113.076 L 389.706,86.798 L 397.059,102.961 L 404.412,97.0117 L 411.765,107.417 L 419.118,84.8161 L 426.471,64.1938 L 433.824,75.624 L 441.176,72.2753 L 448.529,80.9616 L 455.882,63.3532 L 463.235,59.0306 L 470.588,44.5249 L 477.941,51.7248 L 485.294,47.1834 L 492.647,51.5744 L 500,39.2455 L 507.353,28.8609 L 514.706,32.9888 L 522.059,30.4943 L 529.412,32.9409 L 536.765,23.9334 L 544.118,18.6437 L 551.471,21.808 L 558.823,19.1119 L 566.176,22.0472 L 573.529,16.7575 L 580.882,12.4007 L 588.235,17.4853 L 595.588,11.0851 L 602.941,12.3768 L 610.294,13.4156 L 617.647,8.2079 L 625,10.5281 L 632.353,5.6929 L 639.706,6.49934 L 647.059,6.64628 L 654.412,3.70756 L 661.765,4.7737 L 669.118,2.8157 L 676.471,2.95922 L 683.824,3.02072 L 691.176,1.66413 L 698.529,2.16645 L 705.882,1.24383 L 713.235,1.17549 L 720.588,0.994379 L 727.941,0.611663 L 735.294,0.587743 L 742.647,0.399802 L 750,0.454476 L 757.353,0.351962 L 764.706,0.232363 L 772.059,0.375882 L 779.412,0.174273 L 786.765,0.17769 L 794.118,0.256283 L 801.471,0.126433 L 808.823,0.12985 L 816.176,0.0888448 L 
823.529,0.109347 L 830.882,0.0854277 L 838.235,0.061508 L 845.588,0.061508 L 852.941,0.0341711 L 860.294,0.030754 L 867.647,0.0239198 L 875,0.00683422 L 882.353,0.0375882 L 889.706,0.0205027 L 897.059,0.0136684 L 904.412,0.0136684 L 911.765,0 L 919.118,0 L 926.471,0 L 933.823,0 L 941.176,0 L 948.529,0 L 955.882,0 L 963.235,0 L 970.588,0 L 977.941,0 L 985.294,0 L 992.647,0 L 1000,0 L 1000,0 " style="stroke-width:4;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 37.8805,100 L 37.8805,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 161.33,100 L 161.33,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 284.781,100 L 284.781,205" style="stroke-width:3;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 408.23,100 L 408.23,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 531.681,100 L 531.681,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 655.131,100 L 655.131,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 778.581,100 L 778.581,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 902.031,100 L 902.031,205" style="stroke-width:1;stroke:#ff0000;fill:#ff0000;fill-opacity:0.4" />
             <path d="M 277.372,100 L 277.372,205" style="stroke-width:3;stroke:#000000;fill:#000000;fill-opacity:0.4" />
             <path d="M 100,-20 L 100,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 300,-20 L 300,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 500,-20 L 500,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 600,-20 L 600,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 900,-20 L 900,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
        </g>
             <text x="500" y="20" text-anchor="middle" font-size="14">Left (median = 1.234, &#956; = 1.2345, &#963; = +/-0.1234)</text>
             <text x="1000" y="290" text-anchor="middle" font-size="12">1.2345</text>
<!--
Right Hemisphere
0.025, 0
0.0753676, 50
0.125735, 2781
0.176103, 8161
0.226471, 9881
0.276838, 20905
0.327206, 18107
0.377574, 31681
0.427941, 22355
0.478309, 29350
0.528676, 29104
0.579044, 35388
0.629412, 31030
0.679779, 26076
0.730147, 36480
-->
        <g transform="translate(0, 250) scale(1, -1)">
             <path d="M 0,0 L 0,0 L 7.35294,0.170855 L 14.7059,9.50298 L 22.0588,27.887 L 29.4118,33.7645 L 36.7647,71.4347 L 44.1176,61.8736 L 51.4706,108.257 L 58.8235,76.3895 L 66.1765,100.292 L 73.5294,99.4516 L 80.8824,120.925 L 88.2353,106.033 L 95.5882,89.1045 L 102.941,124.656 L 110.294,163.686 L 117.647,144.937 L 125,113.069 L 132.353,132.813 L 139.706,132.864 L 147.059,159.948 L 154.412,103.186 L 161.765,128.805 L 169.118,151.043 L 176.471,97.8626 L 183.824,143.932 L 191.176,92.7062 L 198.529,114.682 L 205.882,137.669 L 213.235,93.9773 L 220.588,145.453 L 227.941,123.303 L 235.294,98.5734 L 242.647,150.377 L 250,100.586 L 257.353,128.111 L 264.706,155.222 L 272.059,105.397 L 279.412,157.993 L 286.765,105.165 L 294.118,128.473 L 301.471,150.363 L 308.824,123.057 L 316.176,95.0913 L 323.529,117.788 L 330.882,116.503 L 338.235,135.499 L 345.588,110.017 L 352.941,84.7853 L 360.294,104.833 L 367.647,100.497 L 375,115.215 L 382.353,94.3088 L 389.706,73.3414 L 397.059,89.2651 L 404.412,85.0484 L 411.765,98.2829 L 419.118,77.7597 L 426.471,58.7059 L 433.824,71.0109 L 441.176,65.4513 L 448.529,75.8427 L 455.882,58.6786 L 463.235,54.8583 L 470.588,41.5521 L 477.941,49.4114 L 485.294,45.0614 L 492.647,50.7612 L 500,39.8572 L 507.353,29.7289 L 514.706,35.0049 L 522.059,32.8111 L 529.412,36.3102 L 536.765,27.5624 L 544.118,20.6735 L 551.471,24.0257 L 558.823,22.1395 L 566.176,24.5724 L 573.529,18.8795 L 580.882,14.0785 L 588.235,19.8363 L 595.588,12.2606 L 602.941,14.427 L 610.294,15.9032 L 617.647,9.56791 L 625,13.4702 L 632.353,8.33091 L 639.706,9.39705 L 647.059,10.2308 L 654.412,6.3114 L 661.765,8.89132 L 669.118,5.31019 L 676.471,6.46859 L 683.824,7.20668 L 691.176,4.24405 L 698.529,6.19522 L 705.882,3.84083 L 713.235,4.23722 L 720.588,4.74978 L 727.941,2.87721 L 735.294,3.834 L 742.647,2.24504 L 750,2.26554 L 757.353,2.26213 L 764.706,1.14131 L 772.059,1.66413 L 779.412,0.796187 L 786.765,0.844026 L 794.118,0.796187 L 801.471,0.444224 L 808.823,0.56724 L 
816.176,0.30754 L 823.529,0.41347 L 830.882,0.392968 L 838.235,0.235781 L 845.588,0.242615 L 852.941,0.174273 L 860.294,0.181107 L 867.647,0.17769 L 875,0.119599 L 882.353,0.0990962 L 889.706,0.0273369 L 897.059,0.0341711 L 904.412,0.0341711 L 911.765,0.0239198 L 919.118,0.0410053 L 926.471,0.0136684 L 933.823,0.0102513 L 941.176,0.0102513 L 948.529,0.0136684 L 955.882,0.00683422 L 963.235,0 L 970.588,0 L 977.941,0 L 985.294,0 L 992.647,0 L 1000,0 L 1000,0 " style="stroke-width:4;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 119.889,-5 L 119.889,100" style="stroke-width:1;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 265.02,-5 L 265.02,100" style="stroke-width:3;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 410.151,-5 L 410.151,100" style="stroke-width:1;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 555.282,-5 L 555.282,100" style="stroke-width:1;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 700.413,-5 L 700.413,100" style="stroke-width:1;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 845.544,-5 L 845.544,100" style="stroke-width:1;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 990.674,-5 L 990.674,100" style="stroke-width:1;stroke:#0000ff;fill:#0000ff;fill-opacity:0.4" />
             <path d="M 255.474,-5 L 255.474,100" style="stroke-width:3;stroke:#000000;fill:#000000;fill-opacity:0.4" />
             <path d="M 100,-20 L 100,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 200,-20 L 200,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 300,-20 L 300,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 400,-20 L 400,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 500,-20 L 500,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 600,-20 L 600,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 700,-20 L 700,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 800,-20 L 800,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
             <path d="M 900,-20 L 900,10" style="stroke-width:1;stroke:#777777;fill:#777777;fill-opacity:0.0" />
        </g>
             <text x="1000" y="290" text-anchor="middle" font-size="12">1.2345</text>
    </g>
</svg>
Tags: r, xml-parsing

1 Answer

Using the xml2 package, read the file:

library(xml2)
xml <- read_xml(filename.cbv)

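XPath can specify comment nodes in a variety of ways (the XPath specification includes a brief overview of the syntax that covers many basic processing steps). For the document above, where you are interested in all comments, the path is //comment(), and the nodes can be found with

comments <- xml_find_all(xml, "//comment()")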

The comment nodes can be coerced to a character vector with

value <- as.character(comments)

although the result still needs to be parsed for use in R, for example with

read.csv(textConnection(value[[1]]), skip = 2, header = FALSE, nrows = 15)

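Alternatively, read the comment into lines first and trim off the first two lines (tail(txt, -2)) and the last line (head(., -1)) before parsing. This avoids hard-coding the number of rows, though it still depends on knowing the internal structure of the comment:

txt <- readLines(textConnection(value[[1]]))
read.csv(text = head(tail(txt, -2), -1), header = FALSE)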
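Because the question is about finding a comment whose position varies, one option is to select it by its label instead of by index. The following is only a sketch: it assumes each comment starts with a label line such as "Right Hemisphere" followed by "x, y" rows, as in the sample file. The helper name read_comment_table, the column names x and count, and the cut-down inline SVG are all made up for illustration.

```r
library(xml2)

# Hypothetical helper: find the first comment whose text contains `label`
# and parse its "x, y" rows into a data frame.
read_comment_table <- function(doc, label) {
  # Assumes `label` contains no single quote
  xpath <- sprintf("//comment()[contains(., '%s')]", label)
  node <- xml_find_first(doc, xpath)
  if (inherits(node, "xml_missing")) {
    stop("no comment containing '", label, "'")
  }
  # xml_text() gives the comment body without the <!-- --> delimiters
  lines <- trimws(strsplit(xml_text(node), "\n", fixed = TRUE)[[1]])
  # Keep only the rows that start with a digit, i.e. drop the label and blanks
  rows <- lines[grepl("^[0-9]", lines)]
  read.csv(text = rows, header = FALSE, strip.white = TRUE,
           col.names = c("x", "count"))
}

# A cut-down stand-in for the sample SVG (assumption, not the real file)
svg <- '<svg xmlns="http://www.w3.org/2000/svg">
<g>
<!--
Left Hemisphere
0.025, 0
0.276838, 883
-->
<text>Left</text>
<!--
Right Hemisphere
0.025, 0
0.0753676, 50
-->
</g>
</svg>'

doc <- read_xml(svg)
left <- read_comment_table(doc, "Left Hemisphere")
right <- read_comment_table(doc, "Right Hemisphere")
```

The XPath //comment() does not name any element, so the default SVG namespace does not get in the way, and the lookup no longer depends on where the comment sits among its siblings.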

I think the equivalent steps using the XML package would be

xml <- xmlParse(filename.cbv)
value <- xpathApply(xml, "//comment()", as, "character")
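
A similar label-based lookup can be sketched with the XML package, using xpathSApply() with xmlValue() to get each comment body without its delimiters. This is an untested-against-the-real-file illustration: the inline SVG string and the variable names are assumptions, and it relies on each comment carrying a label line such as "Right Hemisphere".

```r
library(XML)

# A cut-down stand-in for the sample SVG (assumption, not the real file)
svg <- '<svg xmlns="http://www.w3.org/2000/svg">
<g>
<!--
Left Hemisphere
0.025, 0
0.276838, 883
-->
<text>Left</text>
<!--
Right Hemisphere
0.025, 0
0.0753676, 50
-->
</g>
</svg>'

doc <- xmlParse(svg, asText = TRUE)

# Text of every comment, in document order
comment_text <- xpathSApply(doc, "//comment()", xmlValue)

# Pick the comment by its label rather than by its position
is_right <- grepl("Right Hemisphere", comment_text, fixed = TRUE)
right_text <- comment_text[is_right][1]

# Split into lines and keep only the rows that start with a digit
lines <- trimws(strsplit(right_text, "\n", fixed = TRUE)[[1]])
rows <- lines[grepl("^[0-9]", lines)]
right <- read.csv(text = rows, header = FALSE, strip.white = TRUE)
```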