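I am trying to extract text from a number of XHTML documents using a program that uses XPath queries to map the text into a structured table. The XHTML documents look like this: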
<td class="td-3 c12" valign="top">
<p class="pa-4">
<span class="ca-5">text I would like to select </span>
</p>
</td>
<td class="td-3 c13" valign="top">
<p class="pa-2">
<span class="ca-0">some more text I want to select </span>
</p>
<p class="pa-2">
<span class="ca-0">
<br>
</br>
</span>
</p>
<p class="pa-2">
<span class="ca-5">text and values I don't want to select.</span>
</p>
<p class="pa-2">
<span class="ca-5"> also text and values I don't want to </span>
</p>
</td>
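I am able to select the spans by their class and retrieve the text/values, but the span classes are not unique enough, so I also need to filter by the class of the table cell. For example, I want only the text from a span with class ca-0 that sits inside the td with class td-3 c13.
That would be <span class="ca-0">some more text I want to select </span>
I have tried all of these combinations: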
//xhtml:td[@class="td-3 c13"]/xhtml:span[@class = "ca-0"]
//xhtml:span[@class = "ca-0"] //ancestor::xhtml:td[@class= "td-3 c13"]
//xhtml:td[@class="td-3 c6"]//xhtml:span[@class = "ca-0"]
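For reference, here is a minimal sketch of the selection described above, using Python's standard-library xml.etree.ElementTree rather than the program mentioned in the question. In the sample the span is a grandchild of the td (a p element sits in between), so a single `/` child step from td to span cannot match it; a descendant step `//` can. The wrapper root element, the namespace URI bound to the xhtml prefix, and the helper name select_texts are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample, wrapped in a namespaced root element.
# Assumption: the real documents declare the standard XHTML namespace.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table><tr>
<td class="td-3 c12" valign="top">
<p class="pa-4"><span class="ca-5">text I would like to select </span></p>
</td>
<td class="td-3 c13" valign="top">
<p class="pa-2"><span class="ca-0">some more text I want to select </span></p>
<p class="pa-2"><span class="ca-0"><br></br></span></p>
<p class="pa-2"><span class="ca-5">text and values I don't want to select.</span></p>
</td>
</tr></table></body></html>"""

# Prefix-to-URI mapping used to resolve "xhtml:" in the path below.
NS = {"xhtml": "http://www.w3.org/1999/xhtml"}


def select_texts(xml_text, td_class, span_class):
    """Return non-empty text of spans with span_class anywhere under a td with td_class."""
    root = ET.fromstring(xml_text)
    # "//" between td and span matches descendants at any depth,
    # so the intermediate <p> does not have to be spelled out.
    path = f".//xhtml:td[@class='{td_class}']//xhtml:span[@class='{span_class}']"
    texts = []
    for span in root.findall(path, NS):
        text = "".join(span.itertext()).strip()
        if text:  # skip spans that hold only a <br> or whitespace
            texts.append(text)
    return texts


print(select_texts(DOC, "td-3 c13", "ca-0"))
```

The class predicate here is an exact string comparison, so "td-3 c13" has to match the whole attribute value, including the order of the two class names. The span that contains only a br element also matches the path, which is why the sketch drops results that are empty after stripping whitespace.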