Here is a simple answer, but it does not use the JSoup library. You can try other parsers or similar tools (only if you are willing to).
I have the following HTML:
<html>
<body>
...
<h2> Blah Blah 1</h2>
<p>blah blah</p>
<div>
<div>
<table>
<tbody>
<tr><th>Col 1 Header</th><th>Col 2 Header</th></tr>
<tr><td>Line 1.1 Value</td><td>Line 2.1 Header</td></tr>
<tr><td>Line 2.1 Value</td><td>Line 2.2 Value</td></tr>
</tbody>
</table>
</div>
</div>
<div>
<div>
<table>
<tbody>
<tr><th>Col 1 Header T2</th><th>Col 2 Header T2</th></tr>
<tr><td>Line 1.1 Value T2</td><td>Line 2.1 Header T2</td></tr>
<tr><td>Line 2.1 Value T2</td><td>Line 2.2 Value T2</td></tr>
</tbody>
</table>
</div>
</div>
<h2> Blah Blah 2</h2>
<div>
<div>
<table>
<tbody>
<tr><th>XCol 1 Header</th><th>XCol 2 Header</th></tr>
<tr><td>XLine 1.1 Value</td><td>XLine 2.1 Header</td></tr>
<tr><td>XLine 2.1 Value</td><td>XLine 2.2 Value</td></tr>
</tbody>
</table>
</div>
</div>
<p>blah blah</p>
<div>
<div>
<table>
<tbody>
<tr><th>XCol 1 Header T2</th><th>XCol 2 Header T2</th></tr>
<tr><td>XLine 1.1 Value T2</td><td>XLine 2.1 Header T2</td></tr>
<tr><td>XLine 2.1 Value T2</td><td>XLine 2.2 Value T2</td></tr>
</tbody>
</table>
</div>
</div>
</body>
</html>
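I want to extract the second DIV after the h2 tag that contains a given text.
You may notice that the p tag is not in the same position relative to the divs after the first and the second h2.
To extract the DIV after the first h2, the following selector works: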
h2:contains(Blah 1) + p + div +div
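But to extract the second one, replacing "Blah 1" with "Blah 2" does not work, because the "p" tag sits in a different position, so the static selector would have to be: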
h2:contains(Blah 2) + div + p +div
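What I need is a selector formula that keeps working when only the text changes, regardless of where the p block sits.
I tried several approaches, for example the nth-of-type selector, but that does not work either, because the only thing I know is the position of the DIV relative to the h2, and the h2 is not the parent of the DIV but a preceding sibling.
Please help.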
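One way to make the selection independent of where the p sits is to count only the div siblings that follow the h2, instead of chaining adjacent-sibling (+) steps. The sketch below is not a JSoup selector: it assumes the page is well-formed enough to be parsed as XML and uses the XPath API that ships with the JDK. The class name SiblingDivSketch, the method secondDivAfter, and the trimmed-down SAMPLE page are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingDivSketch {
    // Hypothetical, trimmed-down version of the page from the question:
    // same sibling layout (h2, p, div, div / h2, div, p, div), tables replaced by short labels.
    public static final String SAMPLE =
        "<html><body>"
        + "<h2> Blah Blah 1</h2><p>blah blah</p>"
        + "<div><div>Col</div></div><div><div>Col T2</div></div>"
        + "<h2> Blah Blah 2</h2>"
        + "<div><div>XCol</div></div><p>blah blah</p><div><div>XCol T2</div></div>"
        + "</body></html>";

    // Returns the text of the second <div> sibling that follows the first <h2>
    // containing 'token', or null if there is none.
    // Assumption: 'token' contains no single quote (it is pasted into the XPath string).
    public static String secondDivAfter(String xhtml, String token) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // following-sibling::div skips any <p> in between; [2] picks the second div.
            String expr = "//h2[contains(., '" + token + "')]/following-sibling::div[2]";
            Node div = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
            return div == null ? null : div.getTextContent().trim();
        } catch (Exception e) {
            // Real-world HTML is often not well-formed XML; this sketch simply gives up then.
            throw new IllegalArgumentException("Input could not be parsed or queried", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondDivAfter(SAMPLE, "Blah 1")); // prints "Col T2"
        System.out.println(secondDivAfter(SAMPLE, "Blah 2")); // prints "XCol T2"
    }
}
```

The same expression serves both headings because it never mentions the p element; only the text inside contains(...) changes.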
import Torello.HTML.*;
import Torello.HTML.NodeSearch.*;
import Torello.Java.FileRW;
import java.util.*;

public class SO_01_31
{
    private static final String tokenToFind = "Blah Blah 2";

    public static void main(String[] argv) throws java.io.IOException
    {
        // Load the HTML provided in the HTML text above in your SO question.
        String html = FileRW.loadFileToString("example.html");

        // Parse the HTML into an "HTML Vector". Each node will be an HTML tag or a TextNode (no comment nodes here).
        Vector<HTMLNode> page = HTMLPage.getPageTokens(html, false);

        // Return each <H2>...</H2> found on this page. The results hold the nodes and the vector indexes together,
        // in an instance of "SubSection".
        Vector<SubSection> h2List = TagNodePeekInclusive.all(page, "h2");

        // We are looking for a sub-vector (SubSection) that looks like: <H2> 'tokenToFind' </H2>
        int foundPos = -1;

        // Temp variable
        HTMLNode n;

        // Iterate through each of the "<H2>...</H2>" subsections returned above by TagNodePeekInclusive
        for (SubSection s : h2List)
            // Make sure that the first node after the opening "<H2>" tag is, indeed, a TextNode
            if ((n = s.html.elementAt(1)) instanceof TextNode)
                // Make sure that this TextNode (the one between the <H2>...</H2>) contains the string "tokenToFind"
                if (((TextNode) n).str.contains(tokenToFind))
                {
                    // Record this position.
                    foundPos = s.location.start + 1;
                    break;
                }

        // Exit if no matching <H2> was found.
        if (foundPos == -1)
        {
            System.out.println("The specified H2 Title String-Token wasn't found on your page... Exiting.");
            System.exit(0);
        }

        // Return the second <DIV> ... </DIV> subsection found after that position.
        Vector<HTMLNode> divToFind = TagNodeGetInclusive.nth(page, 2, foundPos, -1, "div");

        // Print it out
        System.out.println(Util.pageToString(divToFind));
    }
}
The documented SO_01_31 code above prints the following output to a UNIX terminal:
@cloudshell:~$ java SO_01_31
<div>
<table>
<tbody>
<tr><th>XCol 1 Header</th><th>XCol 2 Header</th></tr>
<tr><td>XLine 1.1 Value</td><td>XLine 2.1 Header</td></tr>
<tr><td>XLine 2.1 Value</td><td>XLine 2.2 Value</td></tr>
</tbody>
</table>
</div>
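The idea behind the program (find the h2 that contains the token, then count opening div tags in document order from that point) can also be sketched in plain Java with no library at all. This is only an illustration under simplifying assumptions (lowercase tags, balanced divs, no "<div" text inside comments or scripts), not the Torello library's implementation; the class NthDivAfterHeading, the method nthDivAfter, and the trimmed-down SAMPLE are hypothetical. Because nested divs are counted too, n = 2 yields the inner div of the first block after the heading, which is consistent with the output shown above.

```java
public class NthDivAfterHeading {
    // Hypothetical, trimmed-down version of the page from the question.
    public static final String SAMPLE =
        "<h2> Blah Blah 1</h2><p>blah blah</p>"
        + "<div><div>Col</div></div><div><div>Col T2</div></div>"
        + "<h2> Blah Blah 2</h2>"
        + "<div><div>XCol</div></div><p>blah blah</p><div><div>XCol T2</div></div>";

    // Returns the n-th <div>...</div> (1-based, nested divs included) that opens
    // after the first <h2> containing 'token', or null if it cannot be found.
    public static String nthDivAfter(String html, String token, int n) {
        // Step 1: locate the end of the first <h2>...</h2> whose content contains the token.
        int from = -1;
        int pos = 0;
        while (true) {
            int open = html.indexOf("<h2", pos);
            if (open < 0) break;
            int close = html.indexOf("</h2>", open);
            if (close < 0) break;
            if (html.substring(open, close).contains(token)) {
                from = close + "</h2>".length();
                break;
            }
            pos = close + "</h2>".length();
        }
        if (from < 0) return null;

        // Step 2: walk forward to the n-th opening "<div" after that heading.
        int start = from - 1;
        for (int i = 0; i < n; i++) {
            start = html.indexOf("<div", start + 1);
            if (start < 0) return null;
        }

        // Step 3: find the matching "</div>" by tracking nesting depth.
        int depth = 0;
        int i = start;
        while (i < html.length()) {
            int nextOpen = html.indexOf("<div", i);
            int nextClose = html.indexOf("</div>", i);
            if (nextClose < 0) return null; // unbalanced markup
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++;
                i = nextOpen + "<div".length();
            } else {
                depth--;
                i = nextClose + "</div>".length();
                if (depth == 0) return html.substring(start, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(nthDivAfter(SAMPLE, "Blah Blah 2", 2)); // prints "<div>XCol</div>"
    }
}
```

Counting tags in document order is simple, but it does not distinguish sibling divs from nested ones; if the second sibling div is wanted instead, the count has to skip over the contents of the first one.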