PHP / DOM: Parse HTML to extract data based on class

Question

I have this HTML:

<div class="news">
    <h3 class="border-bottom">Title 2</h3>
    <p class="mt-0 ml-1">2023-04-01</p>
    <img src="20230401.jpg" class="w-50 float-right ml-2">
    <p class="lead"><p>Description 2</p></p>
    <a href="https://.../news/245" class="btn btn-secondary">Read more</a>
</div>
<div class="news">
    <h3 class="border-bottom">Title 1</h3>
    <p class="mt-0 ml-1">2023-03-31</p>
    <img src="20230331.jpg" class="w-50 float-right ml-2">
    <p class="lead"><p>Description 1</p></p>
    <a href="https://.../news/244" class="btn btn-secondary">Read more</a>
</div>

I want to extract the title and date of each item. I tried this:

$class = "news";
$dom = new DOMDocument();
$dom->loadHTML($html);
$a = new DOMXPath($dom);
$divs = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]");

foreach ($divs as $link) {
    print_r($link->nodeValue);
}

But it prints the full text of each div:

Title 2

2023-04-01

Description 2

Read more

Title 1

2023-03-31

Description 1

Read more

I'm stuck and don't know how to extract only the title and date.

Answer 1

First, your sample is invalid (it contains nested <p>s). Assuming you fix that, I would try something like this:

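$qu = "//div[contains(@class,'{$class}')]";
$divs = $a->query($qu);
foreach ($divs as $div) {
    // Relative to the current div: its h3 and its first child <p>,
    // returned in document order (title first, then date).
    $targets = $a->query('.//h3 | ./p[1]', $div);
    echo $targets->item(0)->textContent . " " . $targets->item(1)->textContent . "\r\n";
}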

Output, based on your fixed sample HTML:

Title 2 2023-04-01
Title 1 2023-03-31
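
One caveat about the class test in the query above: contains(@class, 'news') is a plain substring match, so it would also select an element whose class is, say, "newsletter". The token-style test from the question avoids that. A minimal sketch of the difference, using made-up markup (the "newsletter" and "top news" divs are assumptions for illustration, not part of the question's HTML):

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<div class="news">a</div>' .
    '<div class="newsletter">b</div>' .
    '<div class="top news">c</div>'
);
$xpath = new DOMXPath($dom);

// Substring match: selects "news", "newsletter" and "top news".
$loose = $xpath->query("//div[contains(@class, 'news')]");

// Whole-token match: pads the class list with spaces and looks for " news ".
$strict = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' news ')]"
);

echo $loose->length, "\n";
echo $strict->length, "\n";
```

With the sample in the question both forms select the same two divs, so this only matters if the page has other classes that contain "news".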

Edit:

To get the link as well, the foreach loop should be changed to:

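foreach ($divs as $div) {
    // Same as before, plus the href attribute of the div's <a> child.
    $targets = $a->query('.//h3 | ./p[1] | ./a/@href', $div);
    echo $targets->item(0)->textContent . " " . $targets->item(1)->textContent . " " . $targets->item(2)->textContent . "\r\n";
}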

The output should now be:

Title 2 2023-04-01 https://.../news/245
Title 1 2023-03-31 https://.../news/244
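
For reference, here is how the pieces might fit together as one self-contained script. This is only a sketch under a few assumptions: the nested <p> in the sample has been flattened by hand, extractNews() is a hypothetical helper name (it is not part of PHP's DOM extension), and each field is read with DOMXPath::evaluate() and the XPath string() function instead of indexing into a union result, so a missing element yields an empty string rather than an error.

```php
<?php
// Sample markup from the question, with the nested <p> flattened.
$html = <<<'HTML'
<div class="news">
    <h3 class="border-bottom">Title 2</h3>
    <p class="mt-0 ml-1">2023-04-01</p>
    <img src="20230401.jpg" class="w-50 float-right ml-2">
    <p class="lead">Description 2</p>
    <a href="https://.../news/245" class="btn btn-secondary">Read more</a>
</div>
<div class="news">
    <h3 class="border-bottom">Title 1</h3>
    <p class="mt-0 ml-1">2023-03-31</p>
    <img src="20230331.jpg" class="w-50 float-right ml-2">
    <p class="lead">Description 1</p>
    <a href="https://.../news/244" class="btn btn-secondary">Read more</a>
</div>
HTML;

// Hypothetical helper: returns one [title, date, link] record per matching div.
// Assumes $class contains no quote characters, since it is interpolated
// directly into the XPath expression.
function extractNews(string $html, string $class): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    $items = [];
    foreach ($xpath->query($query) as $div) {
        // string() takes the first matching node, relative to $div.
        $items[] = [
            'title' => $xpath->evaluate('string(./h3)', $div),
            'date'  => $xpath->evaluate('string(./p[1])', $div),
            'link'  => $xpath->evaluate('string(./a/@href)', $div),
        ];
    }
    return $items;
}

foreach (extractNews($html, 'news') as $item) {
    echo $item['title'], ' ', $item['date'], ' ', $item['link'], "\n";
}
```

Returning an array of records keeps the extraction separate from the printing, which makes it easier to reuse the data (for example to sort by date) than echoing inside the loop.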
