Unable to get only the second list on a page using PHP Simple HTML DOM


I have this code to try to extract the lists on a page:

$websiteURL = "https://waset.org/conferences-in-january-2022-in-tokyo";
$html = file_get_html($websiteURL);

foreach ( $html->find( 'ul') as $ul ) {
    foreach ($ul->find('li') as $li) {
        echo "LI: " . $li . "<br>";
    }
}

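This does what I expect (i.e. it displays every <li> of every <ul> on the page).

However, if I replace the outer foreach with the following (because I only want the second list):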

foreach ( $html->find( 'ul', 1) as $ul ) {

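I get:

"Call to a member function find() on int"

...which suggests that find('ul', 1) is not returning anything, but I can't figure out why.

Note: there are more than two lists on this page.

Does anyone know what I'm doing wrong?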

php html dom web-scraping
1 Answer

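To answer your question, "I guess my bottom-line question is how do I access all the <li> elements in the second list on the page?", using a supported, modern API: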

<?php
$url = "https://waset.org/conferences-in-january-2022-in-tokyo";

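libxml_use_internal_errors(true); // suppress warnings from imperfect real-world HTML
$dom = new DOMDocument();
$dom->loadHTMLFile($url);
$lists = $dom->getElementsByTagName("ul");
// index 1 is the second <ul> on the page (indexes start at 0)
$items = $lists[1]->getElementsByTagName("li");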
foreach ($items as $item) {
    // clean up extra whitespace
    $text = preg_replace("/\s+/", " ", trim($item->textContent));
    echo "$text\n------\n";
}

Output:

ICA 2022: Aeroponics Conference, Tokyo (Jan 07-08, 2022)
------
ICAA 2022: Agroforestry and Applications Conference, Tokyo (Jan 07-08, 2022)
------
ICAAAA 2022: Applied Aerodynamics, Aeronautics and Astronautics Conference, Tokyo (Jan 07-08, 2022)
------
ICAAAE 2022: Aquatic Animals and Aquaculture Engineering Conference, Tokyo (Jan 07-08, 2022)
------
ICAAC 2022: Advances in Astronomical Computing Conference, Tokyo (Jan 07-08, 2022)
------
...
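The same selection can also be written as a single XPath query. The following is a minimal sketch rather than part of the original answer: it parses a small inline HTML sample (made up here so the example runs without network access) and assumes the page structure resembles the output above.

```php
<?php
// Inline sample markup standing in for the real page (an assumption for illustration).
$html = <<<'HTML'
<html><body>
<ul><li>Home</li><li>About</li></ul>
<ul>
  <li><a href="#">ICA 2022: Aeroponics Conference, <span>Tokyo</span></a> (Jan 07-08, 2022)</li>
  <li><a href="#">ICAA 2022: Agroforestry and Applications Conference, <span>Tokyo</span></a> (Jan 07-08, 2022)</li>
</ul>
<ul><li>Footer</li></ul>
</body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// (//ul)[2] is the second <ul> in document order; XPath positions start at 1.
$items = $xpath->query("(//ul)[2]/li");

$texts = [];
foreach ($items as $item) {
    // collapse runs of whitespace into single spaces
    $texts[] = preg_replace("/\s+/", " ", trim($item->textContent));
}
print_r($texts);
```

If the page has fewer than two lists, the query simply returns an empty node list, so the loop does nothing instead of failing. Note that `/li` matches only direct children of the list, whereas getElementsByTagName("li") also returns items of nested lists.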

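It's also worth noting that the conference name is in an <a> element, the location is in a <span> inside it, and the date follows immediately after. Using that, you can extract the data fairly simply: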

// Concatenate only the direct text-node children of $node, ignoring nested elements.
function getNodeText(\DOMNode $node): string
{
    $return = "";
    foreach($node->childNodes as $child) {
        if ($child->nodeName === "#text") {
            $return .= trim($child->nodeValue);
        }
    }
    return $return;
}

foreach ($items as $item) {
    $conference = getNodeText($item->getElementsByTagName("a")[0]);
    $location = getNodeText($item->getElementsByTagName("span")[0]);
    $date = getNodeText($item);
    echo "------\n$conference | $location | $date\n";
}

Output:

------
ICA 2022: Aeroponics Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAA 2022: Agroforestry and Applications Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAAAA 2022: Applied Aerodynamics, Aeronautics and Astronautics Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAAAE 2022: Aquatic Animals and Aquaculture Engineering Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAAC 2022: Advances in Astronomical Computing Conference, | Tokyo | (Jan 07-08, 2022)
...
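
As for why the original attempt failed: as far as I can tell, Simple HTML DOM's find() returns an array of nodes when called with only a selector, but a single node (or null) when an index is passed. Running foreach over a single object walks its public properties rather than a list of elements, so $ul ends up holding a plain value such as an int. The sketch below reproduces that behaviour with a hypothetical stand-in class (FakeNode is not part of any library, and its property names are assumptions), so it runs without Simple HTML DOM installed.

```php
<?php
// FakeNode is a hypothetical stand-in for a parsed element node; its
// property names are illustrative assumptions, not the library's API.
class FakeNode
{
    public $nodetype = 1;
    public $tag = 'ul';
}

$node = new FakeNode();

$seen = [];
// foreach over an object iterates its visible properties, not child elements.
foreach ($node as $name => $value) {
    $seen[$name] = gettype($value);
}
print_r($seen);

// Calling a method on the first property value fails the same way as in the question.
$message = '';
try {
    foreach ($node as $ul) {
        $ul->find('li');
    }
} catch (Error $e) {
    $message = $e->getMessage();
}
echo $message, "\n";
```

With Simple HTML DOM itself, the likely fix is to drop the outer loop and iterate the items of the indexed node directly, for example foreach ($html->find('ul', 1)->find('li') as $li), after checking that the indexed call did not return null.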