I have this code that tries to extract the lists on a page:
$websiteURL = "https://waset.org/conferences-in-january-2022-in-tokyo";
$html = file_get_html($websiteURL);
foreach ($html->find('ul') as $ul) {
    foreach ($ul->find('li') as $li) {
        echo "LI: " . $li . "<br>";
    }
}
This does what I expect (that is, it shows every <li> of each <ul> on the page).
However, if I replace the outer foreach with the following (because I only want a single list):
foreach ( $html->find( 'ul', 1) as $ul ) {
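I get:
"Call to a member function find() on int"
...which suggests that find('ul', 1) isn't returning anything, but I can't see why.
Note: there are more than two lists on this page.
Does anyone know what I'm doing wrong?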
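To answer your question, "I guess my bottom-line question is how do I access all the <li> elements from the second list on a web page?", using a modern, supported API: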
<?php
$url = "https://waset.org/conferences-in-january-2022-in-tokyo";
// suppress warnings caused by imperfect real-world markup
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTMLFile($url);
$lists = $dom->getElementsByTagName("ul");
// index 1 is the second <ul> in document order
$items = $lists[1]->getElementsByTagName("li");
foreach ($items as $item) {
    // clean up extra whitespace
    $text = preg_replace("/\s+/", " ", trim($item->textContent));
    echo "$text\n------\n";
}
Output:
ICA 2022: Aeroponics Conference, Tokyo (Jan 07-08, 2022)
------
ICAA 2022: Agroforestry and Applications Conference, Tokyo (Jan 07-08, 2022)
------
ICAAAA 2022: Applied Aerodynamics, Aeronautics and Astronautics Conference, Tokyo (Jan 07-08, 2022)
------
ICAAAE 2022: Aquatic Animals and Aquaculture Engineering Conference, Tokyo (Jan 07-08, 2022)
------
ICAAC 2022: Advances in Astronomical Computing Conference, Tokyo (Jan 07-08, 2022)
------
...
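To try the index-based selection without hitting the live site, here is a minimal sketch that runs the same DOMDocument calls against an inline HTML string. The sample markup and the `secondListItems` helper are made up for illustration and are not taken from the waset.org page; the helper also guards against the case where there is no second list.

```php
<?php
// Hypothetical helper: returns the cleaned text of every <li> in the
// second <ul> of the given HTML, or an empty array if there is no second list.
function secondListItems(string $html): array
{
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // item(1) is the second <ul> in document order; null when absent
    $list = $dom->getElementsByTagName("ul")->item(1);
    if ($list === null) {
        return [];
    }

    $result = [];
    foreach ($list->getElementsByTagName("li") as $item) {
        // collapse runs of whitespace into single spaces
        $result[] = preg_replace("/\s+/", " ", trim($item->textContent));
    }
    return $result;
}

// Made-up sample markup, only for illustration
$sample = "<ul><li>nav one</li></ul><ul><li>First   item</li><li> Second item </li></ul>";
print_r(secondListItems($sample));
```

Note that `getElementsByTagName("ul")` also counts nested lists, so on a real page the index of the list you want may not match what you see visually.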
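It's also worth noting that the conference name is in an <a> element, the location is in a <span> inside it, and the date follows right after. Using that, you can extract the data fairly simply: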
// Concatenate only the direct text children of a node, skipping nested elements
function getNodeText(\DOMNode $node): string
{
    $return = "";
    foreach ($node->childNodes as $child) {
        if ($child->nodeName === "#text") {
            $return .= trim($child->nodeValue);
        }
    }
    return $return;
}
foreach ($items as $item) {
    $conference = getNodeText($item->getElementsByTagName("a")[0]);
    $location = getNodeText($item->getElementsByTagName("span")[0]);
    $date = getNodeText($item);
    echo "------\n$conference | $location | $date\n";
}
Output:
------
ICA 2022: Aeroponics Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAA 2022: Agroforestry and Applications Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAAAA 2022: Applied Aerodynamics, Aeronautics and Astronautics Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAAAE 2022: Aquatic Animals and Aquaculture Engineering Conference, | Tokyo | (Jan 07-08, 2022)
------
ICAAC 2022: Advances in Astronomical Computing Conference, | Tokyo | (Jan 07-08, 2022)
...
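As for the error in the question: as far as I know, simple_html_dom's `find()` returns a single node (or null) rather than an array when an index is passed, and PHP's `foreach` over a plain object walks its public properties. The first value the loop sees is then a property value, not an element, which would explain "Call to a member function find() on int". The sketch below reproduces that behaviour with a stand-in class; `FakeNode` and its properties are hypothetical and only mimic the general shape of a node object.

```php
<?php
// Hypothetical stand-in for a parser node; not simple_html_dom's real class.
class FakeNode
{
    public $nodetype = 1;   // an int property, declared first
    public $tag = "ul";
}

$node = new FakeNode();

// foreach over a plain (non-Traversable) object iterates its public properties
$seen = [];
foreach ($node as $name => $value) {
    $seen[$name] = gettype($value);
}
print_r($seen); // nodetype is an integer, so calling ->find() on it would fail
```

If you want to stay with simple_html_dom, the likely fix is to drop the outer foreach and loop over `$html->find('ul', 1)->find('li')` directly, after checking that the first call did not return null.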