I'm trying to parse an RSS feed, and what I get back is a seemingly empty DOMDocument object. My current code is:
$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );
$xml = curl_exec( $curl );
curl_close( $curl );
//$xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml);
//$xml = utf8_encode($xml);
$document = new DOMDocument;
$document->loadXML( $xml );
if( ini_get('allow_url_fopen') ) {
echo "allow url fopen? Yes";
}
echo "<br />";
var_dump($document);
$items = $document->getElementsByTagName("item");
foreach ($items as $item) {
$title = $item->getElementsByTagName('title');
echo $title;
}
$url = 'https://thehockeywriters.com/category/san-jose-sharks/feed/';
$xml = simplexml_load_file($url);
foreach ($items as $item) {
$title = $item->title;
echo $title;
}
print_r($xml);
echo "<br />";
var_dump($xml);
echo "<br />hello?";
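This code contains two separate attempts at parsing the same URL, based on the answers and suggestions given in these Stack Overflow examples: Example 1, Example 2
Things I've tried:
1. Checked that allow_url_fopen is enabled
2. Made sure the encoding is UTF-8
3. Validated the XML
4. The code examples from the Stack Overflow posts linked above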
Here is my current output from the var_dumps and echos:
allow url fopen? Yes
object(DOMDocument)#2 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)"
["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL
["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true)
["version"]=> string(3) "1.0" ["xmlVersion"]=> string(3) "1.0"
["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL
["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false)
["preserveWhiteSpace"]=> bool(true) ["recover"]=> bool(false) ["substituteEntities"]=> bool(false)
["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL
["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL ["lastChild"]=> NULL
["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL
["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" }
bool(false)
hello?
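An empty `documentElement` together with `bool(false)` from `simplexml_load_file()` usually means the parser never received valid XML. A minimal diagnostic sketch, assuming the server sent back an HTML error page instead of the feed (the string below is a made-up stand-in for such a page): with `libxml_use_internal_errors(true)`, `loadXML()` returns `false` and the parse errors can be read back with `libxml_get_errors()` instead of disappearing into the warning log.

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical stand-in for a 403 response body: an HTML page that is not
// well-formed XML (the <hr> tag is never closed).
$body = '<html><head><title>403 Forbidden</title></head><body><hr></body></html>';

$document = new DOMDocument();
$loaded = $document->loadXML($body);

// Copy the errors before clearing libxml's internal buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    // Each LibXMLError carries the message plus line/column information.
    echo "libxml error on line {$error->line}: ", trim($error->message), "\n";
}

var_dump($loaded);                    // bool(false)
var_dump($document->documentElement); // NULL, as in the dump above
```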
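The only problem I had with your code was that, without a user agent defined, the server answers with error 403 when you request the feed.
In the future, you can use curl_getinfo to retrieve the status code of the request to make sure it didn't fail, and compare it against 200, which means OK: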
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
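Putting the user agent and the status check together, a minimal sketch of a reusable fetch helper, assuming the cURL extension is available. `fetch_feed()` is a hypothetical name; it treats a transport failure or any status other than 200 as an error and throws a `RuntimeException` instead of calling `die()`.

```php
<?php
/**
 * Hypothetical helper: download $url with the given User-Agent and return
 * the body, or throw if the transfer fails or the status is not 200.
 */
function fetch_feed(string $url, string $userAgent): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);

    $body = curl_exec($curl);
    $error = curl_error($curl);
    $httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    if ($body === false) {
        // Transport-level failure: DNS, connection refused, timeout, ...
        throw new RuntimeException("Request failed: $error");
    }
    if ($httpcode !== 200) {
        // The server answered, but not with OK (e.g. 403 without a User-Agent).
        throw new RuntimeException("Unexpected HTTP status: $httpcode");
    }
    return $body;
}
```

A caller would wrap `$xml = fetch_feed($xml_url, $userAgent);` in a `try`/`catch (RuntimeException $e)` and decide there whether to show an error or fall back to cached data.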
Other than that, there were a few errors in your loops.
Using SimpleXML:
<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
if ($httpcode !== 200)
{
echo "Failed to retrieve feed... Error code: $httpcode";
die();
}
$feed = new SimpleXMLElement($data);
// list all titles...
foreach ($feed->channel->item as $item)
{
echo $item->title, "<br>\n";
}
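The SimpleXML loop can be tried without any network access by feeding it a string. A minimal sketch, assuming a hand-written two-item RSS document (the titles are made up); it casts each `title` to `string`, because `$item->title` is a `SimpleXMLElement`, not a plain string.

```php
<?php
// Hypothetical sample feed used only for illustration.
$data = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First post</title><description>One</description></item>
    <item><title>Second post</title><description>Two</description></item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($data);

$titles = [];
foreach ($feed->channel->item as $item) {
    // Cast to string so the array holds plain text rather than objects.
    $titles[] = (string)$item->title;
}

echo implode("\n", $titles), "\n";
```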
Using DOMDocument:
<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
if ($httpcode !== 200)
{
echo "Failed to retrieve feed... Error code: $httpcode";
die();
}
$xml = new DOMDocument();
$xml->loadXML($data);
// list all titles...
foreach ($xml->getElementsByTagName("item") as $item)
{
foreach ($item->getElementsByTagName("title") as $title)
{
echo $title->nodeValue, "<br>\n";
}
}
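The nested loop in the DOMDocument version can also be replaced by a single XPath query. A minimal sketch using `DOMXPath`, assuming a hand-written sample feed as a stand-in for the downloaded data; the path `/rss/channel/item/title` only matches titles that are children of an `<item>`, so the channel's own `<title>` is skipped.

```php
<?php
// Hypothetical sample feed used only for illustration.
$data = '<rss version="2.0"><channel><title>Sample feed</title>'
    . '<item><title>First post</title></item>'
    . '<item><title>Second post</title></item>'
    . '</channel></rss>';

$xml = new DOMDocument();
$xml->loadXML($data);

$xpath = new DOMXPath($xml);
$titles = [];
// One query selects every <title> that is a direct child of an <item>.
foreach ($xpath->query('/rss/channel/item/title') as $title) {
    $titles[] = $title->nodeValue;
}

echo implode("\n", $titles), "\n";
```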
If you only want to print the title/description of all items:
foreach ($feed->channel->item as $item)
{
echo $item->title;
echo $item->description;
// uncomment the below line to print only the first entry.
// break;
}
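The commented-out `break` generalizes to stopping after any number of entries. A minimal sketch, assuming a hypothetical `$limit` variable and a hand-written three-item sample feed; it counts printed entries and leaves the loop once the limit is reached.

```php
<?php
// Hypothetical sample feed used only for illustration.
$data = '<rss version="2.0"><channel>'
    . '<item><title>A</title><description>first</description></item>'
    . '<item><title>B</title><description>second</description></item>'
    . '<item><title>C</title><description>third</description></item>'
    . '</channel></rss>';
$feed = new SimpleXMLElement($data);

$limit = 2;   // hypothetical: how many entries to print
$printed = 0;
foreach ($feed->channel->item as $item) {
    if ($printed >= $limit) {
        break; // stop once $limit entries have been printed
    }
    echo $item->title, ': ', $item->description, "<br>\n";
    $printed++;
}
```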
If you only want the first entry, without using foreach:
echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->description;
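If the feed happens to contain no items, `item[0]` does not point at any element. A minimal sketch of a hypothetical `first_title()` helper that checks with `isset()` first and returns `null` for an empty feed; both sample documents are hand-written.

```php
<?php
/** Hypothetical helper: title of the first item, or null if there are no items. */
function first_title(SimpleXMLElement $feed): ?string
{
    if (!isset($feed->channel->item[0])) {
        return null;
    }
    return (string)$feed->channel->item[0]->title;
}

$withItems = new SimpleXMLElement('<rss><channel><item><title>Only post</title></item></channel></rss>');
$empty = new SimpleXMLElement('<rss><channel><title>No posts yet</title></channel></rss>');

var_dump(first_title($withItems)); // string(9) "Only post"
var_dump(first_title($empty));     // NULL
```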
To save the titles and descriptions into an array for later use:
$result = [];
foreach ($feed->channel->item as $item)
{
$result[] =
[
'title' => (string)$item->title,
'description' => (string)$item->description
];
// could make a key => value alternatively from the above with
// title as key like this:
// $result[(string)$item->title] = (string)$item->description;
}
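The same array can also be built without an explicit loop. A minimal sketch, assuming a hand-written sample feed: `iterator_to_array(..., false)` turns the `item` elements into a plain list (the `false` discards the keys, which would all be `item` and overwrite each other), `array_map()` converts each element, and `array_column()` produces the title-keyed variant mentioned in the comment above.

```php
<?php
// Hypothetical sample feed used only for illustration.
$data = '<rss version="2.0"><channel>'
    . '<item><title>A</title><description>first</description></item>'
    . '<item><title>B</title><description>second</description></item>'
    . '</channel></rss>';
$feed = new SimpleXMLElement($data);

// false = do not preserve keys; every key would be "item".
$items = iterator_to_array($feed->channel->item, false);

$result = array_map(function (SimpleXMLElement $item): array {
    return [
        'title' => (string)$item->title,
        'description' => (string)$item->description,
    ];
}, $items);

// title => description, equivalent to the commented-out alternative.
$byTitle = array_column($result, 'description', 'title');
```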
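Foreach with a MySQLi / PDO prepared statement ($stmt is assumed to have been prepared before the loop):
foreach ($feed->channel->item as $item)
{
// bind_param() and bindParam() bind by reference, so copy the values
// into plain string variables instead of passing SimpleXMLElement properties.
$title = (string)$item->title;
$description = (string)$item->description;
// MySQLi
$stmt->bind_param('ss', $title, $description);
$stmt->execute();
// PDO
//$stmt->bindParam(':title', $title, PDO::PARAM_STR);
//$stmt->bindParam(':description', $description, PDO::PARAM_STR);
//$stmt->execute();
}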
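A self-contained version of the PDO branch, assuming the `pdo_sqlite` driver and a hypothetical `posts` table in an in-memory database so it can run anywhere. The statement is prepared once and executed per item with an array of values cast to `string`, which avoids binding by reference altogether.

```php
<?php
// Hypothetical sample feed used only for illustration.
$data = '<rss version="2.0"><channel>'
    . '<item><title>A</title><description>first</description></item>'
    . '<item><title>B</title><description>second</description></item>'
    . '</channel></rss>';
$feed = new SimpleXMLElement($data);

// Hypothetical schema, kept in memory only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (title TEXT, description TEXT)');

// Prepare once, execute once per item.
$stmt = $pdo->prepare('INSERT INTO posts (title, description) VALUES (:title, :description)');
foreach ($feed->channel->item as $item) {
    $stmt->execute([
        ':title'       => (string)$item->title,
        ':description' => (string)$item->description,
    ]);
}

$count = (int)$pdo->query('SELECT COUNT(*) FROM posts')->fetchColumn();
echo "Inserted $count rows\n";
```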
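I accepted Prix's answer for pointing out the user agent definition, but I came up with another way of writing the loop that avoids the nested loops and makes it easier to access other nodes. Here's what I'm using (the DOMDocument solution):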
$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";
$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
$xml = curl_exec( $curl );
curl_close( $curl );
$document = new DOMDocument;
$document->loadXML( $xml );
$items = $document->getElementsByTagName("item");
foreach ($items as $item) {
$title = $item->getElementsByTagName('title')->item(0)->nodeValue;
echo $title;
$desc = $item->getElementsByTagName('description')->item(0)->nodeValue;
echo $desc;
}
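`->item(0)` returns `null` when an item lacks the requested child, and reading `->nodeValue` on it then raises a warning. A minimal sketch of a hypothetical `child_text()` helper that returns `null` in that case, so other nodes (here `link`) can be read the same way; the sample feed is hand-written and its second item deliberately has no `description` or `link`.

```php
<?php
/** Hypothetical helper: text of the first descendant named $tag, or null. */
function child_text(DOMElement $parent, string $tag): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? null : $node->nodeValue;
}

// Hypothetical sample feed used only for illustration.
$data = '<rss version="2.0"><channel>'
    . '<item><title>A</title><description>first</description><link>https://example.com/a</link></item>'
    . '<item><title>B</title></item>'
    . '</channel></rss>';

$document = new DOMDocument();
$document->loadXML($data);

$rows = [];
foreach ($document->getElementsByTagName('item') as $item) {
    $rows[] = [
        'title'       => child_text($item, 'title'),
        'description' => child_text($item, 'description'),
        'link'        => child_text($item, 'link'),
    ];
}
print_r($rows);
```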