Parsing XML with curl returns null

Question

I am trying to parse an RSS feed, and what I get back is a seemingly empty DOMDocument object. My current code is:

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_URL, $xml_url );

    $xml = curl_exec( $curl );
    curl_close( $curl );

    //$xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml);
    //$xml = utf8_encode($xml);
    $document = new DOMDocument;
    $document->loadXML( $xml ); 
    if( ini_get('allow_url_fopen') ) {
      echo "allow url fopen? Yes";
    }
    echo "<br />";
    var_dump($document);

    $items = $document->getElementsByTagName("item");

    foreach ($items as $item) {
        $title = $item->getElementsByTagName('title');
        echo $title;
    }

    $url = 'https://thehockeywriters.com/category/san-jose-sharks/feed/';
    $xml = simplexml_load_file($url);
    foreach ($items as $item) {
        $title = $item->title;
        echo $title;
    }
    print_r($xml);
    echo "<br />";
    var_dump($xml);
    echo "<br />hello?";

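This code contains two separate attempts at parsing the same URL, based on the answers and suggestions given in the following Stack Overflow examples: Example 1, Example 2.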

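Things I have tried:
1. Checking that allow_url_fopen is enabled
2. Making sure the content is UTF-8 encoded
3. Validating the XML
4. The code examples from the Stack Overflow posts linked above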

Here is my current output from the var_dump and echo calls:

allow url fopen? Yes
object(DOMDocument)#2 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)" 
["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL 
["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true) 
["version"]=> string(3) "1.0" ["xmlVersion"]=> string(3) "1.0" 
["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL 
["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false) 
["preserveWhiteSpace"]=> bool(true) ["recover"]=> bool(false) ["substituteEntities"]=> bool(false) 
["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL 
["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL ["lastChild"]=> NULL 
["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL 
["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" } 
bool(false) 
hello?

Answer 1

The only problem I had with your code was that, with no user agent defined, the request for the feed returned a 403 error.

In the future, you can use curl_getinfo to extract the status code of the request and make sure it did not fail, by comparing it against 200, which means OK.

$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
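When the status code is not available, or the body still fails to parse, libxml's own error list can show why loadXML produced an empty document. The following is only a sketch: the helper name parse_feed_or_explain is made up for illustration, and it assumes the response body has already been fetched into a string.

```php
<?php
// Hypothetical helper (not part of the original code): parse a response body
// and print libxml's errors instead of silently returning an empty document.
function parse_feed_or_explain(string $body): ?DOMDocument
{
    if ($body === '') {
        // loadXML() rejects an empty string, so handle that case up front.
        echo "Empty response body\n";
        return null;
    }

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $ok = $document->loadXML($body);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        foreach ($errors as $error) {
            echo "XML error on line {$error->line}: ", trim($error->message), "\n";
        }
        return null;
    }

    return $document;
}

// A non-XML body, such as an error page returned with a 403, fails to parse.
$failed = parse_feed_or_explain('403 Forbidden');

// A well-formed body parses normally.
$parsed = parse_feed_or_explain('<rss><channel><item><title>First post</title></item></channel></rss>');
```

Here $failed is null and the parser's message is printed, while $parsed is a DOMDocument that can be queried with getElementsByTagName as usual.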

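Apart from that, there are a few mistakes in your loops: in the DOMDocument attempt, getElementsByTagName returns a DOMNodeList, which cannot be echoed directly, and in the SimpleXML attempt the loop iterates over $items from the DOMDocument attempt instead of the items in $xml.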

Using SimpleXML:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$feed = new SimpleXMLElement($data);
// list all titles...
foreach ($feed->channel->item as $item)
{
    echo $item->title, "<br>\n";
}

Using DOMDocument:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$xml = new DOMDocument();
$xml->loadXML($data);
// list all titles...
foreach ($xml->getElementsByTagName("item") as $item)
{
    foreach ($item->getElementsByTagName("title") as $title)
    {
        echo $title->nodeValue, "<br>\n";
    }
}

If you just want to print the title and description of every item:

foreach ($feed->channel->item as $item)
{
    echo $item->title;
    echo $item->description;
    // uncomment the below line to print only the first entry.
    // break;
}

If you only want the first entry, without using foreach:

echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->description;

Saving the titles and descriptions into an array for later use:

$result = [];
foreach ($feed->channel->item as $item)
{
    $result[] = 
    [
        'title' => (string)$item->title,
        'description' => (string)$item->description
    ];
    // could make a key => value alternatively from the above with 
    // title as key like this: 
    // $result[(string)$item->title] = (string)$item->description;
}
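To try the array-building loop without hitting the network, a small inline feed can stand in for the curl response. The feed content below is made up for illustration; only the rss/channel/item structure matches what the real feed is assumed to return.

```php
<?php
// Made-up sample feed standing in for the body returned by curl_exec().
$data = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <description>First description</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Second description</description>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($data);

$result = [];
foreach ($feed->channel->item as $item) {
    // Cast to string so the array holds plain text, not SimpleXMLElement objects.
    $result[] = [
        'title' => (string)$item->title,
        'description' => (string)$item->description,
    ];
}

print_r($result);
```

The string casts matter here: without them the array would hold SimpleXMLElement objects that stay tied to the parsed document.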

A foreach with prepared statements for MySQLi / PDO:

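foreach ($feed->channel->item as $item)
{
    // bind_param() binds by reference, so copy the values into plain
    // string variables instead of passing SimpleXMLElement properties.
    $title = (string)$item->title;
    $description = (string)$item->description;
    // MySQLi
    $stmt->bind_param('ss', $title, $description);
    $stmt->execute();
    // PDO
    //$stmt->bindValue(':title', $title, PDO::PARAM_STR);
    //$stmt->bindValue(':description', $description, PDO::PARAM_STR);
    //$stmt->execute();
}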
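The loop above assumes $stmt has already been prepared. As a rough end-to-end sketch of the PDO variant, the following uses an in-memory SQLite database so it can run on its own. The articles table, its columns, and the sample feed are hypothetical, and the pdo_sqlite extension is assumed to be available; with MySQL only the DSN and credentials would differ.

```php
<?php
// Assumption: pdo_sqlite is installed. The "articles" table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE articles (title TEXT, description TEXT)');

// Made-up feed standing in for the parsed curl response.
$feed = new SimpleXMLElement(
    '<rss><channel><item><title>First post</title>'
    . '<description>First description</description></item></channel></rss>'
);

// Prepare once, then execute once per item.
$stmt = $pdo->prepare(
    'INSERT INTO articles (title, description) VALUES (:title, :description)'
);

foreach ($feed->channel->item as $item) {
    // bindValue() copies the value immediately, so the string casts are enough.
    $stmt->bindValue(':title', (string)$item->title, PDO::PARAM_STR);
    $stmt->bindValue(':description', (string)$item->description, PDO::PARAM_STR);
    $stmt->execute();
}

$count = (int)$pdo->query('SELECT COUNT(*) FROM articles')->fetchColumn();
echo "Stored $count item(s)\n";
```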

Answer 2

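I accepted Prix's answer for pointing out the user agent definition, but I came up with another way to write the loop that avoids the nested loop and makes it easier to access other nodes. This is what I am using (the DOMDocument solution):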

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");

$xml = curl_exec( $curl );
curl_close( $curl );

$document = new DOMDocument;
$document->loadXML( $xml ); 

$items = $document->getElementsByTagName("item");       
foreach ($items as $item) {     
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    echo $title;
    $desc = $item->getElementsByTagName('description')->item(0)->nodeValue;
    echo $desc;
}
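One caveat with this pattern: item(0) returns null when an item has no matching child, and reading nodeValue from null raises a warning. A small wrapper avoids that. This is only a sketch; the helper name first_text and the sample XML are made up for illustration.

```php
<?php
// Hypothetical helper: text of the first <$tag> descendant, or null if absent.
function first_text(DOMElement $parent, string $tag): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? null : $node->nodeValue;
}

// Made-up sample: the item deliberately has no <description>.
$document = new DOMDocument();
$document->loadXML(
    '<rss><channel><item><title>First post</title></item></channel></rss>'
);

$rows = [];
foreach ($document->getElementsByTagName('item') as $item) {
    $rows[] = [
        'title' => first_text($item, 'title'),
        'description' => first_text($item, 'description'),
    ];
}

var_dump($rows);
```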
