PHP DOMDocument 错误实体“nbsp”未定义

问题描述 投票:0回答:5

我使用 DOMDocument 来编辑一些 HTML 文件,但有些主题的名称空间中有。所以DOMDocument自动把空格改成%20然后就找不到了。

这就是错误的样子:

Warning: DOMDocument::load() [domdocument.load]: Entity 'nbsp' not defined in file:///C:/Path/To/The/File/01%20c%2040-1964.html, line: 11 in C:/Path/To/class.php on line 51

如何修复此错误?

php dom domdocument
5个回答
15
投票

使用

DOMDocument::loadHTMLFile()
而不是
load()
。这就是它的目的。 HTML 不是 XML。

XML 不知道命名实体

 
。但是,如果您使用 loadHTML,XML 解析器将加载 HTML 命名实体,这样错误就会消失。

另请参阅:XML 解析器错误:实体未定义


6
投票

如果加载 xml - 使用带有标志 ENT_XML1 的 htmlentities()。

$offerXml->addChild('name', htmlentities($name, ENT_XML1));

1
投票

以下是如何解析包含

 
等 HTML 实体的 XHTML 片段。

该示例来自一段代码,该代码解析从 Confluence API 导出的 XHTML,包括像

<ac:structured-macro>
这样的自定义元素。这就是您需要设置名称空间的原因。更改或删除
xmlns
属性以适合您的用例。

$snippet = '<p>hello&nbsp;world</p>';

$entitydefs = XML_HTML_DEFS;

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
$entitydefs
]>
<root
  xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/"
  xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/"
  xmlns="http://www.atlassian.com/schema/confluence/4/">
  $snippet
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

哪里

XML_HTML_DEFS
是这个(你可以删除那些你知道不会发生的):

const XML_HTML_DEFS = <<<ENTITIES
<!ENTITY amp "&#38;">
<!ENTITY lt "&#60;">
<!ENTITY gt "&#62;">
<!ENTITY Agrave "&#192;">
<!ENTITY Aacute "&#193;">
<!ENTITY Acirc "&#194;">
<!ENTITY Atilde "&#195;">
<!ENTITY Auml "&#196;">
<!ENTITY Aring "&#197;">
<!ENTITY AElig "&#198;">
<!ENTITY Ccedil "&#199;">
<!ENTITY Egrave "&#200;">
<!ENTITY Eacute "&#201;">
<!ENTITY Ecirc "&#202;">
<!ENTITY Euml "&#203;">
<!ENTITY Igrave "&#204;">
<!ENTITY Iacute "&#205;">
<!ENTITY Icirc "&#206;">
<!ENTITY Iuml "&#207;">
<!ENTITY ETH "&#208;">
<!ENTITY Ntilde "&#209;">
<!ENTITY Ograve "&#210;">
<!ENTITY Oacute "&#211;">
<!ENTITY Ocirc "&#212;">
<!ENTITY Otilde "&#213;">
<!ENTITY Ouml "&#214;">
<!ENTITY Oslash "&#216;">
<!ENTITY Ugrave "&#217;">
<!ENTITY Uacute "&#218;">
<!ENTITY Ucirc "&#219;">
<!ENTITY Uuml "&#220;">
<!ENTITY Yacute "&#221;">
<!ENTITY THORN "&#222;">
<!ENTITY szlig "&#223;">
<!ENTITY agrave "&#224;">
<!ENTITY aacute "&#225;">
<!ENTITY acirc "&#226;">
<!ENTITY atilde "&#227;">
<!ENTITY auml "&#228;">
<!ENTITY aring "&#229;">
<!ENTITY aelig "&#230;">
<!ENTITY ccedil "&#231;">
<!ENTITY egrave "&#232;">
<!ENTITY eacute "&#233;">
<!ENTITY ecirc "&#234;">
<!ENTITY euml "&#235;">
<!ENTITY igrave "&#236;">
<!ENTITY iacute "&#237;">
<!ENTITY icirc "&#238;">
<!ENTITY iuml "&#239;">
<!ENTITY eth "&#240;">
<!ENTITY ntilde "&#241;">
<!ENTITY ograve "&#242;">
<!ENTITY oacute "&#243;">
<!ENTITY ocirc "&#244;">
<!ENTITY otilde "&#245;">
<!ENTITY ouml "&#246;">
<!ENTITY oslash "&#248;">
<!ENTITY ugrave "&#249;">
<!ENTITY uacute "&#250;">
<!ENTITY ucirc "&#251;">
<!ENTITY uuml "&#252;">
<!ENTITY yacute "&#253;">
<!ENTITY thorn "&#254;">
<!ENTITY yuml "&#255;">
<!ENTITY nbsp "&#160;">
<!ENTITY iexcl "&#161;">
<!ENTITY cent "&#162;">
<!ENTITY pound "&#163;">
<!ENTITY curren "&#164;">
<!ENTITY yen "&#165;">
<!ENTITY brvbar "&#166;">
<!ENTITY sect "&#167;">
<!ENTITY uml "&#168;">
<!ENTITY copy "&#169;">
<!ENTITY ordf "&#170;">
<!ENTITY laquo "&#171;">
<!ENTITY not "&#172;">
<!ENTITY shy "&#173;">
<!ENTITY reg "&#174;">
<!ENTITY macr "&#175;">
<!ENTITY deg "&#176;">
<!ENTITY plusmn "&#177;">
<!ENTITY sup2 "&#178;">
<!ENTITY sup3 "&#179;">
<!ENTITY acute "&#180;">
<!ENTITY micro "&#181;">
<!ENTITY para "&#182;">
<!ENTITY cedil "&#184;">
<!ENTITY sup1 "&#185;">
<!ENTITY ordm "&#186;">
<!ENTITY raquo "&#187;">
<!ENTITY frac14 "&#188;">
<!ENTITY frac12 "&#189;">
<!ENTITY frac34 "&#190;">
<!ENTITY iquest "&#191;">
<!ENTITY times "&#215;">
<!ENTITY divide "&#247;">
<!ENTITY forall "&#8704;">
<!ENTITY part "&#8706;">
<!ENTITY exist "&#8707;">
<!ENTITY empty "&#8709;">
<!ENTITY nabla "&#8711;">
<!ENTITY isin "&#8712;">
<!ENTITY notin "&#8713;">
<!ENTITY ni "&#8715;">
<!ENTITY prod "&#8719;">
<!ENTITY sum "&#8721;">
<!ENTITY minus "&#8722;">
<!ENTITY lowast "&#8727;">
<!ENTITY radic "&#8730;">
<!ENTITY prop "&#8733;">
<!ENTITY infin "&#8734;">
<!ENTITY ang "&#8736;">
<!ENTITY and "&#8743;">
<!ENTITY or "&#8744;">
<!ENTITY cap "&#8745;">
<!ENTITY cup "&#8746;">
<!ENTITY int "&#8747;">
<!ENTITY there4 "&#8756;">
<!ENTITY sim "&#8764;">
<!ENTITY cong "&#8773;">
<!ENTITY asymp "&#8776;">
<!ENTITY ne "&#8800;">
<!ENTITY equiv "&#8801;">
<!ENTITY le "&#8804;">
<!ENTITY ge "&#8805;">
<!ENTITY sub "&#8834;">
<!ENTITY sup "&#8835;">
<!ENTITY nsub "&#8836;">
<!ENTITY sube "&#8838;">
<!ENTITY supe "&#8839;">
<!ENTITY oplus "&#8853;">
<!ENTITY otimes "&#8855;">
<!ENTITY perp "&#8869;">
<!ENTITY sdot "&#8901;">
<!ENTITY Alpha "&#913;">
<!ENTITY Beta "&#914;">
<!ENTITY Gamma "&#915;">
<!ENTITY Delta "&#916;">
<!ENTITY Epsilon "&#917;">
<!ENTITY Zeta "&#918;">
<!ENTITY Eta "&#919;">
<!ENTITY Theta "&#920;">
<!ENTITY Iota "&#921;">
<!ENTITY Kappa "&#922;">
<!ENTITY Lambda "&#923;">
<!ENTITY Mu "&#924;">
<!ENTITY Nu "&#925;">
<!ENTITY Xi "&#926;">
<!ENTITY Omicron "&#927;">
<!ENTITY Pi "&#928;">
<!ENTITY Rho "&#929;">
<!ENTITY Sigma "&#931;">
<!ENTITY Tau "&#932;">
<!ENTITY Upsilon "&#933;">
<!ENTITY Phi "&#934;">
<!ENTITY Chi "&#935;">
<!ENTITY Psi "&#936;">
<!ENTITY Omega "&#937;">
<!ENTITY alpha "&#945;">
<!ENTITY beta "&#946;">
<!ENTITY gamma "&#947;">
<!ENTITY delta "&#948;">
<!ENTITY epsilon "&#949;">
<!ENTITY zeta "&#950;">
<!ENTITY eta "&#951;">
<!ENTITY theta "&#952;">
<!ENTITY iota "&#953;">
<!ENTITY kappa "&#954;">
<!ENTITY lambda "&#955;">
<!ENTITY mu "&#956;">
<!ENTITY nu "&#957;">
<!ENTITY xi "&#958;">
<!ENTITY omicron "&#959;">
<!ENTITY pi "&#960;">
<!ENTITY rho "&#961;">
<!ENTITY sigmaf "&#962;">
<!ENTITY sigma "&#963;">
<!ENTITY tau "&#964;">
<!ENTITY upsilon "&#965;">
<!ENTITY phi "&#966;">
<!ENTITY chi "&#967;">
<!ENTITY psi "&#968;">
<!ENTITY omega "&#969;">
<!ENTITY thetasym "&#977;">
<!ENTITY upsih "&#978;">
<!ENTITY piv "&#982;">
<!ENTITY OElig "&#338;">
<!ENTITY oelig "&#339;">
<!ENTITY Scaron "&#352;">
<!ENTITY scaron "&#353;">
<!ENTITY Yuml "&#376;">
<!ENTITY fnof "&#402;">
<!ENTITY circ "&#710;">
<!ENTITY tilde "&#732;">
<!ENTITY ensp "&#8194;">
<!ENTITY emsp "&#8195;">
<!ENTITY thinsp "&#8201;">
<!ENTITY zwnj "&#8204;">
<!ENTITY zwj "&#8205;">
<!ENTITY lrm "&#8206;">
<!ENTITY rlm "&#8207;">
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
<!ENTITY lsquo "&#8216;">
<!ENTITY rsquo "&#8217;">
<!ENTITY sbquo "&#8218;">
<!ENTITY ldquo "&#8220;">
<!ENTITY rdquo "&#8221;">
<!ENTITY bdquo "&#8222;">
<!ENTITY dagger "&#8224;">
<!ENTITY Dagger "&#8225;">
<!ENTITY bull "&#8226;">
<!ENTITY hellip "&#8230;">
<!ENTITY permil "&#8240;">
<!ENTITY prime "&#8242;">
<!ENTITY Prime "&#8243;">
<!ENTITY lsaquo "&#8249;">
<!ENTITY rsaquo "&#8250;">
<!ENTITY oline "&#8254;">
<!ENTITY euro "&#8364;">
<!ENTITY trade "&#8482;">
<!ENTITY larr "&#8592;">
<!ENTITY uarr "&#8593;">
<!ENTITY rarr "&#8594;">
<!ENTITY darr "&#8595;">
<!ENTITY harr "&#8596;">
<!ENTITY crarr "&#8629;">
<!ENTITY lceil "&#8968;">
<!ENTITY rceil "&#8969;">
<!ENTITY lfloor "&#8970;">
<!ENTITY rfloor "&#8971;">
<!ENTITY loz "&#9674;">
<!ENTITY spades "&#9824;">
<!ENTITY clubs "&#9827;">
<!ENTITY hearts "&#9829;">
<!ENTITY diams "&#9830;">
ENTITIES;

0
投票

如果您对传递给 DOMDocument 的文档有一定的控制权,则另一种选择是使用 HTML 实体 (

&#160;
) 或 unicode 字符 (
U+00A0
) 而不是命名字符 (
&nbsp;
)。

假设有以下文件: 文档1(test_1.xml)

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>this line has a space ( )</element>
    <element>this line has a non breakin space (&nbsp;)</element>
    <element>this line has an integral symbol (&int;)</element>
</root>

文档2(test_2.xml)

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>this line has a space (&#32;)</element>
    <element>this line has a non breakin space (&#160;)</element>
    <element>this line has an integral symbol (&#8747;)</element>
</root>

文档3(test_3.xml)

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <element>this line has a space ( )</element>
    <element>this line has a non breakin space (U+00A0)</element>
    <element>this line has an integral symbol (U+222B)</element>
</root>

还有一个加载这些文档的 php 文件:(index.php)

<?php
    declare(strict_types=1);
    $xmlDoc = new DOMDocument();
    $xml_file = file_get_contents( "test_2.xml" );
    $xmlDoc->loadXML( $xml_file );
?>

第一个文档将生成如下警告:

Warning: DOMDocument::loadXML(): Entity 'nbsp' not defined in Entity, line: 4 in /path/to/file/index.php on line 5

Warning: DOMDocument::loadXML(): Entity 'int' not defined in Entity, line: 5 in /path/to/file/index.php on line 5

但是文档 2 和 3 会被很好地解析。

参考:


-2
投票
$textHTML = '&lt;ul&gt; &lt;li&gt;Dentro da ordem jur&amp;iacute;dica
brasileira.&lt;/li&gt; &lt;/ul&gt;

像这样保存在 XML 文件中:

htmlspecialchars($textHTML, ENT_QUOTES);

像这样恢复文件:

$doc->load(file.xml);
© www.soinside.com 2019 - 2024. All rights reserved.