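Questions tagged domdocument

DOMDocument refers to a class that encapsulates the DOM (Document Object Model). Several languages and technologies, including PHP, COM, C++, and ActiveX, use the name DOMDocument for it.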

I have the following HTML code:

    <div class="article__img has-caption" style="background-image: url('img/main/article-plug.png')" >

I am trying to extract the image URL and replace it with another one (for example img/main/article-plug.webp), but I am stuck on the XPath query and don't know how to proceed. Thanks in advance for your help! This is my latest code (it does not return anything yet):

    $domDocument = new DOMDocument();
    $domDocument->loadHTML($article["DESCRIPTION"]);
    $domXPath = new DOMXPath($domDocument);
    $img = $domXPath->query('substring-before(substring-after(//div[@class=\'article__img has-caption\']/@style, "background-image: url(\'"), "\')")');

Answer: use the DOM parser to extract the image URL from the given HTML:

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $divs = $dom->getElementsByTagName('div');
    foreach ($divs as $div) {
        if ($div->getAttribute('class') === 'article__img has-caption') {
            $style = $div->getAttribute('style');
            // Capture the URL inside url(...), without the optional surrounding quotes
            if (preg_match('/url\(\s*[\'"]?(.*?)[\'"]?\s*\)/', $style, $matches)) {
                $imageUrl = $matches[1];
                $newImageUrl = 'img/main/article-plug.webp';
                $newStyle = str_replace($imageUrl, $newImageUrl, $style);
                $div->setAttribute('style', $newStyle);
            }
        }
    }
    $newHtml = $dom->saveHTML();
    echo $newHtml;
    // the output contains: <div class="article__img has-caption" style="background-image: url('img/main/article-plug.webp')">

This code first loads the HTML into a DOM object, finds the div whose class is "article__img has-caption", extracts the image URL from its style attribute, replaces it with the new image URL, updates the div's style attribute, and finally generates the new HTML with the updated image URL.
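A note on the XPath attempt in the question: substring-before() and substring-after() produce a string, while DOMXPath::query() is documented to return a node list, which is a plausible reason the original call came back empty. The sketch below uses DOMXPath::evaluate() instead, which can return a typed result such as a string. The $html value is a stand-in for $article["DESCRIPTION"], and the expression searches for "url('" rather than the full "background-image: url('" prefix; both are assumptions made for illustration.

```php
<?php
// Stand-in for $article["DESCRIPTION"] from the question (assumed input).
$html = '<div class="article__img has-caption" '
      . 'style="background-image: url(\'img/main/article-plug.png\')"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The expression yields a string, so evaluate() is used instead of query().
$expr = 'substring-before(substring-after('
      . '//div[@class="article__img has-caption"]/@style, "url(\'"), "\')")';
$url = $xpath->evaluate($expr);

echo $url, PHP_EOL;
```

Once the URL is available as a plain string, the replacement can be done with str_replace() on the style attribute, as in the answer above.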

php xpath domdocument

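Answers: 1, Votes: 0

Why does DOMDocument convert two HTML quote entities into actual quotes?

I have been at this for half a day, so it is time to ask for help. I want DOMDocument to preserve existing entities and UTF-8 characters. I now think this is not possible...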

php domdocument

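Answers: 2, Votes: 0

DOMDocument converts entities in a strange way

I have been at this for half a day, so it is time to ask for help. I am sure I am missing something simple. I want DOMDocument to preserve existing entities and UTF-8 characters. ...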

php domdocument

Answers: 0, Votes: 0

How to get span text using XPath in PHP

HTML: Citroën ...

php xpath domdocument

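Answers: 1, Votes: 0

PHP DOMDocument saveXML changes the tag closing style

I have to parse XML files in which some tags are closed like this and others are closed like this. I need to keep the tag closing the same as in the original file, there...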

php xml domdocument

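Answers: 0, Votes: 0

How to view the data of a DOMNodeList object in PHP

When I want to inspect a PHP array, I use the following code: print_r($myarray); but now I want to see the data of an object. My object is $xpath = new DOMXPath($doc); $myobject = $xpath->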
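One way to inspect a DOMNodeList, sketched under assumptions: since the original query is cut off above, the '//li' expression and the sample markup here are hypothetical. The idea is to copy the interesting parts of each node (its path, serialized markup, and text) into a plain array, which print_r() can then display like any other array.

```php
<?php
$doc = new DOMDocument();
// Hypothetical sample markup; the question's real document is not shown.
$doc->loadHTML('<ul><li class="x">One</li><li>Two</li></ul>');

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//li'); // hypothetical query

$dump = [];
foreach ($nodes as $node) {
    $dump[] = [
        'path' => $node->getNodePath(),   // location of the node in the tree
        'html' => $doc->saveHTML($node),  // serialized markup of this node only
        'text' => $node->textContent,     // concatenated text content
    ];
}

print_r($dump);
```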

php dom xpath domdocument

Answers: 10, Votes: 0

XPath: get the first paragraph after a heading

I want to add a FAQPage schema to my site. To do that, I need to find every <h2> or <h3> tag that contains a question mark. That is the question. After that, I need the first <p> tag after the heading as the answer. The end result should look like this:

    {
        "@type": "Question",
        "name": "How long does it take to process a refund?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "CONTENT FROM FIRST P-TAG",
            "url": "https://www.example.com/answer#anchor_link"
        }
    }

The "name" of the question is the <h2> or <h3> tag. The "url" of the answer is the permalink plus the anchor link from the <h2> or <h3> tag. Those two parameters are solved.

Unfortunately, I cannot figure out how to get the first paragraph tag after the heading tag. I need the content of the first paragraph for the following line:

    "text": "CONTENT FROM FIRST P-TAG",

This is my current code so far:

    <?php
    $content_postid = get_the_ID();
    $content_post = get_post($content_postid);
    $content = $content_post->post_content;
    $content = apply_filters('the_content', $content);
    $content = str_replace(']]>', ']]&gt;', $content);

    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $content);
    $xp = new DOMXPath($dom);
    $query = "//h2[contains(., '?')] | //h3[contains(., '?')]";
    $nodes = $xp->query($query);
    $stack = [];

    if ($nodes) {
        $faq_count = count($nodes);
        $faq_i = 1;
        echo '
        <script type="application/ld+json">
        {
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [';
        foreach($nodes as $node) {
            echo '{
                "@type": "Question",
                "name": "'.$node->nodeValue.'",
                "acceptedAnswer": {
                    "@type": "Answer",
                    "text": "CONTENT FROM FIRST P-TAG",
                    "url": "'.get_permalink().'#'.$node->getAttribute('id').'"
                }
            }';
            if ($faq_i != $faq_count) :
                echo ',';
            endif;
            $faq_i++;
        }
        echo ']}</script>';
    }
    ?>

As you can see, I am using this line to find every <h2> or <h3> tag that contains a ?:

    $query = "//h2[contains(., '?')] | //h3[contains(., '?')]";

I guess I need a second $query to find the paragraph after the heading? But how do I check for the first tag after the heading? I tried this additional query:

    $query2 = "//h2[contains(., '?')]/following-sibling::p[1] | //h3[contains(., '?')]/following-sibling::p[1]";

But neither following-sibling:: nor following:: works for me. It always shows the paragraph after the last heading. Do I need to work from the first query, so that I know which level I am at?

Here is an example of $content_post (it is always different):

    <h2>Lorem ipsum dolor sit amet?</h2>
    <p>consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim</p>
    <p>veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.</p>
    <h3>Duis autem vel eum?</h3>
    <p>iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.</p>
    <h2>Nam liber tempor cum soluta?</h2>
    <h3>nobis eleifend option congue nihil</h3>
    <p>imperdiet doming id quod mazim placerat facer possim assum. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>
    <p>Et wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.</p>
    <h3>Duis autem vel?</h3>
    <p>eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.</p>
    <h4>Nam liber tempor cum soluta nobis</h4>
    <p>eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum.</p>
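A possible approach, sketched without WordPress: instead of a second document-wide query, run a relative query for each heading by passing the heading as the context node (the second argument of DOMXPath::query()). The shortened $content sample and the $faq array are hypothetical stand-ins, not part of the original code.

```php
<?php
// Hypothetical, shortened sample; in the question this comes from the post content.
$content = '<h2 id="a">Lorem ipsum?</h2><p>First answer.</p><p>More text.</p>'
         . '<h3 id="b">Duis autem?</h3><p>Second answer.</p>'
         . '<h4>No question here</h4><p>Ignored.</p>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $content);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$faq = [];
foreach ($xp->query("//h2[contains(., '?')] | //h3[contains(., '?')]") as $heading) {
    // Relative query: the first <p> sibling that follows *this* heading.
    $p = $xp->query('following-sibling::p[1]', $heading)->item(0);
    $faq[] = [
        'name' => trim($heading->textContent),
        'text' => $p ? trim($p->textContent) : '',
        'id'   => $heading->getAttribute('id'),
    ];
}

echo json_encode($faq, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), PHP_EOL;
```

Note that following-sibling::p[1] skips over other elements, so a question heading that is directly followed by another heading would pick up the paragraph after that second heading. If only an immediately adjacent paragraph should count, following-sibling::*[1][self::p] is a stricter alternative.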

php wordpress dom xpath domdocument

Answers: 0, Votes: 0

WordPress: find the first <p> tag after each <h2> and add its first 150 characters as an excerpt

I want to add a FAQPage schema to my site. To do that, I need to find every <h2> tag and the first <p> tag after it. The end result should look like this:

    {
        "@type": "Question",
        "name": "How long does it take to process a refund?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "The first 150 chars from the first <p> tag",
            "url": "https://www.example.com/answer#anchor_link"
        }
    }

The "name" of the question is the <h2> tag. The "url" of the answer is the permalink plus the anchor link of the <h2> tag. Those two parameters are solved.

Unfortunately, I cannot figure out how to get the first <p> tag after the <h2> tag. This is my current code so far:

    <?php
    $content_postid = get_the_ID();
    $content_post = get_post($content_postid);
    $content = $content_post->post_content;
    $content = apply_filters('the_content', $content);
    $content = str_replace(']]>', ']]&gt;', $content);

    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $content);
    $xp = new DOMXPath($dom);
    $query = '//*[contains("h2", name())]';
    $nodes = $xp->query($query);
    $currentLevel = ['level' => 0, 'count' => 0];
    $stack = [];

    if ($nodes) {
        $faq_count = count($nodes);
        $faq_i = 1;
        echo '
        <script type="application/ld+json">
        {
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [';
        foreach($nodes as $node) {
            $level = (int)$node->tagName[1];
            while($level < $currentLevel['level']) {
                $currentLevel = array_pop($stack);
            }
            if ($level === $currentLevel['level']) {
                $currentLevel['count']++;
            } else {
                $stack[] = $currentLevel;
                $currentLevel = ['level' => $level, 'count' => 1];
            }
            echo '{
                "@type": "Question",
                "name": "'.$node->nodeValue.'",
                "acceptedAnswer": {
                    "@type": "Answer",
                    "text": "TEST <a href='.get_permalink().'#'.$node->getAttribute('id').'>more</a>",
                    "url": "'.get_permalink().'#'.$node->getAttribute('id').'"
                }
            }';
            if ($faq_i != $faq_count) :
                echo ',';
            endif;
            $faq_i++;
        }
        echo ']}</script>';
    }
    ?>

As you can see, I am using this line to find every <h2>:

    $query = '//*[contains("h2", name())]';

I guess I need a second $query to find the <p> tag. But how do I check for the first tag after the <h2>? And how do I put the first 150 characters into this line instead of TEST:

    "text": "TEST <a href='.get_permalink().'#'.$node->getAttribute('id').'>more</a>",

I found an answer that works with childNodes. Maybe that could be a solution? Another answer was also helpful. I tried something like this:

    foreach($nodes as $n) {
        $p_text = $xp->query('following::p', $n)->item(0)->nodeValue;
        // only the first 150 chars
        $p_text_out = mb_strlen($p_text) > 150 ? mb_substr($p_text,0,150)."..." : $p_text;
    }

following::p sort of works... but it is always the first <p> tag.
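A sketch of how the excerpt and the JSON output could be combined, under several assumptions: faq_excerpt() is a hypothetical helper, $permalink stands in for get_permalink(), the sample $content is made up, and the mbstring extension is assumed to be available. Building the structure as a PHP array and passing it to json_encode() avoids hand-written commas and takes care of escaping quotes in headings or paragraphs.

```php
<?php
// Hypothetical helper: collapse whitespace and cut to at most $limit characters.
function faq_excerpt(string $text, int $limit = 150): string
{
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
    return mb_strlen($text) > $limit ? mb_substr($text, 0, $limit) . '...' : $text;
}

// Made-up sample data; in WordPress these would come from the post and get_permalink().
$content = '<h2 id="refund">How long does a refund take?</h2><p>'
         . str_repeat('Lorem ipsum ', 20) . '</p>';
$permalink = 'https://www.example.com/answer';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $content);
libxml_clear_errors();
$xp = new DOMXPath($dom);

$entities = [];
foreach ($xp->query('//h2') as $h2) {
    // First <p> sibling after this particular heading, if there is one.
    $p = $xp->query('following-sibling::p[1]', $h2)->item(0);
    $entities[] = [
        '@type' => 'Question',
        'name' => trim($h2->textContent),
        'acceptedAnswer' => [
            '@type' => 'Answer',
            'text' => faq_excerpt($p ? $p->textContent : ''),
            'url' => $permalink . '#' . $h2->getAttribute('id'),
        ],
    ];
}

$schema = ['@context' => 'https://schema.org', '@type' => 'FAQPage', 'mainEntity' => $entities];
// JSON_HEX_TAG keeps "<" and ">" out of the output so it is safe inside <script>.
$json = json_encode($schema, JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

echo '<script type="application/ld+json">' . $json . '</script>', PHP_EOL;
```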

php wordpress dom xpath domdocument

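Answers: 0, Votes: 0

DOMDocument::loadHTML(): warning - htmlParseEntityRef: no name in Entity

I have found several similar questions, but so far none of them has helped me. I am trying to output the "src" of all images in a block of HTML, so I am using DOMDocument(). This approach is...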
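For context, this kind of warning is commonly reported when the input contains a bare "&" that is not part of an entity. The sketch below is one hedged way to collect parser messages instead of emitting them as warnings while still reading the image sources. The $html sample is hypothetical, and whether a given libxml version reports anything for it may vary.

```php
<?php
// Hypothetical input with a bare "&" in the text.
$html = '<p>Fish & chips <img src="a.png"> <img src="b.jpg"></p>';

// Collect libxml messages internally instead of raising PHP warnings.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$errors = libxml_get_errors(); // available for logging if needed
libxml_clear_errors();

$srcs = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}

print_r($srcs);
```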

php warnings domdocument

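Answers: 9, Votes: 0

Using DomDocument to change <a> tags in thousands of posts

I use the PHP DomDocument class to extract all <a> tags from more than 3000 posts and collect them in a database, as shown below. I used the DomDocument C14N() function to populate the existing_link table. id |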

php laravel dom domdocument

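Answers: 0, Votes: 0

Get the nodeValue of a PHP DOMDocumentFragment?

Can anyone tell me how to get the nodeValue of any PHP DOMDocumentFragment? Or is it possible to convert this object into a node so that I can get the value? Any suggestions? Thanks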
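A minimal sketch of one way to read the text of a fragment, assuming the fragment was built with appendXML() (the sample markup is hypothetical). DOMDocumentFragment is itself a DOMNode, so no conversion is needed; since the fragment may not report a nodeValue of its own, the example walks its child nodes and concatenates their text.

```php
<?php
$dom = new DOMDocument();
$fragment = $dom->createDocumentFragment();
// Hypothetical fragment content.
$fragment->appendXML('<b>Hello</b>, <i>world</i>!');

// Concatenate the text of each child node of the fragment.
$text = '';
foreach ($fragment->childNodes as $child) {
    $text .= $child->textContent;
}

echo $text, PHP_EOL;
```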