Get the plain text of a <p> tag and the src and alt values of its child img tag

Question

Answer 1

If the string is always stored in this format, you can use a regex with preg_match.

/(img).*?alt="(.*?)".*?src="(.*?)"/

<?php
    // Assumes alt comes before src inside the img tag and both are double-quoted.
    $reg = '/(img).*?alt="(.*?)".*?src="(.*?)"/';
    $str = '<p><img style="margin: 5px; float: left;" alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />In a 2 week period, the Rotary Club of Playa, in partnership with the... 145 wheelchairs to disabled children and adults. </p>';
    $matches = [];
    // preg_match returns 1 on a match, so only read the groups in that case.
    if (preg_match($reg, $str, $matches)) {
        $img = $matches[1]; // the literal text "img"
        $alt = $matches[2];
        $src = $matches[3];
        print $img . ' ' . $alt . ' ' . $src;
    }
?>

Answer 2

You could try an HTML parser for this. I have used DOMDocument:

$html = "Your html string";
$dom = new DOMDocument;
$dom->loadHTML($html);
// getElementsByTagName returns a DOMNodeList, so take the first match
$img = $dom->getElementsByTagName('img')->item(0);
// getting the src of image
echo $img->attributes->getNamedItem('src')->value . PHP_EOL;
// getting the alt value
echo $img->attributes->getNamedItem('alt')->value . PHP_EOL;
// plain text
echo $dom->textContent;
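
As a slightly fuller sketch of the same DOMDocument idea, the snippet below wraps the lookups in one function and guards against a missing <p> or img. The function name extractParagraphParts, the returned array keys, and the sample HTML are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: name and return shape are illustrative only.
function extractParagraphParts(string $html): ?array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for loose markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Take the first <p>; return null when there is none.
    $p = $dom->getElementsByTagName('p')->item(0);
    if ($p === null) {
        return null;
    }
    // The first img inside that paragraph, if any.
    $img = $p->getElementsByTagName('img')->item(0);

    return [
        'src'  => $img ? $img->getAttribute('src') : null,
        'alt'  => $img ? $img->getAttribute('alt') : null,
        // textContent concatenates the text nodes; the img contributes nothing.
        'text' => trim($p->textContent),
    ];
}

$html = '<p><img alt="rotary-wheelchairs" src="images/stories/DSC_0693_400x400.jpg" />Some plain text.</p>';
$parts = extractParagraphParts($html);
```

Here $parts holds the src, alt, and text entries, and attribute order or quoting style no longer matters because the parser handles them.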

Answer 3

With PHP and regular expressions, I would do it in several steps.

First, get the img tag and the plain text:

preg_match('/(<img.*?>)(.*)</i', $line, $m);
list($x, $img, $plain_text) = $m;
// Bug: This assumes the plain text does not include any tags (eg, <B>).

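This avoids having to worry about the order of the attributes, and about anything else that might let the match run past the closing </p>.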
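
One way around the bug noted in the comment (plain text that contains tags such as <b>) might be to pull the img tag out with a narrower pattern and let strip_tags produce the text. This is only a sketch: the sample $line is made up, and the pattern assumes no attribute value contains a ">" character.

```php
<?php
// Sample input, invented for illustration.
$line = '<p><img alt="a" src="b.jpg" />In a <b>2 week</b> period.</p>';

// Grab the first img tag; [^>]* stops at the first ">" it meets.
$img = preg_match('/<img\b[^>]*>/i', $line, $m) ? $m[0] : null;

// strip_tags removes every tag, so nested markup no longer breaks the text.
$plainText = trim(strip_tags($line));
```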

Then get each attribute separately (since they are unordered and optional):

preg_match('/ src=(".*?"|\'.*?\'|.*?)[ >]/i', $img, $m);
$src = $m[1];
// Bug:  If the whitespace is a new-line, this won't work correctly.
// Bug:  It fails to remove the outer quotes, if any.

Do the same for each attribute you need.
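
The two bugs flagged in the comments above (newline whitespace and the leftover outer quotes) could be handled by a small helper along these lines. The name getAttr is hypothetical, and this remains a regex approach: it can still be fooled, for example by the text " src=" appearing inside another attribute's value.

```php
<?php
// Hypothetical helper, not part of the original answer.
function getAttr(string $tag, string $name): ?string
{
    // \s matches any whitespace, including newlines.
    // (?|...) is a PCRE branch-reset group: whichever alternative matches,
    // the value lands in group 1 with the quotes already excluded.
    $pattern = '/\s' . preg_quote($name, '/')
             . '\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';

    return preg_match($pattern, $tag, $m) ? $m[1] : null;
}

$tag = "<img\nsrc='a.jpg' alt=\"hello world\" width=40>";
$src   = getAttr($tag, 'src');
$alt   = getAttr($tag, 'alt');
$width = getAttr($tag, 'width');
$title = getAttr($tag, 'title'); // attribute not present
```

A missing attribute comes back as null, which keeps it distinguishable from an attribute that is present but empty.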

(See how much something like DOMDocument would help you here!)
