Regex and PHP - isolate the src attribute from an img tag [duplicate]

Question

Using PHP, how can I isolate the contents of the src attribute from $foo? The end result I'm looking for would give me just "http://example.com/img/image.jpg"

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

Answer 1

If you don't want to use regex (or any non-standard PHP components), a reasonable solution using the built-in DOMDocument class would be as follows:

<?php
    $doc = new DOMDocument();
    $doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
    $imageTags = $doc->getElementsByTagName('img');

    foreach($imageTags as $tag) {
        echo $tag->getAttribute('src');
    }
?>

Answer 2

Code

<?php
    $foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';
    $array = array();
    preg_match( '/src="([^"]*)"/i', $foo, $array ) ;
    print_r( $array[1] ) ;

Output

http://example.com/img/image.jpg

Answer 3

I got this code:

$dom = new DOMDocument();
$dom->loadHTML($img);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');

Assuming there is only one img :P
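
A slightly more defensive variant of the same idea, as a sketch only. The helper name firstImgSrc is hypothetical (it is not from the answer), and the sketch assumes the input is an HTML fragment that may contain no img tag at all. It returns null instead of failing when there is no img or no src.

```php
<?php
// Hypothetical helper: returns the src of the first <img> in $html, or null.
function firstImgSrc(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() does not accept an empty string
    }

    // Collect libxml parse warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }

    return $img->getAttribute('src');
}

$img = '<img class="foo bar test" src="http://example.com/img/image.jpg" alt="test image" />';
echo firstImgSrc($img), PHP_EOL;
```

Checking the result against null keeps the "no image" case separate from an image whose src is an empty string.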

Answer 4

// Create DOM from string
$html = str_get_html('<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />');

// echo the src attribute
echo $html->find('img', 0)->src;

http://simplehtmldom.sourceforge.net/

Answer 5

I'm very late to this, but I have a simple solution that hasn't been mentioned yet. Load it with simplexml_load_string (if you have simplexml enabled) and then flip it through json_encode and json_decode.

$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

$parsedFoo = json_decode(json_encode(simplexml_load_string($foo)), true);
var_dump($parsedFoo['@attributes']['src']); // output: "http://example.com/img/image.jpg"

$parsedFoo comes through as

array(1) {
  ["@attributes"]=>
  array(6) {
    ["class"]=>
    string(12) "foo bar test"
    ["title"]=>
    string(10) "test image"
    ["src"]=>
    string(32) "http://example.com/img/image.jpg"
    ["alt"]=>
    string(10) "test image"
    ["width"]=>
    string(3) "100"
    ["height"]=>
    string(3) "100"
  }
}

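I've been using this to parse XML and HTML for a few months now and it has worked pretty well. I haven't had any hiccups yet, although I haven't had to parse a large file with it (I imagine using json_encode and json_decode like that will get slower the larger the input gets). It's convoluted, but it's by far the easiest way I've found to read HTML attributes.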
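
As a sketch of a shorter route with the same extension: SimpleXML exposes attributes through array-style access, so the JSON round trip can be skipped when only one attribute is needed. Like the approach above, this assumes the input is well-formed XML (a self-closed tag with quoted attributes); the variable names are illustrative.

```php
<?php
$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';

// simplexml_load_string() returns false if the markup is not well-formed XML.
$element = simplexml_load_string($foo);

$src = null;
if ($element !== false && isset($element['src'])) {
    // Attribute access yields a SimpleXMLElement; cast it to get the string value.
    $src = (string) $element['src'];
}

echo $src, PHP_EOL;
```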

Answer 6

This is what I ended up doing, although I'm not sure how efficient it is:

$imgsplit = explode('"',$data);
foreach ($imgsplit as $item) {
    if (strpos($item, 'http') !== FALSE) {
        $image = $item;
        break;
    }
}

Answer 7

You can get around this problem using this function:

function getTextBetween($start, $end, $text)
{
 $start_from = strpos($text, $start);
 $start_pos = $start_from + strlen($start);
 $end_pos = strpos($text, $end, $start_pos + 1);
 $subtext = substr($text, $start_pos, $end_pos);
 return $subtext;
}

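$foo = '<img class="foo bar test" title="test image" src="http://example.com/img/image.jpg" alt="test image" width="100" height="100" />';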

$img_src = getTextBetween('src="', '"', $foo);

Answer 8

<?php
    $html = '
        <img border="0" src="/images/image1.jpg" alt="Image" width="100" height="100" />
        <img border="0" src="/images/image2.jpg" alt="Image" width="100" height="100" />
        <img border="0" src="/images/image3.jpg" alt="Image" width="100" height="100" />
        ';
    
    // Capture only the img src path; group 1 remembers which quote opened the value.
    $get_Img_Src = '/<img[^>]*src=([\'"])(?<src>.+?)\1[^>]*>/i';

    preg_match_all($get_Img_Src, $html, $result);
    if (!empty($result['src'])) {
        echo $result['src'][0], PHP_EOL;
        echo $result['src'][1], PHP_EOL;
    }

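To also get the img src path and the alt text, use the regex below instead of the one above:

<img[^>]*src=(['"])(?<src>.+?)\1[^>]*alt=(['"])(?<alt>.+?)\3[^>]*>

    // Capture the img src path and the alt text. Named groups are numbered too,
    // so the alt quote is group 3 and is closed with \3.
    $get_Img_Src = '/<img[^>]*src=([\'"])(?<src>.+?)\1[^>]*alt=([\'"])(?<alt>.+?)\3[^>]*>/i';

    preg_match_all($get_Img_Src, $html, $result);
    if (!empty($result['src'])) {
        echo $result['src'][0], PHP_EOL;
        echo $result['src'][1], PHP_EOL;
        echo $result['alt'][0], PHP_EOL;
        echo $result['alt'][1], PHP_EOL;
    }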
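
To keep each src next to its own alt text, one option is PREG_SET_ORDER, sketched below. The sample markup and the $images variable are made up for illustration. The pattern closes the alt value with \3 because named groups also take a number, and it assumes src appears before alt inside the tag.

```php
<?php
$html = '
    <img border="0" src="/images/image1.jpg" alt="First" width="100" height="100" />
    <img border="0" src=\'/images/image2.jpg\' alt=\'Second\' width="100" height="100" />
';

// Groups: 1 = src quote, 2 = src, 3 = alt quote, 4 = alt (named groups are numbered too).
$pattern = '/<img[^>]*src=([\'"])(?<src>.+?)\1[^>]*alt=([\'"])(?<alt>.+?)\3[^>]*>/i';

// PREG_SET_ORDER yields one array per matched tag instead of one array per group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$images = [];
foreach ($matches as $match) {
    $images[] = ['src' => $match['src'], 'alt' => $match['alt']];
}

print_r($images);
```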

I learned about this great solution from here: PHP extract link from href tag

To extract the URLs of a specific domain, try the regex below

// e.g. if you need to extract only the URLs of "test.com",
// you can use a regex like the one below

<a[^>]+href=([\'"])(?<href>(https?:\/\/)?test\.com.*?)\1[^>]*>

Additional info

To get img src attributes that contain base64-encoded data, you can do something like the following. You can test it on onlinephp.io

<?php
$html = '
    <p>test </p>
    <img border="0" src="/images/image1.jpg" alt="Image" width="100" height="100" />
    <img border="0" src="/images/image2.jpg" alt="Image" width="100" height="100" />
    <img border="0" src="/images/image3.jpg" alt="Image" width="100" height="100" />
    <img border="0" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJUAAAAfCAYAAADuiY/xAAAAGXRF..." alt="Base64 Image 1" width="100" height="100" />
    <img border="0" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAAAAAAAAAAAAAAEAAQAAAQAAAQEAAEAAAD/..." alt="Base64 Image 2" width="100" height="100" />
    <h1>asas</h1>
    <img border="0" src="/images/image2.jpg" alt="Image" width="100" height="100" />
    <img border="0" src="data:image/gif;base64,R0lGODlhPQBEAP8A..." alt="Base64 Image 3" width="100" height="100" />
    <img border="0" src="http://test.com/images/image2.jpg" alt="Image" width="100" height="100" />
';

$get_Img_Src = '/<img[^>]*src=["\'](data:image\/[^;]+;base64[^"\']+)["\'][^>]*>/i'; // Regex to capture base64 image src

preg_match_all($get_Img_Src, $html, $result);

// Debugging step: print the entire result array
echo "Full result:\n";
print_r($result);

if (!empty($result[1])) {
    echo "Base64 matches found: " . count($result[1]) . PHP_EOL;
    // Access the base64 data in the first capture group, i.e. $result[1]
    foreach ($result[1] as $base64) {
        echo $base64 . PHP_EOL;  // Echo each base64 encoded image string
    }
} else {
    echo "No base64 images found." . PHP_EOL;
}
?>

Answer 9

Try this pattern (the x modifier makes PCRE ignore the whitespace used for readability):

'/< \s* img [^\>]* src \s* = \s* [\""\']? ( [^\""\'\s>]* )/x'

Answer 10

I use preg_match_all to capture all images in an HTML document:

preg_match_all("~<img[^>]*src\s*=\s*[\"']([^\"']+)[\"'][^>]*>~i", $body, $matches);

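This one allows a more relaxed declaration syntax, with spaces and different quote types.

The regex reads like: <img (any attributes, such as style or border) src (possible space) = (possible space) (' or ") (any non-quote symbols) (' or ") (anything until >) (>)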
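
A small runnable sketch of that reading. The $body sample is hypothetical, and the pattern here is a variant rather than the exact one above: it uses [^>]* so the scan cannot run past the end of a tag, and requires whitespace before src so that attributes such as data-src are not picked up.

```php
<?php
// Hypothetical sample input: two images on one line, mixed quote styles and spacing.
$body = '<p><img style="border:0" src = \'/images/a.png\' alt="A"><img src="/images/b.png"></p>';

// [^>]* keeps the scan inside a single tag, so each <img> yields its own match.
preg_match_all('~<img[^>]*\ssrc\s*=\s*["\']([^"\']+)["\'][^>]*>~i', $body, $matches);

// $matches[1] holds the captured src values in document order.
print_r($matches[1]);
```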

Answer 11

Suppose I use

$text ='<img src="blabla.jpg" alt="blabla" />';

in

getTextBetween('src="','"',$text);

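the code will return:

blabla.jpg" alt="bla

This is wrong: substr() takes a length as its third argument, not an end position, so the result runs past the closing quote. We want the code to return the text between the attribute value quotes, i.e. attr="value".

So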

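function getTextBetween($start, $end, $text)
{
    // Drop everything up to and including the first occurrence of $start.
    // end() takes its argument by reference, so the array needs its own variable.
    $parts = explode($start, $text, 2);
    $first_strip = end($parts);

    // Keep everything before the first occurrence of $end.
    $final_strip = explode($end, $first_strip)[0];
    return $final_strip;
}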

works!

Trying

   getTextBetween('src="','"',$text);

will return:

blabla.jpg

Thanks anyway, because your solution gave me insight into the final solution.
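
For comparison, a strpos/substr version that reports missing delimiters, as a sketch only. The name textBetween and the choice to return null are assumptions made for this example, not part of either answer.

```php
<?php
// Hypothetical variant: returns the text between $start and the next $end, or null.
function textBetween(string $start, string $end, string $text): ?string
{
    $startAt = strpos($text, $start);
    if ($startAt === false) {
        return null; // opening delimiter not found
    }
    $from = $startAt + strlen($start);

    $endAt = strpos($text, $end, $from);
    if ($endAt === false) {
        return null; // closing delimiter not found
    }

    // substr() takes a length, so subtract the start offset from the end offset.
    return substr($text, $from, $endAt - $from);
}

$text = '<img src="blabla.jpg" alt="blabla" />';
echo textBetween('src="', '"', $text), PHP_EOL;
```

Returning null lets the caller tell "attribute not present" apart from an empty attribute value.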
