Find a <video> tag with a specific style declaration, then return the src value of its child <source> tag

Question

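I am currently using a PHP curl request to fetch the content of a URL. Once I have the content, I need to inspect the returned HTML, find the "video" element that has a given style attribute, and extract the text of its source's src value. I can already fetch the page, but how do I get this value? Here is my code for fetching the page: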

<?php
$Url = 'some site';

if (!function_exists('curl_init')){
    die('CURL is not installed!');
}
$ch = curl_init($Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // add this one, it seems to spawn redirect 301 header
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); // spoof
$output = curl_exec($ch);
curl_close($ch);

echo $output;

The code above works and outputs the page. Inspecting the elements in the page output, I found this:

<div class="webstarvideo">
  <video style="width:100%;height:100%" preload="none" class="">
    <source src="I NEED THIS" type="video/mp4"></video>
  <div class="webstarvideodoul">
    <canvas></canvas>
  </div>
</div>

I need the src of the video in the markup above. How can I do that?

Answer 1

At the PHP level:

You can use a regular expression with preg_match, or use PHP's DOMDocument class:

DOM

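$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about HTML5 tags such as <video>
$doc->loadHTML($output);
// getElementsByTagName() returns a DOMNodeList, so take the first node from it
$videoSource = $doc->getElementsByTagName('source')->item(0);

echo $videoSource->getAttribute('src');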
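
The DOM snippet above looks at <source> tags anywhere in the document, without checking the style of the parent <video>. Below is a minimal sketch of narrowing the search to a <video> whose style attribute contains the given declarations. It assumes markup shaped like the snippet in the question; the function name findVideoSrcByStyle and its parameters are illustrative and not part of any library.

```php
<?php
// Hypothetical helper: returns the src of the first <source> inside a <video>
// whose style attribute contains every given declaration, or null if none match.
function findVideoSrcByStyle(string $html, array $declarations): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate HTML5 tags such as <video>
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('video') as $video) {
        // Remove whitespace so "width: 100%" and "width:100%" compare equal.
        $style = preg_replace('/\s+/', '', $video->getAttribute('style'));
        foreach ($declarations as $declaration) {
            if (strpos($style, $declaration) === false) {
                continue 2; // this <video> lacks a required declaration
            }
        }
        $source = $video->getElementsByTagName('source')->item(0);
        if ($source !== null) {
            return $source->getAttribute('src');
        }
    }
    return null;
}

// Sample markup standing in for the fetched page ($output in the question).
$sample = '<div class="webstarvideo">
  <video style="width:100%;height:100%" preload="none" class="">
    <source src="I NEED THIS" type="video/mp4"></video>
</div>';

echo findVideoSrcByStyle($sample, ['width:100%', 'height:100%']); // I NEED THIS
```

Returning null when nothing matches lets the caller distinguish "no such video" from an empty src.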

Using a regular expression

$array = array();
preg_match("/source src=\"([^\"]*)\" type=\"video\/mp4\">/i", $output, $array);
echo $array[1];
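
preg_match returns 0 when nothing matches, in which case index 1 of the matches array is not set. A small sketch of guarding against that follows; the sample string and the variable names $matches and $videoSrc are assumptions for illustration, and the pattern is a slightly loosened variant of the one above.

```php
<?php
// Sample markup standing in for the fetched page ($output in the question).
$output = '<video style="width:100%;height:100%"><source src="I NEED THIS" type="video/mp4"></video>';

// preg_match returns 1 on a match, 0 on no match, and false on error.
if (preg_match('/<source\s+src="([^"]*)"\s+type="video\/mp4"/i', $output, $matches) === 1) {
    $videoSrc = $matches[1];
} else {
    $videoSrc = null; // no matching <source> tag found
}

var_dump($videoSrc); // string(11) "I NEED THIS"
```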

Answer 2

If you want the video's src as a PHP variable, you need to extract it from the string by checking where "type" is:

$output = '<div class="webstarvideo">
  <video style="width:100%;height:100%" preload="none" class="">
    <source src="I NEED THIS" type="video/mp4"></video>
  <div class="webstarvideodoul">
    <canvas></canvas>
  </div>
</div>';

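// Locate the first character after the opening quote of the src attribute.
$marker = 'src="';
$start = strpos($output, $marker) + strlen($marker);
$type_position = strpos($output, "type=");
$video_src = substr($output, $start, $type_position - $start - 2);
echo $video_src; // I NEED THIS

In the example above, $start is the number of characters up to and including the opening double quote of the src attribute, and the extra 2 subtracted from the length compensates for the closing double quote and the space before type. Computing $start with strpos avoids hard-coding an offset that changes with the whitespace in the markup.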

Hope this helps! :)

Answer 3

With PHP, you can use the Simple HTML DOM Parser to do this; its query syntax resembles jQuery's.

$Url = 'some site';

if (!function_exists('curl_init')){
    die('CURL is not installed!');
}
$ch = curl_init($Url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // add this one, it seems to spawn redirect 301 header
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); // spoof
$output = curl_exec($ch);
curl_close($ch);

$html = str_get_html($output);

// The src attribute lives on the child <source> tag, not on <video> itself.
$source = $html->find('video source', 0);
$videoSrc = $source->src;
var_dump($videoSrc);

Answer 4

Use XPath with a self-documenting query to isolate the src value of the tag you are looking for.

Code: (Demo)

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
echo $xpath->evaluate("string(//video[contains(@style, 'width:100%') and contains(@style, 'height:100%')]/source/@src)");
// I NEED THIS
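
Note that contains() is a substring test, so a check for 'height:100%' would also accept a style such as 'min-height:100%'. Below is a hedged sketch of an exact comparison that first parses the style attribute into property/value pairs. parseStyle and videoSrcWithStyle are hypothetical helper names, the sample markup is made up for illustration, and the parser is deliberately naive (it would split incorrectly on a value that itself contains a semicolon).

```php
<?php
// Hypothetical helper: turn "width:100%; height: 100%" into
// ['width' => '100%', 'height' => '100%'].
function parseStyle(string $style): array
{
    $declarations = [];
    foreach (explode(';', $style) as $chunk) {
        if (strpos($chunk, ':') === false) {
            continue; // skip empty or malformed pieces
        }
        [$property, $value] = explode(':', $chunk, 2);
        $declarations[strtolower(trim($property))] = trim($value);
    }
    return $declarations;
}

// Hypothetical helper: src of the first <source> under a <video> whose
// parsed style contains all the wanted property/value pairs exactly.
function videoSrcWithStyle(string $html, array $wanted): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate HTML5 tags such as <video>
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    foreach ($xpath->query('//video[@style]') as $video) {
        $style = parseStyle($video->getAttribute('style'));
        if (array_intersect_assoc($wanted, $style) == $wanted) {
            $src = $xpath->evaluate('string(source/@src)', $video);
            if ($src !== '') {
                return $src;
            }
        }
    }
    return null;
}

$sample = '<video style="min-height:100%"><source src="wrong.mp4" type="video/mp4"></video>
<video style="width:100%; height:100%"><source src="I NEED THIS" type="video/mp4"></video>';

echo videoSrcWithStyle($sample, ['width' => '100%', 'height' => '100%']); // I NEED THIS
```

The first <video> in the sample is skipped because min-height is a different property from height, which a plain substring check would not catch.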

Answer 5

Use document.querySelector() to target your element, then use getAttribute() on that element to get its src attribute.

var video = document.querySelector('.webstarvideo video source');
console.log(video.getAttribute('src'));

<div class="webstarvideo">
  <video style="width:100%;height:100%" preload="none" class="">
    <source src="I NEED THIS" type="video/mp4"></video>
  <div class="webstarvideodoul">
    <canvas></canvas>
  </div>
</div>

Answer 6

Assuming $output is the complete text, you can use a regular expression...

preg_match_all("/(?<=\<source).*?src=\"([^\"]+)\"/", $output, $all);

print_r($all[1]); // all the links will be in this array
