Get the HTML charset from a website - meta tags in a non-UTF-8 encoding

Question

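I am trying to retrieve the encoding declared in

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-7">

on an HTML website.

From the HTML above I want to extract the "iso-8859-7" part. Do you know how I can do that?

Note: the charset can be any value.

I need it because I sometimes need the site's encoding in order to retrieve the meta tags and decode them correctly.

Note: I have already retrieved the HTML content via PHP cURL or file_get_contents.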

Answer 1

Are you receiving it as a string? If so, you can use a regular expression to retrieve it.

$string = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-7">'; // your string

$matches = array();
preg_match('/charset=[^"]*/', $string, $matches); // retrieve 'charset=' and the value
$charset = preg_replace('/charset=/', '', $matches[0]); // remove the 'charset=' and keep the result

You will get the value as a string. If you are starting from an HTML file, the previous answer should help.
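Once the charset is available as a string, it can be used for what the question is after: decoding the fetched HTML correctly. The following is a minimal sketch, assuming the iconv extension is available; to_utf8 is a hypothetical helper name, not something from the original answer.

```php
<?php
// Hypothetical helper: converts $html from the detected $charset to UTF-8.
// Falls back to the unchanged input if iconv cannot perform the conversion
// (for example when the charset name is unknown).
function to_utf8(string $html, string $charset): string
{
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// Byte 0xC1 is the Greek capital letter Alpha in ISO-8859-7.
$greek = "\xC1";
echo bin2hex(to_utf8($greek, 'iso-8859-7')), "\n"; // ce91
```

Returning the input unchanged on failure is only one possible choice; throwing an exception may be preferable when silent mis-decoding would be worse than an error.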

Edit: if you want to learn more about regex, you can read:

http://www.tutorialspoint.com/php/php_regular_expression.htm

What I did: I simply match "charset=" followed by every character that is not a double quote, i.e. [^"]*.
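The pattern described above assumes a lowercase "charset=" inside a double-quoted content attribute. As a slightly more tolerant sketch, the variant below also accepts the HTML5 form <meta charset="...">, single quotes, and mixed case. extract_charset is a hypothetical helper name, and the set of characters allowed in the value is an assumption about typical charset labels.

```php
<?php
// Hypothetical helper (not from the original answer): returns the declared
// charset in lowercase, or null when no declaration is found.
function extract_charset(string $html): ?string
{
    // Match 'charset', optional spaces around '=', an optional opening quote,
    // then capture the charset label itself.
    if (preg_match('/charset\s*=\s*["\']?([a-z0-9._:-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

$old = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-7">';
$new = "<meta charset='UTF-8'>";
echo extract_charset($old), "\n"; // iso-8859-7
echo extract_charset($new), "\n"; // utf-8
```

Because this scans the whole string, it can also match unrelated text such as an accept-charset attribute; limiting the input to the head section would reduce that risk.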

Answer 2

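You can use jQuery.

If you only have one meta tag, you can do this:

var myValue = $('head meta').eq(0).attr("content"); // .eq(0) keeps a jQuery object; .get(0) would return a raw DOM element without .attr()

Or, if you have several:

$("head meta").each(function () {
  alert($(this).attr("content"));
});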

In PHP you can use

$tags = get_meta_tags('http://www.example.com/');

echo $tags['author'];       // name
echo $tags['keywords'];     // php documentation
echo $tags['description'];  // a php manual
echo $tags['geo_position']; // 49.33;-86.59

This is from the PHP documentation: http://php.net/manual/en/function.get-meta-tags.php
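As the example keys suggest, get_meta_tags indexes its result by each tag's name attribute, so a charset declared through http-equiv is not expected to appear there. A rough alternative, assuming the DOM extension is enabled, is to walk the meta elements with DOMDocument; charset_from_meta is a hypothetical helper name.

```php
<?php
// Hypothetical helper: looks for a charset declaration among <meta> elements.
// Expects a non-empty HTML string.
function charset_from_meta(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world markup is often malformed, so parser warnings are suppressed.
    @$doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // HTML5 form: <meta charset="...">
        if ($meta->hasAttribute('charset')) {
            return strtolower($meta->getAttribute('charset'));
        }
        // Older form: <meta http-equiv="Content-Type" content="...; charset=...">
        if (strcasecmp($meta->getAttribute('http-equiv'), 'Content-Type') === 0
            && preg_match('/charset\s*=\s*([^\s;"\']+)/i', $meta->getAttribute('content'), $m)) {
            return strtolower($m[1]);
        }
    }
    return null;
}

$html = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=iso-8859-7"></head><body></body></html>';
echo charset_from_meta($html), "\n"; // iso-8859-7
```

Compared with a plain regex over the whole page, this only inspects actual meta elements, at the cost of parsing the document.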
