mb_detect_encoding seems to detect UTF-8, but the decoded string still shows strange characters

Question

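I'm struggling to find the correct encoding for inserting a UTF-8 CSV dataset into a database.

In my database, all text fields are created with the utf8mb4_unicode_520_ci collation (that's how my WordPress is configured, so I can't really change it). So I assume it is some kind of UTF-8 encoding.

I'm using the function below for all fields. Without it, every insert contains strange characters. With it, all fields look fine.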

$row_data[$key] = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');

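...except for two fields. These two fields are in the same CSV but were collected from another source (another website), so I think the encoding may differ for some fields in the CSV.

Here is an example with sample data that refuses to go into the database cleanly.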

<?php

$text = "GergÅ‘ RÃ¡cz";

// Detect the encoding
$encoding = mb_detect_encoding($text);

echo "encoding detected: " . $encoding;
$utf8_text = mb_convert_encoding($text, 'ISO-8859-1','UTF-8');

echo "\ntext to UTF-8 : " . $utf8_text;

# php ./p.php
encoding detected: UTF-8
text to UTF-8 : Gerg▒? Rácz

It's as if the text is already UTF-8, but it actually isn't, and I can't identify what encoding it really is. Garbage in, garbage out.

Any ideas?

Thanks a lot!!
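
A note on why the detection result looks misleading: the garbled sample is itself a valid UTF-8 byte sequence, so a validity check cannot tell it apart from correct text. The sketch below is only an illustration. It writes the sample as explicit byte escapes (an assumption made here so the result does not depend on how the script file is saved) and checks validity and length.

```php
<?php

// The garbled sample from the question ("GergÅ‘ RÃ¡cz"), written as explicit
// UTF-8 bytes: Å = C3 85, ‘ = E2 80 98, Ã = C3 83, ¡ = C2 A1.
$text = "Gerg\xC3\x85\xE2\x80\x98 R\xC3\x83\xC2\xA1cz";

// mb_check_encoding() only verifies that the bytes are well-formed UTF-8.
// It says nothing about whether the characters are the intended ones.
$isValidUtf8 = mb_check_encoding($text, 'UTF-8');
var_dump($isValidUtf8); // bool(true)

// Same call as in the question; that run reported UTF-8.
echo "encoding detected: " . mb_detect_encoding($text) . "\n";

// 12 characters stored in 17 bytes: the "strange" characters are ordinary
// multi-byte UTF-8 characters, just not the ones the author meant.
$chars = mb_strlen($text, 'UTF-8');
$bytes = strlen($text);
echo $chars . " characters, " . $bytes . " bytes\n";
```

So "detected: UTF-8" is consistent with the text having been encoded to UTF-8 twice somewhere upstream; detection works on bytes, not on meaning.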

Answer 1

‘ (U+2018, LEFT SINGLE QUOTATION MARK) does not exist in ISO-8859-1. Use Windows-1252 instead, as follows:

<?php

$text = "GergÅ‘ RÃ¡cz";

// Detect the encoding
$encoding = mb_detect_encoding($text);

echo "\nencoding detected: " . $encoding;
$utf8_text = mb_convert_encoding($text, 'Windows-1252','UTF-8');

echo "\ntext to UTF-8 : " . $utf8_text;

?>

Output:

php .\SO\78678820.php

encoding detected: UTF-8
text to UTF-8 : Gergő Rácz
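
To see why Windows-1252 works where ISO-8859-1 does not, it may help to reproduce the damage and then undo it. The following is a minimal sketch, assuming the upstream source read UTF-8 bytes as Windows-1252 and re-encoded them as UTF-8 (the source does not confirm this, but it reproduces the sample exactly). The variable names are illustrative only.

```php
<?php

// The intended name "Gergő Rácz" as explicit UTF-8 bytes
// (ő = C5 91, á = C3 A1), so the file's own encoding does not matter.
$original = "Gerg\xC5\x91 R\xC3\xA1cz";

// Step 1: simulate the assumed upstream bug. Each byte is read as a
// Windows-1252 character and written back out as UTF-8:
// C5 -> Å, 91 -> ‘ (U+2018), C3 -> Ã, A1 -> ¡.
$mojibake = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Step 2: reverse it with Windows-1252, which can represent U+2018 (byte 0x91).
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');
var_dump($repaired === $original); // bool(true)

// Step 3: the same reversal with ISO-8859-1. U+2018 has no ISO-8859-1 byte,
// so mbstring substitutes it ('?' unless the substitute character was changed).
$lossy = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');

// The result is no longer well-formed UTF-8, which matches the broken
// "Gerg▒?" seen in the question's terminal output.
var_dump(mb_check_encoding($lossy, 'UTF-8')); // bool(false)
echo bin2hex($lossy) . "\n";
```

The "á" survives both conversions because ¡ (U+00A1) exists in both encodings; only characters whose second UTF-8 byte falls in the 0x80 to 0x9F range, such as "ő", depend on the Windows-1252 mapping.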
