Python - Can't concatenate multiple non-ASCII strings

Question

I'm trying to build a new string out of several strings that contain special characters. This doesn't work:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "español"
str3 = "%s %s %s" % (str1, u'–', str2)
print str3
>> Traceback (most recent call last):
  File "myscript.py", line 5, in <module>
    str3 = "%s %s %s" % (str1, u'–', str2)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Oddly, if I remove either the ñ or the – character, the string is created correctly:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "espaol"
str3 = "%s %s %s" % (str1, u'–', str2)
print str3
>> I am – espaol

Or:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "español"
str3 = "%s %s" % (str1, str2)
print str3
>> I am español

What's going on?

Answer 1

You are mixing Unicode strings and byte strings. Don't do that. Make sure all your strings are of the same type, preferably unicode.

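When you mix str and unicode, Python implicitly decodes or encodes one type or the other using the ASCII codec. Avoid these implicit operations by encoding or decoding explicitly, so that everything ends up as a single type.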
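To see that implicit step in isolation, here is a minimal sketch that performs explicitly the ASCII decode Python 2 attempts behind the scenes. It decodes the UTF-8 bytes of "español" with the ASCII codec, which fails on byte 0xc3, then with the UTF-8 codec, which succeeds, and finally shows that a pure-ASCII byte string such as "espaol" decodes without complaint. The variable names are only illustrative; the byte literals are written with escapes so the file needs no coding declaration, and the snippet runs unchanged on Python 2.7 and Python 3.

```python
# UTF-8 bytes for "espanol" with n-tilde: the n-tilde is the two bytes 0xc3 0xb1
utf8_bytes = b"espa\xc3\xb1ol"

# What Python 2 does implicitly when a str meets a unicode object: decode as ASCII
try:
    utf8_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # keep a reference; byte 0xc3 at index 4 is outside range(128)

# Decoding explicitly with the codec the data was actually written in works
text = utf8_bytes.decode("utf-8")

# Pure-ASCII bytes decode fine, which is why removing the n-tilde "fixed" it
plain = b"espaol".decode("ascii")

print(ascii_error)
```

The printed message is the same UnicodeDecodeError text as in the question's traceback, which shows that the failing value is str2, not str1.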

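That is what causes your UnicodeDecodeError exception: you are mixing two str objects (byte strings, str1 and str2) with a unicode object (u'–'), so Python tries to decode both byte strings as ASCII, but only str1 can be decoded that way. str2 contains UTF-8 data (ñ is stored as the two bytes 0xc3 0xb1), so decoding fails at byte 0xc3 in position 4. This also explains your two working variants: without the ñ, str2 is pure ASCII and decodes fine; without u'–', no unicode object is involved, so no decoding happens at all. Creating unicode strings explicitly, or decoding the data explicitly, makes things work: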

# -*- coding: utf-8 -*-
str1 = u"I am"     # Unicode strings
str2 = u"español"  # Unicode strings
str3 = u"%s %s %s" % (str1, u'–', str2)
print str3

Or:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "español"
str3 = u"%s %s %s" % (str1.decode('utf-8'), u'–', str2.decode('utf-8'))
print str3

Note that I also use a Unicode string literal for the format string!
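When the byte strings come from somewhere you don't control (a file, a socket, a command-line argument), a common way to apply the advice above is to decode them once as they enter the program, work only with unicode inside it, and encode once on the way out; Ned Batchelder's talk, recommended below, calls this the "Unicode sandwich". The sketch below assumes the incoming bytes are UTF-8; `to_text` and `build_sentence` are hypothetical helper names, not part of any library. It uses escaped literals so it needs no coding declaration and runs on Python 2.7 and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings to unicode, pass text through.

    Assumes byte input is in `encoding` (UTF-8 by default). On Python 2,
    `bytes` is an alias for `str`, so plain str values are decoded too.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_sentence(*parts):
    """Hypothetical helper: decode every part first, then join with spaces."""
    texts = [to_text(part) for part in parts]  # decode early
    return u" ".join(texts)                    # unicode-only work in the middle


str1 = b"I am"
str2 = b"espa\xc3\xb1ol"  # UTF-8 bytes, as in the question
sentence = build_sentence(str1, u"\u2013", str2)  # u"\u2013" is the en dash

# Encode once, only at the output boundary (file, socket, stdout, ...)
output_bytes = sentence.encode("utf-8")
```

Because every input passes through `to_text` before any formatting happens, the ASCII codec is never involved, whichever mix of str and unicode the caller passes in.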

You really should read up on Unicode, codecs and Python. I highly recommend the following articles:

Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
