Get the last character of a string that may or may not be Unicode

Question

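I am parsing a file that contains both plain alphabetic (ASCII) strings and Unicode/UTF-8 strings with IPA pronunciations.

I want to be able to get the last character of a string, but sometimes that character occupies two positions in the string, for example: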

    syl = 'tyl'  # plain ascii
    last_char = syl[-1]
    # last char is 'l'

    syl = 'tl̩'  # contains IPA char
    last_char = syl[-1]
    # last char erroneously contains: '̩' which is a diacritical mark on the l
    # want the whole character 'l̩'

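If I try .decode(), it fails with:

    'str' object has no attribute 'decode'

If I try .encode().decode(), I am back where I started: I get only the diacritical mark, not the whole character.

How do I get the last character of a Unicode/UTF-8 string when I don't know whether it is an ASCII or a Unicode string?

I suppose I could use a lookup table of known characters and, if the lookup fails, go back and take syl[-2:]. Is there a simpler way?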

Answer 1

What you are probably looking for (without knowing it) is the last grapheme of the string.
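To see why `syl[-1]` returns only the mark, it helps to list the code points. A Python 3 `str` is already a sequence of Unicode code points, which is also why it has no `.decode()`. The sketch below uses only the standard `unicodedata` module and assumes the mark in the question is U+0329; it is written as an escape so the code points are explicit.

```python
import unicodedata

# Assumption: the syllabic mark from the question is U+0329.
syl = "tl\u0329"

# One entry per code point, not per visible character.
names = [unicodedata.name(ch) for ch in syl]
for ch, name in zip(syl, names):
    print(f"U+{ord(ch):04X} {name}")
```

The string has three code points even though it displays as two characters, so indexing by position can split a letter from its diacritic.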

You can get it with, for example, the grapheme package:

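    import grapheme  # third-party package, not in the standard library

    syl = 'tl̩'
    # graphemes() yields one grapheme cluster at a time; keep only the last
    *_, last = grapheme.graphemes(syl)
    assert last == 'l̩'
    assert len(last) == 2  # two code points: 'l' plus the combining mark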
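If adding a dependency is not an option, a rough standard-library approximation is to walk backwards over trailing combining marks. This is only a sketch, not full grapheme segmentation: the helper name `last_char_with_marks` is hypothetical, and it treats any code point whose Unicode general category starts with "M" as part of the preceding character, so it will not handle things like emoji sequences joined with zero-width joiners.

```python
import unicodedata

def last_char_with_marks(s: str) -> str:
    """Return the last base character plus any combining marks after it.

    Hypothetical helper; an approximation of 'last grapheme', not a
    full implementation of Unicode grapheme cluster rules.
    """
    i = len(s) - 1
    # Step back while the current code point is a mark (category Mn, Mc, or Me).
    while i > 0 and unicodedata.category(s[i]).startswith("M"):
        i -= 1
    return s[i:]

plain = last_char_with_marks("tyl")
ipa = last_char_with_marks("tl\u0329")  # assumes the mark is U+0329
```

For the two strings in the question, this returns `'l'` for the ASCII case and the `l` together with its mark for the IPA case, without needing to know in advance which kind of string it is.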
