我见过很多关于 SED 中转义和替换特殊字符的主题,但没有一个对我有帮助。
我需要在文件上使用这个 sed 命令:
sed -i "s/This[^\|]\+/& (cool) /g" "file.txt"
出于我不明白的原因,它适用于这个测试用例:
This is my funny 🎺 char and this | char is the char after which i want to stop my job.
...并将其转换为:
This is my funny 🎠(cool) ڠchar and this | char is the char after which i want to stop my job.
...而不是:
This is my funny 🎺 char and this (cool) | char is the char after which i want to stop my job.
有人可以告诉我如何处理这种情况吗?
注意:该文件是 UTF-8 编码的,我使用 UTF-8 编码的 Cygwin,我的 SED 命令也位于 UTF-8 编码的“.sh”文件中。
该错误似乎是由于我在 CYGWIN 上使用 SED 造成的,因为它在 GNU Linux 上运行良好。
感谢您的关注。 我希望这个帖子可以帮助另一个 Cygwin 用户。
我可以使用
sed
(v4.8) 在 Cygwin (v3.3.4) 上确认此问题。虽然我不太确定这里发生了什么,但我发现通过设置 UTF-16 兼容的 system locale 可以让它工作:
$ python3 -c "print('This is my funny {} char and this | char is the char after which i want to stop my job.'.format(chr(0x1f3ba)))" | \
sed 's/This[^|]*/&(cool) /g'
This is my funny (cool) char and this | char is the char after which i want to stop my job.
$ python3 -c "print('This is my funny {} char and this | char is the char after which i want to stop my job.'.format(chr(0x1f3ba)))" | \
LANG=C.UTF16 sed 's/This[^|]*/&(cool) /g'
This is my funny 🎺 char and this (cool) | char is the char after which i want to stop my job.
顺便说一句,在使用
sed
时始终首选单引号,以防 $
和 \
被扩展。
我陷入了类似的情况,但不涉及文件,所以它不是文件编码。
< 7:36a> <10%> C:\>echo gOlIaTh |:u8 sed -e 's/goliath/🦇GOLIATH🦇/gi'
/cygdrive/c/cygwin/bin/sed: -e expression #1, char 1: unknown command: `''
< 7:37a> <15%> C:\>echo gOlIaTh |:u8 sed -e 's/goliath/GOLIATH/gi'
GOLIATH