Why does `Array.from` work differently on a string than `.split("")`?


This is my input string:

str = `👍🤌🤔😊😎😂😘👌😒😍❤️🤣`;
// str.length = 24
// str.split("") = [ "\ud83d", "\udc4d", /* 22 more elements */ ],
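Where `"\ud83d"` and `"\udc4d"` come from: 👍 is U+1F44D, which lies above U+FFFF, so UTF-16 stores it as two code units (a surrogate pair). Eleven of the emoji in the string are such pairs, and ❤️ is two separate BMP code points (U+2764 and U+FE0F), so the string has 24 code units in total. A minimal sketch of the standard encoding arithmetic; `toSurrogatePair` is a hypothetical helper, not a built-in:

```javascript
// Hypothetical helper: encode one code point above U+FFFF as a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // a 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits -> high (leading) surrogate
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits -> low (trailing) surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f44d); // 👍
console.log(high.toString(16), low.toString(16)); // d83d dc4d

// Joining the two code units gives back the same emoji, whose length is 2.
const rebuilt = String.fromCharCode(high, low);
console.log(rebuilt === "👍", rebuilt.length); // true 2
```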

So when I call `Array.from(str)`, I expected something like this to happen internally:

arr = Array.from({
  length: 24,
  0: "\ud83d", 1: "\udc4d" /* ... and so on */
})

and `arr` should then be the same as `str.split("")`:

["\ud83d", "\udc4d", /* 22 more elements */ ]

But the value of `arr` is actually this:

// arr.length = 13
[
  "👍",  "🤌",  "🤔", "😊",  "😎",  "😂",
  "😘",  "👌",  "😒",  "😍",  "❤",  "️",
  "🤣"
]

For reference, this is the same result we get by calling `str.match(/[\s\S]/gu)`. Why?

const str = `👍🤌🤔😊😎😂😘👌😒😍❤️🤣`
const arr = Array.from(str)
console.log(arr)
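The "for reference" comparison can be checked directly. A small sketch, assuming the question's string (the heart is written as escapes so the invisible U+FE0F is explicit): without the `u` flag, `[\s\S]` matches one UTF-16 code unit at a time like `split("")`; with it, one code point at a time like `Array.from`. `sameAs` is a hypothetical helper for comparing arrays.

```javascript
// The question's string; ❤️ is U+2764 HEAVY BLACK HEART + U+FE0F VARIATION SELECTOR-16.
const str = "👍🤌🤔😊😎😂😘👌😒😍\u2764\uFE0F🤣";

// No u flag: each match is a single UTF-16 code unit (surrogates come out alone).
const byCodeUnit = str.match(/[\s\S]/g);

// u flag: each match is a whole code point (surrogate pairs stay together).
const byCodePoint = str.match(/[\s\S]/gu);

// Hypothetical helper: element-wise array equality.
const sameAs = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

console.log(byCodeUnit.length, sameAs(byCodeUnit, str.split("")));     // 24 true
console.log(byCodePoint.length, sameAs(byCodePoint, Array.from(str))); // 13 true
```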

Tags: javascript arrays string unicode

2 answers

Answer 1 (1 vote)

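I found the answer. `Array.from` first checks whether the object it is given has an iterator (a `Symbol.iterator` method). If it does, the iterator takes precedence over `length` and the numerically named properties, which are then ignored. Strings have an iterator, and it is Unicode-aware: it yields one code point at a time instead of one UTF-16 code unit, so `Array.from` keeps each surrogate pair together. Example: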

const a1 = {
  length: 3,
  0: "a",
  1: "b",
  2: "c",
  3: "d",
  4: "e",
  5: "f"
};
const a2 = {
  length: -5,
  0: "a",
  1: "b",
  2: "c",
  *[Symbol.iterator]() {
    yield "p";
    yield "q";
    yield "r";
    yield "s";
    yield "t";
  }
};
const b = Array.from(a1);
console.log(b); // ["a", "b", "c"]; the number of elements read equals the length property
const c = Array.from(a2);
console.log(c); // ["p", "q", "r", "s", "t"]; length and numerically named properties are ignored
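To make the precedence rule concrete, here is a heavily simplified, hypothetical model of how `Array.from` chooses between the two paths. `arrayFromSketch` is an illustrative name; the real method also handles `mapFn`, `thisArg` and subclass constructors and follows the spec's exact steps, none of which is modeled here.

```javascript
// Hypothetical, simplified model of Array.from's dispatch (not the real implementation).
function arrayFromSketch(items) {
  // Path 1: an iterator exists, so use it and never look at `length`.
  // Property access on a primitive string finds String.prototype[Symbol.iterator].
  const iteratorMethod = items[Symbol.iterator];
  if (iteratorMethod !== undefined && iteratorMethod !== null) {
    const values = [];
    const iterator = iteratorMethod.call(items);
    for (let step = iterator.next(); !step.done; step = iterator.next()) {
      values.push(step.value);
    }
    return values;
  }

  // Path 2: no iterator, so treat the value as array-like.
  // `length` is clamped to a non-negative integer (-5 or a missing length gives 0).
  const arrayLike = Object(items);
  let length = Math.trunc(Number(arrayLike.length));
  if (!(length > 0)) length = 0;

  const result = [];
  for (let i = 0; i < length; i++) {
    result.push(arrayLike[i]);
  }
  return result;
}

const str = "👍🤌🤔😊😎😂😘👌😒😍\u2764\uFE0F🤣"; // the question's string; ❤️ is U+2764 + U+FE0F

// A string has an iterator, so path 1 is taken: 13 code points.
console.log(arrayFromSketch(str).length); // 13

// A plain object with the same length and indices has no iterator, so path 2
// is taken and the result matches str.split(""): 24 code units.
const codeUnitView = Object.assign({ length: str.length }, str.split(""));
console.log(arrayFromSketch(codeUnitView).length); // 24
```

The last two calls show that the behavior the question expected does exist; it only applies to array-likes that lack an iterator.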


Answer 2 (0 votes)

Source: "UTF-16 characters, Unicode code points, and grapheme clusters"

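`split("")` splits by UTF-16 code units and separates surrogate pairs. String indices also refer to individual UTF-16 code units. `Symbol.iterator`, on the other hand, iterates by Unicode code points. Iterating by grapheme clusters requires some custom code.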
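A short sketch of what "indices refer to code units" means in practice, using 👍 (U+1F44D) from the question. `thumbs` and `viaIterator` are illustrative names; `charCodeAt`, `codePointAt` and `for...of` are standard.

```javascript
const thumbs = "👍"; // U+1F44D, stored as two UTF-16 code units

console.log(thumbs.length);                      // 2
console.log(thumbs[0] === "\ud83d");             // true: indexing returns a lone high surrogate
console.log(thumbs.charCodeAt(0).toString(16));  // d83d
console.log(thumbs.codePointAt(0).toString(16)); // 1f44d: reads the whole pair starting at index 0
console.log(thumbs.codePointAt(1).toString(16)); // dc4d: index 1 is the second half of the pair

// for...of uses String.prototype[Symbol.iterator], so it yields whole code points.
const viaIterator = [];
for (const ch of thumbs) viaIterator.push(ch);
console.log(viaIterator.length); // 1
```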

"😄".split(""); // ['\ud83d', '\ude04']; splits into two lone surrogates

// "Backhand Index Pointing Right: Dark Skin Tone"
[..."👉🏿"]; // ['👉', '🏿']
// splits into the basic "Backhand Index Pointing Right" emoji and
// the "Dark skin tone" emoji

// "Family: Man, Boy"
[..."👨‍👦"]; // [ '👨', '‍', '👦' ]
// splits into the "Man" and "Boy" emoji, joined by a ZWJ

// The United Nations flag
[..."🇺🇳"]; // [ '🇺', '🇳' ]
// splits into two "regional indicator" letters "U" and "N".
// Country flag emojis are formed by joining two regional indicator letters
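The "custom code" for grapheme clusters can often be avoided: engines that implement `Intl.Segmenter` (part of the ECMA-402 Internationalization API; check support in your target runtime) can split text into extended grapheme clusters. This is a different technique from the one the quoted source alludes to, sketched here under that assumption; `graphemes` is a hypothetical helper name, and the strings use escapes so the invisible joiners and selectors are explicit.

```javascript
// Assumes Intl.Segmenter is available in the runtime.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: split text into user-perceived characters.
function graphemes(text) {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const str = "👍🤌🤔😊😎😂😘👌😒😍\u2764\uFE0F🤣"; // the question's string; ❤️ = U+2764 + U+FE0F
const family = "\u{1F468}\u200D\u{1F466}";       // 👨‍👦: Man + ZWJ + Boy
const unFlag = "\u{1F1FA}\u{1F1F3}";             // 🇺🇳: regional indicators U + N
const darkPoint = "\u{1F449}\u{1F3FF}";          // 👉🏿: emoji + dark skin tone modifier

// Code units, code points, grapheme clusters for each string.
for (const s of [str, family, unFlag, darkPoint]) {
  console.log(s.length, Array.from(s).length, graphemes(s).length);
}
// 24 13 12
// 5 3 1
// 4 2 1
// 4 2 1
```

Code point iteration still yields the heart as two elements (U+2764 and the invisible U+FE0F), which is why the question's `arr` has 13 entries for 12 visible emoji; grapheme segmentation returns 12.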