Google Sheets: Extract unique words from a range and count their frequency


I have a column in which every row is a sentence. For example:

COLUMN1
R1: -Do you think they'll come, sir?
R2: -Oh they'll come, they'll come all right.
R3: Here. Stamp those and mail them.
R4: It's ringing.
R5: Would you walk Myron the other way?

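From this range, I want to extract a list of unique words (COLUMN2) and count how often each one appears in the range (COLUMN3).

The tricky part is removing punctuation such as commas and periods.

So the ideal result for the example above would be: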

COLUMN2    COLUMN3
Do          1
you         2
think       1
they'll     3
come        3
sir         1
Oh          1
all         1
right       1
Here        1
Stamp       1
those       1
and         1
mail        1
them        1
It's        1
ringing     1
Would       1
walk        1
Myron       1
the         1
other       1
way         1

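I tried using the SPLIT function to parse each row so that every word goes into its own cell, but I'm still stuck on removing the punctuation and building the list of unique words (I know this will involve the UNIQUE function). My guess is that the counting will also involve the COUNTUNIQUE function.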
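As a sanity check on the target table, the pipeline described above (strip punctuation, split into words, count each unique word) can be sketched in Python outside of Sheets. This is only an illustration, not a Sheets formula: the helper name `word_frequencies` is hypothetical, and keeping apostrophes while dropping every other non-letter is an assumption read off the expected output ("they'll", "It's").

```python
import re
from collections import Counter

# COLUMN1 from the question, one sentence per row.
ROWS = [
    "-Do you think they'll come, sir?",
    "-Oh they'll come, they'll come all right.",
    "Here. Stamp those and mail them.",
    "It's ringing.",
    "Would you walk Myron the other way?",
]


def word_frequencies(rows):
    """Return (word, count) pairs in order of first appearance (hypothetical helper)."""
    counts = Counter()
    for row in rows:
        # Keep letters, whitespace and apostrophes; drop everything else,
        # so "-Do" becomes "Do" and "come," becomes "come".
        cleaned = re.sub(r"[^A-Za-z\s']", "", row)
        # str.split() with no argument splits on runs of whitespace
        # and never returns empty strings.
        counts.update(cleaned.split())
    # Counter (a dict subclass) keeps insertion order, i.e. first appearance.
    return list(counts.items())


if __name__ == "__main__":
    for word, n in word_frequencies(ROWS):
        print(f"{word:<12}{n}")
```

Run as a script, it prints 23 rows; `come` and `they'll` each come out at 3 and `you` at 2.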

Any guidance would be much appreciated!

google-sheets google-sheets-formula array-formulas counting google-sheets-query
2 Answers

0 votes

You could try something like:

=query(ArrayFormula(transpose(split(query(regexreplace(A1:A5, "[^A-Za-z\s/']" ,""),,50000)," "))), "Select Col1, Count(Col1) where Col1 <>'' group by Col1 label Count(Col1)''")

Change the range to fit your data.
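The formula above works as a pipeline: REGEXREPLACE strips every character outside `[A-Za-z\s/']` from each row, the inner `query(...,,50000)` (a header-row trick) collapses the rows into one space-separated string, SPLIT and TRANSPOSE turn that string into a single column of words, and the outer QUERY groups and counts them. A rough Python equivalent is sketched below to make each step visible. It is not part of the answer: the helper name `answer1_pipeline` is hypothetical, and modelling the header trick as a plain space join is an assumption.

```python
import re
from collections import Counter

# COLUMN1 from the question (A1:A5 in the formula).
ROWS = [
    "-Do you think they'll come, sir?",
    "-Oh they'll come, they'll come all right.",
    "Here. Stamp those and mail them.",
    "It's ringing.",
    "Would you walk Myron the other way?",
]


def answer1_pipeline(rows):
    """Approximate the first answer's formula step by step (hypothetical helper)."""
    # REGEXREPLACE(A1:A5, "[^A-Za-z\s/']", ""): keep letters, whitespace, "/" and "'".
    cleaned = [re.sub(r"[^A-Za-z\s/']", "", row) for row in rows]
    # query(..., , 50000): modelled here as joining all rows with a single space.
    joined = " ".join(cleaned)
    # SPLIT(..., " ") plus "where Col1 <> ''": split on spaces, drop empty tokens.
    words = [w for w in joined.split(" ") if w]
    # "select Col1, Count(Col1) ... group by Col1": one row per distinct word.
    # sorted() orders by code point, which may not match QUERY's ordering exactly.
    return sorted(Counter(words).items())
```

Because the character class does not include `-`, the leading hyphens in R1 and R2 are removed, so `Do` and `Oh` are counted without them.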



0 votes

Try:

=ARRAYFORMULA(QUERY(TRANSPOSE(SPLIT(REGEXREPLACE(
 TEXTJOIN(" ", 1, LOWER(A:A)), "\.|\,|\?", ), " ")), 
 "select Col1,count(Col1) 
  group by Col1 
  order by count(Col1) desc 
  label count(Col1)''", 0))
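This formula takes a different route: it lowercases everything, joins the whole column with TEXTJOIN, and its REGEXREPLACE pattern removes only periods, commas, and question marks. The hypothetical Python sketch below mirrors those steps so the effect is easy to check. With the sample rows, the leading hyphens in R1 and R2 are not removed, so those words come out as `-do` and `-oh`. Treating QUERY's `order by count(Col1) desc` as a stable sort is an assumption; the order of tied words may differ in Sheets.

```python
import re
from collections import Counter

# COLUMN1 from the question (column A in the formula).
ROWS = [
    "-Do you think they'll come, sir?",
    "-Oh they'll come, they'll come all right.",
    "Here. Stamp those and mail them.",
    "It's ringing.",
    "Would you walk Myron the other way?",
]


def answer2_pipeline(rows):
    """Approximate the second answer's formula step by step (hypothetical helper)."""
    # TEXTJOIN(" ", 1, LOWER(A:A)): lowercase, skip empty cells, join with spaces.
    joined = " ".join(row.lower() for row in rows if row)
    # REGEXREPLACE(..., "\.|\,|\?", ): delete periods, commas and question marks only.
    cleaned = re.sub(r"\.|,|\?", "", joined)
    # SPLIT(..., " ") drops empty strings by default; do the same here.
    words = [w for w in cleaned.split(" ") if w]
    # "group by Col1 order by count(Col1) desc": sorted() is stable even with
    # reverse=True, so tied words keep their first-appearance order here.
    return sorted(Counter(words).items(), key=lambda kv: kv[1], reverse=True)
```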

