Identify a specific string in a column and count the number of unique letters by group in R


I have a data frame df1:

df1 <- setNames(data.frame(matrix(ncol = 3, nrow = 37)),
                c("material", "condition", "pID"))
df1$material <- c("FBZOIKS", "FBZOIKS", "FBZOIKS", "FBZOIKS",
                  "VNTYALQ", "VNTYALQ", "VNTYALQ",
                  "HMRCJXU", "HMRCJXU", "HMRCJXU", "HMRCJXU", "HMRCJXU",
                  "CURHJXM", "UXJMRCH", "UXJMRCH",
                  "XMRCUJH", "XMRCUJH", "XMRCUJH",
                  "FBZOIKS", "FBZOIKS", "FBZOIKS", "FBZOIKS",
                  "VNTYALQ", "VNTYALQ", "VNTYALQ", "VNTYALQ",
                  "HMRCJXU", "HMRCJXU", "HMRCJXU", "HMRCJXU",
                  "CURHJXM", "CURHJXM", "UXJMRCH", "UXJMRCH",
                  "XMRCUJH", "XMRCUJH", "XMRCUJH")
# "false" marks the first row of a block (rows 1, 13, 19 and 31); all other rows are empty
df1$condition <- c("false", rep("", 11), "false", rep("", 5),
                   "false", rep("", 11), "false", rep("", 6))
df1$pID <- rep(c("p1", "p2"), times = c(18, 19))

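I need to create two columns, computed within each pID: block and Nletters_block.

For block, I need to find the first "false" in the condition column and assign the value 1 from that row until the next "false" for the same pID. From the next "false" onward I need to assign 2, from the one after that 3, and so on.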
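To illustrate the numbering rule, here is a minimal base R sketch on a toy vector. The vector cond is made up for illustration and is not part of df1. A running sum over the comparison with "false" goes up by one at every "false" and stays constant in between.

```r
# Toy condition vector for a single participant (hypothetical data)
cond <- c("false", "", "", "false", "", "false")

# TRUE counts as 1 and FALSE as 0, so the cumulative sum
# steps up at every "false" and carries the value forward
block <- cumsum(cond == "false")
print(block)  # 1 1 1 2 2 3
```

In the real data the same running sum would have to restart for each pID, which is why it needs to be computed per group.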

For Nletters_block, I need to count the number of unique letters that appear in the material column within each participant and block.
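As a small illustration of what "unique letters" means here, the sketch below counts the distinct letters across a few strings in base R. The strings and variable names are assumptions chosen for the example, not values from df1.

```r
# Hypothetical strings belonging to one block
strings <- c("ABC", "BCD", "ABC")

# Concatenate everything, split into single characters, keep the distinct ones
letters_in_block <- unique(unlist(strsplit(paste(strings, collapse = ""), "")))
n_letters <- length(letters_in_block)
print(n_letters)  # 4 (A, B, C, D)
```

Repeated strings do not raise the count, which matches the expected value of 7 for a block that contains only rearrangements of the same seven letters.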

I would prefer a solution that uses the dplyr library.

Here is what I would like to obtain:

material  condition pID  block  Nletters_block
FBZOIKS   false     p1   1      21
FBZOIKS             p1   1      21
FBZOIKS             p1   1      21
FBZOIKS             p1   1      21
VNTYALQ             p1   1      21
VNTYALQ             p1   1      21
VNTYALQ             p1   1      21
HMRCJXU             p1   1      21
HMRCJXU             p1   1      21
HMRCJXU             p1   1      21
HMRCJXU             p1   1      21
HMRCJXU             p1   1      21
CURHJXM   false     p1   2      7
UXJMRCH             p1   2      7
UXJMRCH             p1   2      7
XMRCUJH             p1   2      7
XMRCUJH             p1   2      7
XMRCUJH             p1   2      7
FBZOIKS   false     p2   1      21
FBZOIKS             p2   1      21
FBZOIKS             p2   1      21
FBZOIKS             p2   1      21
VNTYALQ             p2   1      21
VNTYALQ             p2   1      21
VNTYALQ             p2   1      21
VNTYALQ             p2   1      21
HMRCJXU             p2   1      21
HMRCJXU             p2   1      21
HMRCJXU             p2   1      21
HMRCJXU             p2   1      21
CURHJXM   false     p2   2      7
CURHJXM             p2   2      7
UXJMRCH             p2   2      7
UXJMRCH             p2   2      7
XMRCUJH             p2   2      7
XMRCUJH             p2   2      7
XMRCUJH             p2   2      7
r dplyr indexing replace unique
1 Answer

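Group by pID first, then by pID and block. Count the unique letters by pasting all material strings of a group together and then splitting the result into single characters.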

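library(dplyr)  # the .by argument requires dplyr >= 1.1.0

df1 %>% 
  # running count of "false" rows within each participant gives the block number
  mutate(block = cumsum(condition == "false"), .by = pID) %>% 
  # paste a block's materials together, split into letters, count the distinct ones
  mutate(Nletters_block = length(unique(unlist(strsplit(
                            paste(material, collapse = ""), "")))), .by = c(pID, block))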
   material condition pID block Nletters_block
1   FBZOIKS     false  p1     1             21
2   FBZOIKS            p1     1             21
3   FBZOIKS            p1     1             21
4   FBZOIKS            p1     1             21
5   VNTYALQ            p1     1             21
6   VNTYALQ            p1     1             21
7   VNTYALQ            p1     1             21
8   HMRCJXU            p1     1             21
9   HMRCJXU            p1     1             21
10  HMRCJXU            p1     1             21
11  HMRCJXU            p1     1             21
12  HMRCJXU            p1     1             21
13  CURHJXM     false  p1     2              7
14  UXJMRCH            p1     2              7
15  UXJMRCH            p1     2              7
16  XMRCUJH            p1     2              7
17  XMRCUJH            p1     2              7
18  XMRCUJH            p1     2              7
19  FBZOIKS     false  p2     1             21
20  FBZOIKS            p2     1             21
21  FBZOIKS            p2     1             21
22  FBZOIKS            p2     1             21
23  VNTYALQ            p2     1             21
24  VNTYALQ            p2     1             21
25  VNTYALQ            p2     1             21
26  VNTYALQ            p2     1             21
27  HMRCJXU            p2     1             21
28  HMRCJXU            p2     1             21
29  HMRCJXU            p2     1             21
30  HMRCJXU            p2     1             21
31  CURHJXM     false  p2     2              7
32  CURHJXM            p2     2              7
33  UXJMRCH            p2     2              7
34  UXJMRCH            p2     2              7
35  XMRCUJH            p2     2              7
36  XMRCUJH            p2     2              7
37  XMRCUJH            p2     2              7
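
For dplyr versions older than 1.1.0, where .by is not available, the same two steps could probably be written with group_by() instead. The following is a sketch under that assumption, run on a small made-up data frame (toy is hypothetical and only mirrors the column names of df1); it also swaps length(unique(...)) for dplyr's n_distinct().

```r
library(dplyr)

# Small stand-in for df1 (hypothetical rows, same column names)
toy <- data.frame(
  material  = c("ABC", "ABD", "XYZ", "ABC", "CBA"),
  condition = c("false", "", "false", "false", ""),
  pID       = c("p1", "p1", "p1", "p2", "p2")
)

result <- toy %>%
  group_by(pID) %>%
  # block number: running count of "false" within each participant
  mutate(block = cumsum(condition == "false")) %>%
  group_by(pID, block) %>%
  # split every string into letters and count the distinct ones per block
  mutate(Nletters_block = n_distinct(unlist(strsplit(material, "")))) %>%
  ungroup()

print(result)
```

The second group_by() replaces the first grouping, and the final ungroup() returns an ungrouped result so that later operations are not silently applied per group.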