How to separate the leading words before the first number in a list in R

Question

My list is:

   A      B
1 Alex    but            
2 likes   lala 54 hi     
3 a lot   number and 33 hello
4 of      face soap 34 hello  
5 food    35 hello

I want to extract the string that comes before the first number in column B and put it into a new column, C. My desired output is:

   A        B                        C 
1 Alex      but                   
2 likes     lala 54 hi               lala
3 a lot     number and 33 hello      number and
4 of        face soap 34 hello       face soap
5 food      35 hello

Answer 1

Using a positive lookahead, we can look for text that is followed by a space and a number, then use stringr::str_extract to return that text:

library(stringr)
library(dplyr)
df %>% mutate(C = str_extract(B, '\\D+(?= \\d+)'))


     A                   B          C
1  Alex                 but       <NA>
2 likes          lala 54 hi       lala
3 a lot number and 33 hello number and
4    of  face soap 34 hello  face soap
5  food            35 hello       <NA>

For more details on stringr and positive lookahead, you can look here
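To make the effect of the lookahead concrete, here is a small base R sketch (not part of the original answer) that runs the same idea on a single value with and without the lookahead. The variable names are only illustrative.

```r
x <- "lala 54 hi"

# Without a lookahead, the space and the digits are part of the match
with_digits <- regmatches(x, regexpr("\\D+ \\d+", x, perl = TRUE))

# With a positive lookahead, they must be present but are not returned
without_digits <- regmatches(x, regexpr("\\D+(?= \\d+)", x, perl = TRUE))

with_digits     # "lala 54"
without_digits  # "lala"
```

The lookahead only checks what follows, which is why the returned text stops right before the space.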

Answer 2

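The best way to solve this is with the dplyr::mutate and stringr::str_extract functions that come with the tidyverse. Here is the code that solves the problem:

# install.packages('tidyverse')
library(tidyverse)

d <- tibble(
  A = c('Alex', 'likes', 'a lot', 'of', 'food'),
  B = c('but', 'lala 54 hi', 'number and 33 hello', 'face soap 34 hello', '35 hello')
)

# Add column C: the non-digit characters that come right before the first digit in B
d %>% mutate(C = str_extract(B, '\\D*(?=\\d)'))

Here is what you need to know about how it works:

dplyr::mutate creates a new column C. The data it puts in that column is created by extracting characters from column B (using stringr::str_extract). What gets extracted is determined by a regular expression.

The regular expression used here is \\D*(?=\\d). It looks complicated and ugly, but what it does is "look for any number of non-digit characters that come before a digit. Give me those characters, but not the digit."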
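As a rough illustration of what "any number of" means here, the base R sketch below compares \\D* (zero or more) with \\D+ (one or more) on the sample values. The helper first_match is a hypothetical name introduced only for this example; it is not a stringr function.

```r
B <- c("but", "lala 54 hi", "number and 33 hello", "face soap 34 hello", "35 hello")

# Hypothetical helper: first match of `pattern` in each element, NA where there is none
first_match <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

star <- first_match(B, "\\D*(?=\\d)")  # zero or more non-digits
plus <- first_match(B, "\\D+(?=\\d)")  # one or more non-digits

star          # "35 hello" gives "" because an empty match is allowed
plus          # "35 hello" gives NA because at least one character is required
trimws(star)  # drops the trailing space that sits right before the digit
```

Both patterns keep the space in front of the number, so trimming is worth considering if the new column should match the desired output exactly.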

Hope that helps!

Answer 3

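I hope this helps. Using sapply, you can apply the gsub function to each value in column B and output the filtered vector.

new_column = sapply(df$B, function(x){gsub("^(.*?)[0-9].*", "\\1", x)})

This gives you a vector containing the filtered values from column B. Then you just add this new vector as a new column in the data frame:

df$C= new_column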
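A possible variation, sketched under two assumptions: that sub can be called on the whole column directly (it is vectorized, so sapply is not strictly needed), and that rows without any digit should end up empty, as in the desired output. With the pattern above, a value such as "but" comes back unchanged, so this sketch blanks those rows explicitly.

```r
df <- data.frame(
  A = c("Alex", "likes", "a lot", "of", "food"),
  B = c("but", "lala 54 hi", "number and 33 hello", "face soap 34 hello", "35 hello"),
  stringsAsFactors = FALSE
)

# sub() is vectorized, so it can be applied to the whole column at once
before_digit <- sub("^(.*?)[0-9].*", "\\1", df$B, perl = TRUE)

# Rows without any digit come back unchanged from sub(), so blank them explicitly
df$C <- ifelse(grepl("[0-9]", df$B), before_digit, "")
df
```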

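Answer 4

Another option, in base R.

df <- data.frame(A=c("Alex", "likes", "a lot", "of", "food"), B=c("but", "lala 54 hi", "number and 33 hello", "face soap 34 hello", "35 hello"))
regmatches(df$B, gregexpr("^\\D*(?=\\d)", df$B, perl=TRUE))
# [[1]]
# character(0)
# [[2]]
# [1] "lala "
# [[3]]
# [1] "number and "
# [[4]]
# [1] "face soap "
# [[5]]
# [1] ""

If you are not familiar with regular expressions:

^: the beginning of the string
\\D*: zero or more non-digits, similar to [^0-9]*
(?=\\d): means "look ahead for a digit but do not include it in the returned pattern" (a good reference on lookahead: https://www.regular-expressions.info/lookaround.html); this is a Perl extension to regular expressions, ergo the perl=TRUE

This produces a 0-length vector for the first element. That is easy to handle, perhaps with a quick helper function:

replace_len0 <- function(x, replace=NA) `[<-`(x, lengths(x) < 1, replace)
unlist(replace_len0(regmatches(df$B, gregexpr("^\\D*(?=\\d)", df$B, perl=TRUE)), ""))
# [1] ""            "lala "       "number and " "face soap "  ""

(I set the default replacement to NA because, in my view, there is a difference between "there is an empty string "" before the first number" and "there is no number". Up to you.)

This can easily be assigned to df$C as needed.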
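For instance, the assignment could look like the sketch below. It keeps the helper's default of NA so that "no number" stays distinguishable from "nothing before the number", and it adds a trimws call to drop the trailing space; the trimming step is an assumption about what is wanted, not part of the answer above.

```r
df <- data.frame(
  A = c("Alex", "likes", "a lot", "of", "food"),
  B = c("but", "lala 54 hi", "number and 33 hello", "face soap 34 hello", "35 hello"),
  stringsAsFactors = FALSE
)

# Helper from the answer: replace zero-length list elements with `replace`
replace_len0 <- function(x, replace = NA) `[<-`(x, lengths(x) < 1, replace)

matches <- regmatches(df$B, gregexpr("^\\D*(?=\\d)", df$B, perl = TRUE))

# NA marks rows with no digit at all; "" marks rows that start with a digit
df$C <- trimws(unlist(replace_len0(matches)))
df
```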

Answer 5

Another approach, using tidyr's extract:

library(dplyr)
library(tidyr)

df %>%
  extract(B, "C", "^([a-z\\s]+)\\d", remove = FALSE) %>%
  mutate(C = replace(C, is.na(C), ""))

Output:

      A                   B           C
1  Alex                 but            
2 likes          lala 54 hi       lala 
3 a lot number and 33 hello number and 
4    of  face soap 34 hello  face soap 
5  food            35 hello
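For readers without tidyr, the same capture-group idea can be sketched in base R with regexec, which returns the full match followed by each captured group. This is only an approximation of what extract does for this particular pattern, not a description of tidyr's internals.

```r
B <- c("but", "lala 54 hi", "number and 33 hello", "face soap 34 hello", "35 hello")

# Each list element holds the full match, then capture group 1 (empty if no match)
m <- regmatches(B, regexec("^([a-z\\s]+)\\d", B, perl = TRUE))

# Take group 1 where the pattern matched, "" otherwise
C <- vapply(m, function(g) if (length(g) >= 2) g[2] else "", character(1))
C
```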
