I have a feature (column) in my df that contains some missing values, which show up as "".
unique(page_my_df$Type)
[1] "list" "narrative" "how to" "news feature"
[5] "diary" "" "interview"
I would like to replace all instances of "" with "unknown".
page_my_df <- page_my_df %>%
mutate(Type = str_replace(.$Type, "", "unknown"),
Voice = str_replace(.$Voice, "", "unknown"))
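Error in mutate_impl(.data, dots) : Evaluation error: Not implemented.
Reading some of the documentation here, specifically under pattern:
Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").
So I tried: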
page_my_df <- page_my_df %>%
mutate(Type = str_replace(.$Type, boundary(""), "unknown"),
Voice = str_replace(.$Voice, boundary(""), "unknown"))
Which then gave:
Error in mutate_impl(.data, dots) : Evaluation error: 'arg' should be one of "character", "line_break", "sentence", "word".
How can I replace empty strings with "unknown" within dplyr::mutate()?
Here is one approach:
library(tidyverse)
library(stringr)
z <- c( "list", "narrative", "how to", "news feature",
"diary", "" , "interview" )
data.frame(element = 1:length(z), Type = z) %>%
mutate(Type = str_replace(Type, "^$", "unknown"))
#output
  element         Type
1       1         list
2       2    narrative
3       3       how to
4       4 news feature
5       5        diary
6       6      unknown
7       7    interview
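Also, there is no need to refer to the data frame with .$ inside a mutate call; column names can be used directly.
The caret ^ and the dollar sign $ are metacharacters that match the empty string at the beginning and end of a line, respectively. Put together, the pattern "^$" matches only a string with nothing between its start and its end, that is, an empty string.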
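To make the anchor behaviour concrete, here is a minimal sketch on a small vector. The vector x and its values are made up for illustration; the single-space element is included only to show that "^$" leaves whitespace-only strings untouched.

```r
library(stringr)

# Hypothetical input: one empty string and one whitespace-only string.
x <- c("list", "", "how to", " ")

# "^$" matches only when nothing sits between the start and the end
# of the string, so just the empty element is replaced.
replaced <- str_replace(x, "^$", "unknown")
replaced

# str_detect() reports which elements the pattern matches.
str_detect(x, "^$")
```

If whitespace-only values should also count as missing, a pattern such as "^\\s*$" is one option, though that goes beyond what the question asks for.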
Another solution, which checks the string length instead:
library(dplyr)
strings <- c("list","narrative","how to","news feature","diary","","interview" )
df <- data.frame(ID = 1:length(strings), strings, stringsAsFactors = FALSE)
> df
  ID      strings
1  1         list
2  2    narrative
3  3       how to
4  4 news feature
5  5        diary
6  6
7  7    interview
df <- df %>% mutate(strings = if_else(nchar(strings) == 0, "unknown", strings))
> df
  ID      strings
1  1         list
2  2    narrative
3  3       how to
4  4 news feature
5  5        diary
6  6      unknown
7  7    interview
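Since the question replaces values in both Type and Voice, the length check can be applied to several columns at once. The following is a sketch only: it assumes dplyr 1.0.0 or later (for across()), and the small data frame, including every Voice value, is invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for page_my_df; the values are made up.
page_my_df <- data.frame(
  Type  = c("list", "", "interview"),
  Voice = c("", "first person", "third person"),
  stringsAsFactors = FALSE
)

# Run the same empty-string check over both columns in one mutate() call.
result <- page_my_df %>%
  mutate(across(c(Type, Voice), ~ if_else(nchar(.x) == 0, "unknown", .x)))

result
```

Using across() keeps the replacement rule in one place, so adding another column only means extending the column selection.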