Given the example data from the ?merge help page:
authors <- data.frame(
    ## I(*) : use character columns of names to get sensible sort order
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
authorN <- within(authors, { name <- surname; rm(surname) })
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))
I do:
m <- merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
and get:
> head(authors,3)
   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
> head(books,3)
      name                         title other.author
1    Tukey     Exploratory Data Analysis         <NA>
2 Venables Modern Applied Statistics ...       Ripley
3  Tierney                     LISP-STAT         <NA>
> head(m,3)
  surname nationality deceased                     title     other.author
1  McNeil   Australia       no Interactive Data Analysis             <NA>
2  R Core        <NA>     <NA>      An Introduction to R Venables & Smith
3  Ripley          UK       no        Spatial Statistics             <NA>
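The "name" column from books is missing from the output m. Is there any way to keep it? I need it because I have to watch for unintended differences (e.g. typos) between the "by" variables of the two datasets.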
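merge() keeps only one key column in the result, and it is named after by.x (here surname). That column holds the matched key values from both tables (note the "R Core" row, which exists only in books), so the separate name column from books is dropped rather than kept alongside it.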
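If both key columns are needed side by side, for example to compare them for typos, one option is to duplicate each key into an ordinary column before merging; merge() then carries the copies through unchanged, with NA wherever a row had no partner. This is a minimal sketch, assuming trimmed, plain-character versions of the ?merge data; the column names surname_authors and name_books are hypothetical.

```r
# Trimmed copies of the ?merge example data (plain character keys)
authors <- data.frame(
    surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    stringsAsFactors = FALSE)
books <- data.frame(
    name = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core"),
    title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
              "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis", "An Introduction to R"),
    stringsAsFactors = FALSE)

# Duplicate each key before merging (hypothetical column names); the copies
# are ordinary columns, so merge() keeps them and fills NA for unmatched rows
authors$surname_authors <- authors$surname
books$name_books <- books$name

m <- merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
print(m[, c("surname", "surname_authors", "name_books", "title")])
```

In the result, surname_authors is NA on the "R Core" row, which is exactly the information that is lost when only the merged key column survives.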
If m has already been created, you can add a name column to it after the merge, as in the code below:
m$name <- NA  # new column for the books key, initialised to NA
in_books <- m$surname %in% books$name  # rows whose key actually occurs in books
m$name[in_books] <- m$surname[in_books]  # copy the key only for those rows
m
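For the second concern, unintended differences between the "by" variables, a minimal sketch is to compare the key sets before merging: setdiff() lists keys found on only one side, and utils::adist() (generalized Levenshtein distance) flags pairs that are probably spelling variants of each other. The key vectors below are hypothetical, with a deliberate typo ("Venebles") planted on the books side, and the cut-off of 2 edits is an arbitrary assumption to tune for real data.

```r
# Hypothetical key vectors; book_keys contains a deliberate typo ("Venebles")
author_keys <- c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")
book_keys   <- c("Tukey", "Venebles", "Tierney", "Ripley", "Ripley", "McNeil", "R Core")

# Keys present on one side only: either genuinely unmatched or misspelled
only_in_authors <- setdiff(author_keys, book_keys)
only_in_books   <- setdiff(book_keys, author_keys)

# Edit distance between every unmatched books key and every unmatched authors key
d <- adist(only_in_books, only_in_authors)
dimnames(d) <- list(only_in_books, only_in_authors)

# Pairs within 2 edits (assumed threshold) are likely the same key spelled differently
close <- which(d <= 2, arr.ind = TRUE)
suspects <- data.frame(books = only_in_books[close[, "row"]],
                       authors = only_in_authors[close[, "col"]])
print(suspects)
```

Keys such as "R Core" that are far from every candidate are left out of suspects and can be treated as genuinely unmatched.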