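I have two data frames:
df1: my main dataset, with an Address column
df2: a lookup table containing Latitude and Longitude plus an Address column
I want to merge the two columns (Latitude and Longitude) from df2 into df1.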
DF1:
ID  VAR1  VAR2  VARX  Address
1   7     2     x     Road 1, 1234 City
2   8     0     y     Road 4, 1234 City
3   6     2     x     Road 5, 1234 City
4   7     2     x     Road 6, 1234 City
5   4     1     y     Road 10, 1234 City
6   1     2     x     Road 11, 1234 City
DF2:
Address             Latitude  Longitude
Road 1, 1234 City   12,67     56,78
Road 2, 1234 City   12,66     55,67
Road 3, 1234 City   12,45     55,10
Road 4, 1234 City   12,10     55,20
Road 5, 1234 City   11,50     55,30
Road 6, 1234 City   12,34     55,32
Road 7, 1234 City   12,89     55,40
Road 8, 1234 City   12,77     55,45
Road 9, 1234 City   11,67     55,67
Road 10, 1234 City  11,90     55,78
Road 11, 1234 City  11,12     56,59
So my new data frame would look like this:
New data frame, df3:
ID  VAR1  VAR2  VARX  Address             Latitude  Longitude
1   7     2     x     Road 1, 1234 City   12,67     56,78
2   8     0     y     Road 4, 1234 City   12,10     55,20
3   6     2     x     Road 5, 1234 City   11,50     55,30
4   7     2     x     Road 6, 1234 City   12,34     55,32
5   4     1     y     Road 10, 1234 City  11,90     55,78
6   1     2     x     Road 11, 1234 City  11,12     56,59
I tried left_join, but it only returns NA.
df3 <- left_join(df1, df2, by = c("Address"))
Edit: Solved. Apparently one of my Address columns contained some stray whitespace. The code above does work.
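For anyone who hits the same all-NA result, here is a minimal sketch of how stray whitespace in a join key could be spotted and removed. The miniature data frames and the helper name clean_address are made up for illustration (they are not from the question), and base R's merge with all.x = TRUE stands in for left_join so that the snippet runs without dplyr.

```r
# Hypothetical miniature versions of df1 and df2.
# Two of the df1 addresses carry stray spaces (trailing, and doubled inside).
df1 <- data.frame(ID = 1:3,
                  Address = c("Road 1, 1234 City ",
                              "Road 4,  1234 City",
                              "Road 5, 1234 City"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Address = c("Road 1, 1234 City",
                              "Road 4, 1234 City",
                              "Road 5, 1234 City"),
                  Latitude = c("12,67", "12,10", "11,50"),
                  stringsAsFactors = FALSE)

# Keys in df1 with no exact match in df2; these rows would get NA in a left join.
unmatched <- setdiff(df1$Address, df2$Address)

# Hypothetical helper: trim both ends and collapse runs of whitespace to one space.
clean_address <- function(x) gsub("\\s+", " ", trimws(as.character(x)))

df1$Address <- clean_address(df1$Address)
df2$Address <- clean_address(df2$Address)

# Left join on the cleaned key (base R equivalent of left_join).
df3 <- merge(df1, df2, by = "Address", all.x = TRUE)
```

Printing unmatched (for example with dput(unmatched), which makes trailing spaces visible inside the quotes) is usually enough to see which keys differ.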
left_join should work fine. Take a look at this and check the structure of your data.
df3 <- dplyr::left_join(df1, df2, by = "Address")
yields
  ID VAR1 VAR2 VARX            Address Latitude Longitude
1  1    7    2    x  Road 1, 1234 City    12,67     56,78
2  2    8    0    y  Road 4, 1234 City    12,10     55,20
3  3    6    2    x  Road 5, 1234 City    11,50     55,30
4  4    7    2    x  Road 6, 1234 City    12,34     55,32
5  5    4    1    y Road 10, 1234 City    11,90     55,78
6  6    1    2    x Road 11, 1234 City    11,12     56,59
Data
DF1
structure(list(ID = 1:6, VAR1 = c(7L, 8L, 6L, 7L, 4L, 1L), VAR2 = c(2L,
0L, 2L, 2L, 1L, 2L), VARX = structure(c(1L, 2L, 1L, 1L, 2L, 1L
), .Label = c("x", "y"), class = "factor"), Address = structure(c(1L,
4L, 5L, 6L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City",
"Road 11, 1234 City", "Road 4, 1234 City", "Road 5, 1234 City",
"Road 6, 1234 City"), class = "factor")), .Names = c("ID", "VAR1",
"VAR2", "VARX", "Address"), class = "data.frame", row.names = c(NA,
-6L))
DF2
structure(list(Address = structure(c(1L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City",
"Road 11, 1234 City", "Road 2, 1234 City", "Road 3, 1234 City",
"Road 4, 1234 City", "Road 5, 1234 City", "Road 6, 1234 City",
"Road 7, 1234 City", "Road 8, 1234 City", "Road 9, 1234 City"
), class = "factor"), Latitude = structure(c(9L, 8L, 7L, 5L,
2L, 6L, 11L, 10L, 3L, 4L, 1L), .Label = c("11,12", "11,50", "11,67",
"11,90", "12,10", "12,34", "12,45", "12,66", "12,67", "12,77",
"12,89"), class = "factor"), Longitude = structure(c(10L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("55,10", "55,20",
"55,30", "55,32", "55,40", "55,45", "55,67", "55,78", "56,59",
"56,78"), class = "factor")), .Names = c("Address", "Latitude",
"Longitude"), class = "data.frame", row.names = c(NA, -11L))
Base R function
merge(df1,df2,by = "Address")
yields
             Address ID VAR1 VAR2 VARX Latitude Longitude
1  Road 1, 1234 City  1    7    2    x    12,67     56,78
2 Road 10, 1234 City  5    4    1    y    11,90     55,78
3 Road 11, 1234 City  6    1    2    x    11,12     56,59
4  Road 4, 1234 City  2    8    0    y    12,10     55,20
5  Road 5, 1234 City  3    6    2    x    11,50     55,30
6  Road 6, 1234 City  4    7    2    x    12,34     55,32
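Note that merge defaults to an inner join and sorts the result by the key, which is why the rows above come back in Address order. A minimal sketch of how left-join behaviour and the original row order could be recovered, assuming made-up miniature data in which "Road 99" has no entry in df2:

```r
# Hypothetical miniature data: "Road 99" has no coordinates in df2.
df1 <- data.frame(ID = 1:3,
                  Address = c("Road 1, 1234 City",
                              "Road 10, 1234 City",
                              "Road 99, 1234 City"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Address = c("Road 1, 1234 City", "Road 10, 1234 City"),
                  Latitude = c("12,67", "11,90"),
                  Longitude = c("56,78", "55,78"),
                  stringsAsFactors = FALSE)

# Default merge is an inner join: the unmatched df1 row is dropped.
inner <- merge(df1, df2, by = "Address")

# all.x = TRUE keeps every row of df1 and fills missing coordinates with NA.
left <- merge(df1, df2, by = "Address", all.x = TRUE)

# Restore df1's row order (by ID) and column order, then reset the row names.
left <- left[order(left$ID), c(names(df1), "Latitude", "Longitude")]
rownames(left) <- NULL
```

With the data from the question every address has a match, so the inner and left joins return the same six rows; the difference only shows up when df1 contains addresses that are missing from df2.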