Merging two datasets with different column names: left_join

Question

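I am trying to merge two datasets on columns that have different names but share the same unique values. For example, column A in dataset 1 contains xyzw, while in dataset 2 the column is named B but also contains xyzw.

The problem is that in dataset 2 the values in column B (such as xyzw) are company names, and each one appears several times, once for every employee of that company in the dataset.

Essentially, I want to create a new column in dataset 1, call it C, that tells me how many employees each company has.

I tried the following: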

## Counting how many teachers are in each matched school, using the "Matched"
## column from matching_file_V4, along with the school_name column from the
## sample11 dataset:
merged_dataset <- left_join(sample11, matched_datasets, by = "school_name")

Although this code runs, it does not actually give me the number of employees per company.

Answer 1

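It would be easier for others to help if you provided sample data and the expected output. Nevertheless, I hope this gives you what you want:

Suppose we have these two data frames: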

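library(dplyr)  # provides %>%, group_by(), summarise() and left_join()

# One row per firm; B is the key used for the join
df_1 <- data.frame(
  A = letters[1:5],
  B = c('empl_1','empl_2','empl_3','empl_4','empl_5')
)

# One row per employee; the key lives in C and repeats across rows
df_2 <- data.frame(
  C = sample(rep(c('empl_1','empl_2','empl_3','empl_4','empl_5'), 15), 50),
  D = sample(letters[1:5], 50, replace = TRUE)
)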


# I suggest you first find the number of employees for each firm
# in the second data frame
df_2 %>%
  group_by(C) %>%
  summarise(num_empl = n()) %>%
  # Then do the left join; the named vector in `by` is how you join
  # on two different column names (B in df_1, C in the counts)
  left_join(df_1, ., by = c('B' = 'C'))

# Example output (the counts vary between runs because sample() is not seeded):
#   A      B num_empl
# 1 a empl_1        8
# 2 b empl_2       11
# 3 c empl_3       10
# 4 d empl_4       10
# 5 e empl_5       11
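
As a variation on the count-then-join approach, here is a minimal sketch on a small fixed data set, so the result is reproducible. The data frames `firms` and `employees` and the column `firm_id` are hypothetical names made up for illustration; `B`, `C` and `num_empl` mirror the example above. It assumes dplyr is installed, uses `count()` as shorthand for `group_by()` plus `summarise(n())`, and replaces the NA that `left_join()` produces for a firm with no matching rows by 0.

```r
library(dplyr)

# Hypothetical toy data: one row per firm, and one row per employee.
firms <- data.frame(
  firm_id = c("a", "b", "c"),
  B = c("empl_1", "empl_2", "empl_3")
)
employees <- data.frame(
  C = c("empl_1", "empl_1", "empl_2", "empl_1"),
  D = c("w", "x", "y", "z")
)

# count() collapses the employee rows to one row per key value in C.
employee_counts <- count(employees, C, name = "num_empl")

# Join on differently named keys, then turn unmatched firms (NA) into 0.
result <- firms %>%
  left_join(employee_counts, by = c("B" = "C")) %>%
  mutate(num_empl = coalesce(num_empl, 0L))

print(result)
```

With this toy data, `num_empl` should come out as 3, 1 and 0: empl_3 has no rows in `employees`, so it would be NA without the `coalesce()` step. If you prefer base R, `merge()` accepts `by.x` and `by.y` for the same purpose of joining on differently named columns.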
