Summing values in column C for non-identical elements in column A, based on the date in column B, in R

Question

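I run a hunting program and have a data frame with columns for the date, the species type, the effort, and several columns giving the number of that species harvested in a particular hunting area on that date. However, the "species type" column splits the same species into male, female, and juvenile. I want to collapse the harvest numbers for the same species in each area while keeping all the other shared information. Here is an example of my df: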

Date        Species       Area.1.Harvest  Area.2.Harvest   Effort
2016-04-02  Wild Sheep-M        1              NA            30
2016-04-02  Wild Sheep-F        4              NA            30
2016-04-17  Feral Goat-M        NA             5             50
2016-04-17  Feral Goat-F        NA             3             50
2016-09-18  Wild Sheep-M        NA             6             60
2016-09-18  Wild Sheep-F        NA             1             60
2016-09-18  Wild Sheep-J        NA             1             60

This is the result I am looking for:

Date        Species       Area.1.Harvest  Area.2.Harvest   Effort
2016-04-02  Wild Sheep          5              NA            30
2016-04-17  Feral Goat          NA             8             50
2016-09-18  Wild Sheep          NA             8             60

I need to do this for 6 different areas and 3 years' worth of harvest data.

Answer 1

You can do the following using only dplyr:

library(dplyr)

df %>%
  group_by(Species = gsub("-.*", "", Species), Date) %>%
  mutate_at(vars(contains("Area")), function(x) sum(x, na.rm = any(!is.na(x))))  %>%
  mutate_at(vars(contains("Effort")), function(x) mean(x, na.rm = any(!is.na(x)))) %>%
  distinct()

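This works regardless of how many Area or Effort variables you have (since you mentioned that you have several and your example is only a partial representation).

Output: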

# A tibble: 3 x 5
# Groups:   Species, Date [3]
  Date       Species   Area.1.Harvest Area.2.Harvest Effort
  <chr>      <chr>              <int>          <int>  <dbl>
1 2016-04-02 WildSheep              5             NA     30
2 2016-04-17 FeralGoat             NA              8     50
3 2016-09-18 WildSheep             NA              8     60

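Custom functions are used for mean and sum because the usual calls do not give the NA shown in your desired output: for a group whose values are all NA, sum(x, na.rm = TRUE) returns 0 and mean(x, na.rm = TRUE) returns NaN.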
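The NA handling described above can be checked on an all-NA vector, and the same idea can be written with summarise() and across() instead of mutate_at() plus distinct(). This is only a sketch: the helper names sum_or_na and mean_or_na are made up for illustration, the data frame is rebuilt from the example in the question, and across() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helpers: drop NAs only when at least one value is present,
# so an all-NA group stays NA instead of becoming 0 or NaN.
sum_or_na  <- function(x) sum(x, na.rm = any(!is.na(x)))
mean_or_na <- function(x) mean(x, na.rm = any(!is.na(x)))

all_missing  <- c(NA_real_, NA_real_)
default_sum  <- sum(all_missing, na.rm = TRUE)   # 0
default_mean <- mean(all_missing, na.rm = TRUE)  # NaN
guarded_sum  <- sum_or_na(all_missing)           # NA

# Example data copied from the question
df <- data.frame(
  Date = c(rep("2016-04-02", 2), rep("2016-04-17", 2), rep("2016-09-18", 3)),
  Species = c("Wild Sheep-M", "Wild Sheep-F", "Feral Goat-M", "Feral Goat-F",
              "Wild Sheep-M", "Wild Sheep-F", "Wild Sheep-J"),
  Area.1.Harvest = c(1, 4, NA, NA, NA, NA, NA),
  Area.2.Harvest = c(NA, NA, 5, 3, 6, 1, 1),
  Effort = c(30, 30, 50, 50, 60, 60, 60),
  stringsAsFactors = FALSE
)

collapsed <- df %>%
  mutate(Species = sub("-.*$", "", Species)) %>%   # strip the -M / -F / -J suffix
  group_by(Date, Species) %>%
  summarise(
    across(contains("Area"), sum_or_na),      # every harvest column
    across(contains("Effort"), mean_or_na),   # every effort column
    .groups = "drop"
  )

print(collapsed)
```

Because summarise() returns one row per group, the distinct() step is not needed in this variant.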

Answer 2

You can also do this easily with the data.table library:

library(data.table)
df <- data.table(
  Date = as.Date(c(rep('2016-04-02', 2), rep('2016-04-17', 2), rep('2016-09-18', 3))),
  Species = c('Wild Sheep-M', 'Wild Sheep-F', 'Feral Goat-M', 'Feral Goat-F',
              'Wild Sheep-M', 'Wild Sheep-F', 'Wild Sheep-J'),
  Area.1.Harvest = c(1, 4, NA, NA, NA, NA, NA),
  Area.2.Harvest = c(NA, NA, 5, 3, 6, 1, 1),
  Effort = c(30, 30, 50, 50, 60, 60, 60)
)

# Strip the two-character suffix (-M, -F, -J), then aggregate by date and species
df[, Species := substr(Species, 1, nchar(Species) - 2)][
  , .(Area.1.Harvest = sum(Area.1.Harvest, na.rm = TRUE),
      Area.2.Harvest = sum(Area.2.Harvest, na.rm = TRUE),
      Effort = mean(Effort, na.rm = TRUE)),
  by = list(Date, Species)]

#         Date    Species Area.1.Harvest Area.2.Harvest Effort
#1: 2016-04-02 Wild Sheep              5              0     30
#2: 2016-04-17 Feral Goat              0              8     50
#3: 2016-09-18 Wild Sheep              0              8     60
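Note that the output above contains 0 where the desired result in the question has NA. One possible way to keep NA, and to avoid naming each area column by hand, is sketched below. The helper sum_or_na and the variable names dt, area_cols and res are hypothetical, and the column selection assumes that every harvest column name starts with "Area".

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  Date = as.Date(c(rep("2016-04-02", 2), rep("2016-04-17", 2), rep("2016-09-18", 3))),
  Species = c("Wild Sheep-M", "Wild Sheep-F", "Feral Goat-M", "Feral Goat-F",
              "Wild Sheep-M", "Wild Sheep-F", "Wild Sheep-J"),
  Area.1.Harvest = c(1, 4, NA, NA, NA, NA, NA),
  Area.2.Harvest = c(NA, NA, 5, 3, 6, 1, 1),
  Effort = c(30, 30, 50, 50, 60, 60, 60)
)

# Hypothetical helper: an all-NA group stays NA instead of summing to 0
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Assumption: all harvest columns share the "Area" prefix
area_cols <- grep("^Area", names(dt), value = TRUE)

# Remove the -M / -F / -J suffix in place
dt[, Species := sub("-.*$", "", Species)]

# Sum every area column and average Effort within each Date/Species group
res <- dt[, c(lapply(.SD, sum_or_na), list(Effort = mean(Effort, na.rm = TRUE))),
          by = .(Date, Species), .SDcols = area_cols]

print(res)
```

With 6 areas, area_cols picks up all six columns, so the aggregation call does not need to change.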

Answer 3

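Take a look at the dplyr library, whose group_by() and summarise() functions are very helpful for the kind of aggregation you are looking for.

Also take a look at the stringr library, whose str_sub() function helps you manipulate and transform strings (in this case, the Species column should be character, not factor).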

library(dplyr)
library(stringr)

df %>%
  mutate(
    # Drop the last two characters (-M, -F, -J) from the species name
    Species = str_sub(Species, 1, nchar(Species) - 2)
  ) %>%
  group_by(Date, Species) %>%
  summarise(
    Area.1.Harvest = sum(Area.1.Harvest, na.rm = TRUE),
    Area.2.Harvest = sum(Area.2.Harvest, na.rm = TRUE),
    Effort         = mean(Effort, na.rm = TRUE)
  )
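The remark about character versus factor matters because nchar() raises an error when it is given a factor. A minimal base R sketch of the conversion, followed by a suffix removal that does not rely on a fixed character count, is shown below. The variable names are illustrative, and the pattern assumes the suffix is always a hyphen followed by one capital letter.

```r
# Illustrative factor column, as it might arrive from an older read.csv() call
species_raw <- factor(c("Wild Sheep-M", "Wild Sheep-F", "Feral Goat-M", "Wild Sheep-J"))

# Convert to character before using string functions
species_chr <- as.character(species_raw)

# Remove a trailing "-X" suffix (assumed: hyphen plus a single capital letter)
species_clean <- sub("-[A-Z]$", "", species_chr)

print(species_clean)
```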
