I have a dataset that looks like this:
dt <- structure(list(servicerequestid = c("254475", "255470", "249438",
"249398", "249399"), createdate = structure(c(1471592400, 1471874280,
1470037140, 1470028740, 1470031020), tzone = "UTC", class = c("POSIXct",
"POSIXt")), closedate = structure(c(1473661860, 1472457480, 1470641700,
1491918180, 1470293940), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L), .Names = c("servicerequestid", "createdate", "closedate"
))
# A tibble: 5 x 3
servicerequestid createdate closedate
<chr> <dttm> <dttm>
1 254475 2016-08-19 07:40:00 2016-09-12 06:31:00
2 255470 2016-08-22 13:58:00 2016-08-29 07:58:00
3 249438 2016-08-01 07:39:00 2016-08-08 07:35:00
4 249398 2016-08-01 05:19:00 2017-04-11 13:43:00
5 249399 2016-08-01 05:57:00 2016-08-04 06:59:00
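Each servicerequestid is the id of a service request that was open from createdate to closedate. I want to transform this dataset so that each servicerequestid has one observation for every day the ticket was open, together with the corresponding date.
For example, for servicerequestid == 255470, the dataset would look like: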
# A tibble: 8 x 2
servicerequestid date
<dbl> <date>
1 255470 2016-08-22
2 255470 2016-08-23
3 255470 2016-08-24
4 255470 2016-08-25
5 255470 2016-08-26
6 255470 2016-08-27
7 255470 2016-08-28
8 255470 2016-08-29
I am trying the code below, but it doesn't work:
dt %>%
mutate(seq.Date(as.Date(createdate), as.Date(closedate), by="days"))
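Some background: I am trying to create an animated density plot in ggplot, and I thought one possible approach would be to create daily observations. That way, I should be able to see the number of open tickets for each day.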
Here's one way:
library(tidyverse)
dt %>%
mutate_if(~inherits(.x, "POSIXct"), as.Date) %>% # convert posix cols to date
gather(var, date, -1) %>% # wide to long format
select(-var) %>% # we don't need this
group_by(servicerequestid) %>% # for every id...
expand(date = full_seq(date, 1)) %>% # create the date range
filter(servicerequestid == "255470") # Then grab the example one (the id column is character)
# # A tibble: 8 x 2
# # Groups: servicerequestid [1]
# servicerequestid date
# <chr> <date>
# 1 255470 2016-08-22
# 2 255470 2016-08-23
# 3 255470 2016-08-24
# 4 255470 2016-08-25
# 5 255470 2016-08-26
# 6 255470 2016-08-27
# 7 255470 2016-08-28
# 8 255470 2016-08-29
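To see what the `expand(date = full_seq(date, 1))` step does in isolation, here is a minimal sketch on a single pair of dates. The names `endpoints` and `filled` are made up for illustration; the two dates are the create and close dates of request 255470 after conversion to `Date`.

```r
library(tidyr)

# The two endpoints for request 255470, already converted to Date
endpoints <- as.Date(c("2016-08-22", "2016-08-29"))

# full_seq() fills in every value between the minimum and maximum of its
# input in steps of `period` (here 1 day), giving the complete daily range
filled <- full_seq(endpoints, 1)

length(filled)  # number of days the ticket was open, endpoints included
```

Because each group holds only the create and close dates after `gather()`, applying `full_seq()` per group yields one row per open day for that request.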
Another tidyverse solution.
library(tidyverse)
dt2 <- dt %>%
mutate_at(vars(ends_with("date")), as.Date) %>% # Convert date-time class to Date class (funs() is deprecated, so pass the function directly)
mutate(date = map2(createdate, closedate, seq.Date, by = 1)) %>% # Create a list column with dates
unnest(date) %>% # Expand based on the list column (named explicitly, as newer tidyr expects)
select(servicerequestid, date) %>% # Select the desired columns
filter(servicerequestid == "255470") # Filter for servicerequestid 255470 (the id column is character)
dt2
# # A tibble: 8 x 2
# servicerequestid date
# <chr> <date>
# 1 255470 2016-08-22
# 2 255470 2016-08-23
# 3 255470 2016-08-24
# 4 255470 2016-08-25
# 5 255470 2016-08-26
# 6 255470 2016-08-27
# 7 255470 2016-08-28
# 8 255470 2016-08-29
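Since the stated goal is to see how many tickets are open on each day, the expanded data can be counted by date once the final `filter()` is dropped. The following is only a sketch under some assumptions: the `tickets` tibble re-creates the question's data with the timestamps already truncated to `Date`, and the names `tickets`, `open_per_day` and `n_open` are hypothetical.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Same five requests as in the question, with dates already truncated to Date
tickets <- tibble(
  servicerequestid = c("254475", "255470", "249438", "249398", "249399"),
  createdate = as.Date(c("2016-08-19", "2016-08-22", "2016-08-01",
                         "2016-08-01", "2016-08-01")),
  closedate  = as.Date(c("2016-09-12", "2016-08-29", "2016-08-08",
                         "2017-04-11", "2016-08-04"))
)

open_per_day <- tickets %>%
  # One vector of daily dates per request, stored in a list column
  mutate(date = map2(createdate, closedate, seq, by = "day")) %>%
  # One row per request per open day
  unnest(date) %>%
  # Number of requests open on each calendar day
  count(date, name = "n_open")

open_per_day
```

The resulting `open_per_day` has one row per calendar day, which should be a convenient shape to pass to ggplot for a per-day frame.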