I have a dataset in R that looks like this:
data = structure(list(quarter = c("Q1 2005", "Q2 2005", "Q3 2005", "Q4 2005",
"Q1 2006"), value = c(128.76, 178.83, 140.9, 188.3, 194.05)), class = "data.frame", row.names = c(NA,
-5L))
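I want to expand this dataset by adding an extra column called "month" (e.g. Jan, Feb, Mar, ...), and I want to split the difference between consecutive quarters across the months in between, so that the monthly changes still add up to the increase from one quarter to the next.
Here is what I tried: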
library(dplyr)
expand_dataset <- function(data) {
  quarter_to_months <- list(
    "Q1" = c("Jan", "Feb", "Mar"),
    "Q2" = c("Apr", "May", "Jun"),
    "Q3" = c("Jul", "Aug", "Sep"),
    "Q4" = c("Oct", "Nov", "Dec")
  )
  expanded_data <- data.frame()
  for (i in 1:(nrow(data) - 1)) {
    current_value <- data$value[i]
    next_value <- data$value[i + 1]
    diff <- (next_value - current_value) / 3
    quarter <- substr(data$quarter[i], 1, 2)
    year <- substr(data$quarter[i], 4, 7)
    months <- quarter_to_months[[quarter]]
    for (j in 1:3) {
      month_value <- current_value + (j - 1) * diff
      expanded_data <- rbind(expanded_data, data.frame(
        quarter = data$quarter[i],
        month = months[j],
        year = year,
        value = month_value
      ))
    }
  }
  return(expanded_data)
}
expanded_data <- expand_dataset(data)
print(expanded_data)
Is this the correct way to do it? Is there a simpler way?
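If you mean that the values you assign to the months should sum to the quarterly value in your original dataset, your code does not produce that result.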
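To make the mismatch concrete, here is a small base R check. It is only a sketch that reproduces the first quarter of the question's interpolation; the names `step` and `q1_months` are mine, not from the question.

```r
# Quarterly values from the question
value <- c(128.76, 178.83, 140.9, 188.3, 194.05)

# The question's loop gives the three months of Q1 2005 the values
# current, current + step, current + 2 * step
step <- (value[2] - value[1]) / 3
q1_months <- value[1] + (0:2) * step

# Compare the total of the three monthly values with the quarterly value
sum(q1_months)
value[1]
```

The three monthly values are points on a line between two quarterly values, so their total comes out at roughly three times the quarterly figure rather than matching it.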
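There are many, many ways to accomplish this.
This answer assumes you want monthly values that sum to the quarter's value. I took 1/3 of the quarterly value and assigned it to each month.
This solution uses dplyr and tidyr.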
library(dplyr)
library(tidyr)

d2 <- data %>%
  rowwise() %>%                                          # work on one row at a time so vals uses only that row's value
  mutate(
    months = case_when(
      startsWith(quarter, "Q1") ~ list(month.abb[1:3]),  # match the quarter prefix and attach its months
      startsWith(quarter, "Q2") ~ list(month.abb[4:6]),
      startsWith(quarter, "Q3") ~ list(month.abb[7:9]),
      TRUE ~ list(month.abb[10:12])),                    # anything else is treated as Q4
    vals = list(rep(value / 3, 3))) %>%                  # split the quarter's value evenly over its 3 months
  unnest_longer(c(months, vals))                         # expand the list columns into one row per month
d2
# # A tibble: 15 × 4
#    quarter value months  vals
#    <chr>   <dbl> <chr>  <dbl>
#  1 Q1 2005  129. Jan     42.9
#  2 Q1 2005  129. Feb     42.9
#  3 Q1 2005  129. Mar     42.9
#  4 Q2 2005  179. Apr     59.6
#  5 Q2 2005  179. May     59.6
#  6 Q2 2005  179. Jun     59.6
#  7 Q3 2005  141. Jul     47.0
#  8 Q3 2005  141. Aug     47.0
#  9 Q3 2005  141. Sep     47.0
# 10 Q4 2005  188. Oct     62.8
# 11 Q4 2005  188. Nov     62.8
# 12 Q4 2005  188. Dec     62.8
# 13 Q1 2006  194. Jan     64.7
# 14 Q1 2006  194. Feb     64.7
# 15 Q1 2006  194. Mar     64.7