Subsetting a data.frame with multiple conditions


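End goal:

Create one plot per region of StressCumulative, BaseCumulative, StressQoQ and BaseQoQ over the date range in rows 1:167.

Problem:

I am having a hard time subsetting my data.frame. My problem is that my subsetting condition is logical, so only the first element after the condition is returned.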

subset_region_1 <- subset.data.frame(HPF, HPF$region == 1, select = BaseCumulative, HPF$StressCumulative, StressQoQ, BaseQoQ)

Warning messages:
1: In if (drop) warningc("drop ignored") :
  the condition has length > 1 and only the first element will be used
2: drop ignored 

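This returns only the first column, BaseCumulative.

Data:

Here you can see what I am working with. This is the subset I am looking for. My data.frame is in tall (long) format: Region 1, Region 2.

I would like to create a subset so that I can plot the BaseCumulative, StressCumulative, BaseQoQ and StressQoQ variables over the date range in rows 1:167. The date column uses the same dates for all 100 regions. My problem is that when I go to plot with ggplot, I get an error that my aes mappings are of different sizes. The date column of the full table is 18370 rows long, but its values repeat every 167 rows (once for each unique region). The BaseCumulative variable is also 18370 rows long, but its values are unique to each region, i.e. to each block of 167 rows. I would like to know how to subset by region while getting the correct number of rows for the variables I am interested in.

Data points: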

#Rows 1-3 (Region 1 Sample): 
dput(head(HPF[1:3, ]))
    structure(list(region = c(1, 1, 1), path = c(1, 1, 1), date = c(20140215, 
    20140515, 20140815), index_value = c(1, 1.033852765, 1.041697122
    ), index = 0:2, counter = 1:3, BaseQoQ = c(NA, 0.033852765, 0.00758749917354029
    ), BaseCumulative = c(100, 103.3852765, 104.1697122), StressCumulative = c(110, 
    113.3852765, 114.1697122), StressQoQ = c(NA, 0.0307752409090909, 
    0.00691832065162346)), .Names = c("region", "path", "date", "index_value", 
    "index", "counter", "BaseQoQ", "BaseCumulative", "StressCumulative", 
    "StressQoQ"), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
    "data.frame"))

#Rows 168:200 (Region 2 Sample):
dput(head(HPF[168:200, ]))
    structure(list(region = c(2, 2, 2, 2, 2, 2), path = c(1, 1, 1, 
    1, 1, 1), date = c(20140215, 20140515, 20140815, 20141115, 20150215, 
    20150515), index_value = c(1, 1.014162265, 1.01964828, 1.009372314, 
    1.007210703, 1.018695493), index = 0:5, counter = 1:6, BaseQoQ = c(NA, 
    0.014162265, 0.00540940556489744, -0.0100779515854232, -0.0021415398163972, 
    0.0114025694582001), BaseCumulative = c(100, 101.4162265, 101.964828, 
    100.9372314, 100.7210703, 101.8695493), StressCumulative = c(110, 
    111.4162265, 111.964828, 110.9372314, 110.7210703, 101.8695493
    ), StressQoQ = c(NA, 0.0128747863636363, 0.00492389230216839, 
    -0.00917785181610786, -0.00194849914020834, -0.0799443229370588
    )), .Names = c("region", "path", "date", "index_value", "index", 
    "counter", "BaseQoQ", "BaseCumulative", "StressCumulative", "StressQoQ"
    ), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
    ))

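Question:

How can I subset the other columns in addition to specifying region == #? I tried the following, but the date values get recycled and my plot is incorrect: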

ggplot(HPF, aes(x = date, y= BaseCumulative, linetype = factor(region == 1))) + 
  geom_line() +
  theme_light()

Subsetting inside the ggplot call does not work either, for example:

ggplot(HPF[HPF$region == 1, ], aes(x = HPF$date[1:167, ], y= HPF$BaseCumulative[1:167, ], linetype = factor(region == 1))) + 
      geom_line() +
      theme_light()

Any help is appreciated.

r ggplot2 subset visualization
1 Answer
2 votes

I'm not entirely sure what you want to show in your plot; is this what you're after?

library(tidyverse);
df %>%
    gather(what, value, 7:10) %>%
    ggplot(aes(date, value, colour = what)) + geom_line() + theme_light()


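Explanation: Convert the data from wide to long format, then pass what as the colour (or linetype) aesthetic to get separate line plots for columns 7, 8, 9, 10 in a single plot.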
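The same reshaping step can also be sketched with `pivot_longer()`, which tidyr documents as the successor to `gather()`. This is one possible variant, not the code from the answer: it builds a small stand-in `df` from values in the question's `dput` output, selects the four columns by name instead of by position, and parses the numeric `yyyymmdd` dates with `as.Date()`. Mapping `linetype` to the region is an assumption about how the regions should be told apart.

```r
library(tidyr)
library(ggplot2)

# Small stand-in for the data (first three rows of regions 1 and 2,
# values copied from the dput output in the question)
df <- data.frame(
  region = c(1, 1, 1, 2, 2, 2),
  date = rep(c(20140215, 20140515, 20140815), times = 2),
  BaseQoQ = c(NA, 0.033852765, 0.00758749917354029,
              NA, 0.014162265, 0.00540940556489744),
  BaseCumulative = c(100, 103.3852765, 104.1697122,
                     100, 101.4162265, 101.964828),
  StressCumulative = c(110, 113.3852765, 114.1697122,
                       110, 111.4162265, 111.964828),
  StressQoQ = c(NA, 0.0307752409090909, 0.00691832065162346,
                NA, 0.0128747863636363, 0.00492389230216839)
)

# Wide to long: one row per (region, date, variable).
# Columns are named explicitly, so the code does not depend on column order.
long <- pivot_longer(
  df,
  cols = c(BaseQoQ, BaseCumulative, StressCumulative, StressQoQ),
  names_to = "what",
  values_to = "value"
)

# Turn numeric yyyymmdd values into Date objects so the x axis
# is spaced by calendar time
long$date <- as.Date(as.character(long$date), format = "%Y%m%d")

# colour separates the variables, linetype separates the regions
p <- ggplot(long, aes(date, value, colour = what, linetype = factor(region))) +
  geom_line() +
  theme_light()
```

Calling `print(p)` draws the plot. The two QoQ series start with `NA`, so ggplot2 will drop those points when drawing.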


If you want separate plots per region, you can add + facet_wrap(~ as.factor(region)), e.g.

df %>%
    gather(what, value, 7:10) %>%
    ggplot(aes(date, value, colour = what)) + geom_line() + theme_light() + facet_wrap(~ as.factor(region))

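As for the warning in the question: the signature of `subset.data.frame` is `(x, subset, select, drop = FALSE, ...)`, so in the original call only `BaseCumulative` reaches `select`, and the next unnamed argument, `HPF$StressCumulative`, is matched to `drop`. That would explain both the "condition has length > 1" warning and the single returned column. A minimal sketch of a corrected call follows, assuming a small stand-in for `HPF` built from the question's values. Wrapping the columns in `c()` and keeping `date` in the selection are the only changes.

```r
library(ggplot2)

# Hypothetical stand-in for HPF (first three rows of regions 1 and 2)
HPF <- data.frame(
  region = c(1, 1, 1, 2, 2, 2),
  date = rep(c(20140215, 20140515, 20140815), times = 2),
  BaseQoQ = c(NA, 0.033852765, 0.00758749917354029,
              NA, 0.014162265, 0.00540940556489744),
  BaseCumulative = c(100, 103.3852765, 104.1697122,
                     100, 101.4162265, 101.964828),
  StressCumulative = c(110, 113.3852765, 114.1697122,
                       110, 111.4162265, 111.964828),
  StressQoQ = c(NA, 0.0307752409090909, 0.00691832065162346,
                NA, 0.0128747863636363, 0.00492389230216839)
)

# select takes a single expression, so all wanted columns go inside c().
# Column names are written bare, without the HPF$ prefix.
subset_region_1 <- subset(
  HPF,
  region == 1,
  select = c(date, BaseCumulative, StressCumulative, StressQoQ, BaseQoQ)
)

# Pass the subset as the data argument and use bare column names in aes();
# mixing in HPF$date would bring back the full-length column and the
# "different sizes" problem
p <- ggplot(subset_region_1, aes(x = date, y = BaseCumulative)) +
  geom_line() +
  theme_light()
```

Because `x` and `y` now come from the same subset, both have one row per date of region 1, which on the full data would be the 167 rows described in the question.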


Sample data

df1 <- structure(list(region = c(1, 1, 1), path = c(1, 1, 1), date = c(20140215,
    20140515, 20140815), index_value = c(1, 1.033852765, 1.041697122
    ), index = 0:2, counter = 1:3, BaseQoQ = c(NA, 0.033852765, 0.00758749917354029
    ), BaseCumulative = c(100, 103.3852765, 104.1697122), StressCumulative = c(110,
    113.3852765, 114.1697122), StressQoQ = c(NA, 0.0307752409090909,
    0.00691832065162346)), .Names = c("region", "path", "date", "index_value",
    "index", "counter", "BaseQoQ", "BaseCumulative", "StressCumulative",
    "StressQoQ"), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
    "data.frame"));

df2 <- structure(list(region = c(2, 2, 2, 2, 2, 2), path = c(1, 1, 1,
    1, 1, 1), date = c(20140215, 20140515, 20140815, 20141115, 20150215,
    20150515), index_value = c(1, 1.014162265, 1.01964828, 1.009372314,
    1.007210703, 1.018695493), index = 0:5, counter = 1:6, BaseQoQ = c(NA,
    0.014162265, 0.00540940556489744, -0.0100779515854232, -0.0021415398163972,
    0.0114025694582001), BaseCumulative = c(100, 101.4162265, 101.964828,
    100.9372314, 100.7210703, 101.8695493), StressCumulative = c(110,
    111.4162265, 111.964828, 110.9372314, 110.7210703, 101.8695493
    ), StressQoQ = c(NA, 0.0128747863636363, 0.00492389230216839,
    -0.00917785181610786, -0.00194849914020834, -0.0799443229370588
    )), .Names = c("region", "path", "date", "index_value", "index",
    "counter", "BaseQoQ", "BaseCumulative", "StressCumulative", "StressQoQ"
    ), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
    ))

df <- rbind.data.frame(df1, df2);