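I am working with a data frame like the one below. I have done my best to format it for SO. The important point is that person, personparty, and sponsordate contain the same number of comma-separated entries (I truncated the cells, so the counts may differ in this example, but they match in the real dataset).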
bill status person personparty sponsordate
A bill to amend chapter 44 of title 18, .... 2ND Sen. David Vitter [R-LA] Republican 12/05/2015
A bill to authorize the appropriation of funds.... RESTRICT Sen. Ed Markey [D-MA], Sen. Ed Markey [D-MA], Sen. Ed Markey [D-MA], Sen. Barbara Boxer [D-CA] Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, 21/05/2014, 02/06/2015, 05/04/2017, 22/05/2014, 21/07/2014, 09/06/2014, 02/06/2014, 12/06/2014, 21/05/2014, 02/06/2014, 21/05/2014
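I want to create a new data frame with the same five columns. Essentially, I want to unlist these (non-list) values into a longer data frame.
The final data frame should have one row for each comma-separated entry, keeping the same values in the bill and status columns.
For example, the second row of my sample dataset would produce one row with the bill name (A bill to authorize the appropriation of funds....), the status (RESTRICT), Sen. Ed Markey [D-MA], Democrat, 21/05/2014. The next row would hold the second entry of each comma-separated value (same bill name, same status, Sen. Ed Markey [D-MA], Democrat, 02/06/2015), and so on.
Rows with only one value in the last three columns should stay unchanged.
How can I unlist these list-like values?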
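You seem to be looking for separate_rows.
Assumption: the three columns hold the same number of comma-separated values in every row. This is based on the statement in your post: "The important point is that person, personparty, and sponsordate contain the same number of comma-separated entries."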
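library(dplyr)
library(tidyr)

# Split on a comma plus any following whitespace so the new values
# carry no leading spaces; one output row per comma-separated entry.
df %>%
  separate_rows(person, personparty, sponsordate, sep = ",\\s*")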
The output is:
bill status person personparty
1 A bill to amend chapter 44 of title 18, .... 2ND Sen. David Vitter [R-LA] Republican
2 A bill to authorize the appropriation of funds.... RESTRICT Sen. Ed Markey [D-MA] Democrat
3 A bill to authorize the appropriation of funds.... RESTRICT Sen. Ed Markey [D-MA] Democrat
4 A bill to authorize the appropriation of funds.... RESTRICT Sen. Ed Markey [D-MA] Democrat
5 A bill to authorize the appropriation of funds.... RESTRICT Sen. Barbara Boxer [D-CA] Democrat
sponsordate
1 12/05/2015
2 21/05/2014
3 02/06/2015
4 05/04/2017
5 22/05/2014
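The equal-count assumption can be checked before splitting. The following is a minimal base R sketch, not part of the original answer; the data frame toy and the helper count_entries are hypothetical names introduced only for illustration.

```r
# Hypothetical toy data with the same column layout as the question.
toy <- data.frame(
  bill = c("Bill A", "Bill B"),
  status = c("2ND", "RESTRICT"),
  person = c("Sen. X [R-LA]", "Sen. Y [D-MA], Sen. Z [D-CA]"),
  personparty = c("Republican", "Democrat, Democrat"),
  sponsordate = c("12/05/2015", "21/05/2014, 02/06/2015"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of comma-separated entries in each cell.
count_entries <- function(x) {
  lengths(strsplit(as.character(x), ",", fixed = TRUE))
}

# One row per input row, one column per split column.
counts <- sapply(toy[c("person", "personparty", "sponsordate")], count_entries)

# TRUE for every row whose three columns hold the same number of entries.
rows_ok <- apply(counts, 1, function(r) length(unique(r)) == 1)
all(rows_ok)
```

Rows where rows_ok is FALSE are the ones to inspect before splitting.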
Sample data:
df <- structure(list(bill = structure(1:2, .Label = c("A bill to amend chapter 44 of title 18, ....",
"A bill to authorize the appropriation of funds...."), class = "factor"),
status = structure(1:2, .Label = c("2ND Sen.", "RESTRICT"
), class = "factor"), person = structure(1:2, .Label = c("David Vitter [R-LA]",
"Sen. Ed Markey [D-MA], Sen. Ed Markey [D-MA], Sen. Ed Markey [D-MA], Sen. Barbara Boxer [D-CA]"
), class = "factor"), personparty = structure(c(2L, 1L), .Label = c("Democrat, Democrat, Democrat, Democrat",
"Republican"), class = "factor"), sponsordate = structure(1:2, .Label = c("12/05/2015",
"21/05/2014, 02/06/2015, 05/04/2017, 22/05/2014"), class = "factor")), .Names = c("bill",
"status", "person", "personparty", "sponsordate"), class = "data.frame", row.names = c(NA,
-2L))
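For readers who want to see what the row-wise split does, here is a rough base R sketch of the same idea. It assumes the three columns hold the same number of entries in every row; toy is a hypothetical data frame, not the data above, and this is not how tidyr implements separate_rows internally.

```r
# Hypothetical toy data (not the original data set).
toy <- data.frame(
  bill = c("Bill A", "Bill B"),
  status = c("2ND", "RESTRICT"),
  person = c("Sen. X [R-LA]", "Sen. Y [D-MA], Sen. Z [D-CA]"),
  personparty = c("Republican", "Democrat, Democrat"),
  sponsordate = c("12/05/2015", "21/05/2014, 02/06/2015"),
  stringsAsFactors = FALSE
)

cols <- c("person", "personparty", "sponsordate")

# Split each cell on a comma followed by optional whitespace.
pieces <- lapply(toy[cols], function(x) strsplit(as.character(x), ",\\s*"))

# Number of output rows per input row (taken from the first split column).
n <- lengths(pieces[[1]])

# Repeat bill and status once per entry, then attach the flattened pieces.
long <- toy[rep(seq_len(nrow(toy)), n), c("bill", "status")]
long[cols] <- lapply(pieces, unlist)
rownames(long) <- NULL
long
```

The first toy row stays as a single row and the second becomes two rows, with bill and status repeated.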
I'm not sure I have understood what you want, so I'll start from the data frame I think you have:
df=structure(list(bill = c("A bill to amend chapter 44 of title 18, ....",
"A bill to authorize the appropriation of funds...."), status = c("2ND Sen.",
"RESTRICT"), person = c("David Vitter [R-LA]", "Sen. Ed Markey [D-MA], Sen. Ed Markey [D-MA], Sen. Ed Markey [D-MA], Sen. Barbara Boxer [D-CA]"
), personparty = c("Republican", "Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat, Democrat,"
), sponsordate = c("12/05/15", "21/05/2014, 02/06/2015, 05/04/2017, 22/05/2014, 21/07/2014, 09/06/2014, 02/06/2014, 12/06/2014, 21/05/2014, 02/06/2014, 21/05/2014"
)), .Names = c("bill", "status", "person", "personparty", "sponsordate"
), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
), spec = structure(list(cols = structure(list(bill = structure(list(), class = c("collector_character",
"collector")), status = structure(list(), class = c("collector_character",
"collector")), person = structure(list(), class = c("collector_character",
"collector")), personparty = structure(list(), class = c("collector_character",
"collector")), sponsordate = structure(list(), class = c("collector_character",
"collector"))), .Names = c("bill", "status", "person", "personparty",
"sponsordate")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
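As I understand it, you want to expand the second row into many rows. If "many" means all combinations of the vector elements in columns 3, 4, and 5 of row 2, placed in the data frame in place of row 2, you can do it as follows: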
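library(stringr)

# Split row 2's cells on a comma plus any following whitespace.
x01 <- str_split(df$person[2], ",\\s*")[[1]]
x02 <- str_split(df$personparty[2], ",\\s*")[[1]]
x03 <- str_split(df$sponsordate[2], ",\\s*")[[1]]
# All combinations of the three vectors, one combination per row.
x04 <- expand.grid(x01, x02, x03, stringsAsFactors = FALSE)
# Repeat row 2 once per combination, fill in columns 3 to 5,
# then put row 1 back on top.
df0 <- df[rep(2, nrow(x04)), ]
df0[, 3:5] <- x04
df0 <- rbind(df[1, ], df0)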
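Note that expand.grid builds every combination of its inputs, which is not the same as the one-row-per-entry layout described in the question, where the i-th person goes with the i-th party and the i-th date. A small sketch of the difference, using hypothetical vectors people and dates:

```r
# Hypothetical example vectors (not the original data).
people <- c("Sen. Y [D-MA]", "Sen. Z [D-CA]")
dates  <- c("21/05/2014", "02/06/2015")

# Cartesian product: every person is combined with every date.
all_combos <- expand.grid(person = people, sponsordate = dates,
                          stringsAsFactors = FALSE)
nrow(all_combos)

# Element-wise pairing: the i-th person goes with the i-th date.
paired <- data.frame(person = people, sponsordate = dates,
                     stringsAsFactors = FALSE)
nrow(paired)
```

With two entries per vector, the Cartesian product has 4 rows while the pairing has 2, so the two approaches diverge quickly as the lists grow.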
Hope this helps.