Subdividing cell values and splitting strings into an uneven number of pieces


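I need to subdivide the string values contained in a column. Some cells need no splitting at all. Others may need to be split into two, three, or more pieces.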

I would also like the split values to be stored in adjacent columns.

For example, if my initial data frame looks like this:

df2 <- data.frame(district= 1000:1003,
                  party= c("PartyGreen", "Coalition1(PartyRed-PartyBlue)", "PartyRed", "Coal.(PartyBlue-PartyOrange-VelvetParty)"))

I would like the result to look like this:

df.neat.2 <- data.frame(district= 1000:1003,
                  party= c("PartyGreen", "Coalition1(PartyRed-PartyBlue)", "PartyRed", "Coal.(PartyBlue-PartyOrange-VelvetParty)"),
                  party1= c("PartyGreen", "PartyRed", "PartyRed", "PartyBlue"),
                  party2= c(NA, "PartyBlue", NA, "PartyOrange"),
                  party3= c(NA, NA, NA, "VelvetParty"))

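Note that some cells contain a single value that does not need splitting. Also note that the splitting happens inside the parentheses (), and that the pieces are separated by dashes.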

r database string dataframe
1 Answer
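library(tidyr)
library(dplyr)
library(stringr)

df2 |>
  # Drop everything up to "(" and everything from ")" onwards;
  # values without parentheses are left unchanged.
  mutate(parties = str_remove_all(party, ".*\\(|\\).*"),
         # Split on a literal dash, giving a list column of character vectors.
         parties = str_split(parties, fixed("-"))) |>
  # Spread the list column into parties_1, parties_2, ...; missing pieces become NA.
  unnest_wider(parties, names_sep = "_")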
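
To see what each step of the pipeline contributes, it can help to run the steps separately. The following is a minimal sketch, assuming the same df2 as in the question; the intermediate names inner, pieces and result are introduced here for illustration only. The final rename_with() call is an optional addition, not part of the answer above, that turns the generated column names into the party1, party2, party3 names requested in the question.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same example data as in the question.
df2 <- data.frame(
  district = 1000:1003,
  party = c("PartyGreen", "Coalition1(PartyRed-PartyBlue)", "PartyRed",
            "Coal.(PartyBlue-PartyOrange-VelvetParty)")
)

# Step 1: remove everything up to "(" and everything from ")" onwards.
# Strings without parentheses do not match and are returned unchanged.
inner <- str_remove_all(df2$party, ".*\\(|\\).*")

# Step 2: split on a literal dash. The result is a list with one
# character vector per row, of varying length.
pieces <- str_split(inner, fixed("-"))

# Step 3: spread the list column into one column per piece, then
# rename parties_1, parties_2, ... to party1, party2, ...
result <- df2 |>
  mutate(parties = pieces) |>
  unnest_wider(parties, names_sep = "_") |>
  rename_with(~ str_replace(.x, "^parties_", "party"),
              starts_with("parties_"))

print(result)
```

Rows with fewer pieces than the longest row are padded with NA, so the number of new columns is determined by the cell with the most parties.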