Spread a list column with differing elements

Question (1 vote, 1 answer)

I have data like this:

library(tidyverse)
yelp_tbl %>%
  select(business_id, categories)

# A tibble: 11 x 2
   business_id            categories
   <chr>                  <list>    
 1 5-1qDFGHvYjBjBYe0B5oiQ <chr [3]> 
 2 isl95tLwXQHlkm_vR0PTqw <chr [6]> 
 3 lNwReGEso2mMhzCr0TM-mw <chr [3]> 
 4 XOvQUSHUjE0KkUuwDUR5OA <chr [1]> 
 5 8Y5p2IQMLX6QjGPzxanexg <chr [4]> 
 6 jozuj1ySOk7DPs7OJloj3A <NULL>    
 7 _TGcRp4wyVbvvDsEHXf0Zw <chr [2]> 
 8 3Mwko7AsZaydBm6d4tWMhg <chr [3]> 
 9 uhdbvZ-yCIl_Yj_sU1OhRg <chr [4]> 
10 ht9AOnxm0IfSoUDJTatS1g <chr [3]> 
11 5P7zzVhWvO8nXGPdy7xqhw <chr [5]>

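Each business can belong to several categories, so both the category names and the number of categories differ from one business to the next.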
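A quick base-R way to see this variation, as a minimal sketch: the list below holds the categories of businesses 4, 5 and 6 copied from the question's data, and the name `categories` is just a local variable chosen for illustration. `lengths()` gives the per-business count (a `NULL` entry counts as 0), and the distinct names are the columns a spread would have to create.

```r
# Categories of businesses 4, 5 and 6 from the question's data
categories <- list(
  "Restaurants",
  c("Japanese", "Restaurants", "Korean", "Sushi Bars"),
  NULL
)

# lengths() counts the elements of each list entry; a NULL entry counts as 0
n_categories <- lengths(categories)
print(n_categories)  # 1 4 0

# Each distinct category name would become one column after spreading
distinct_categories <- unique(unlist(categories))
print(distinct_categories)
```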

I want to use `spread` to turn each category into its own column, filled with "True" or "False" depending on whether the business belongs to that category.

The code I have written so far is:

yelp_tbl %>%
  select(business_id, categories) %>%
  mutate(dummy = "True") %>%
  map(unlist) %>%
  as.data.frame() %>%
  mutate_if(is.factor, as.character) %>%
  spread(categories, dummy, fill = "False")

But I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 5, 26

I understand what this means, but I don't know how to fix it in this case.

The data (output of `dput`):

structure(list(business_id = c("5-1qDFGHvYjBjBYe0B5oiQ", "isl95tLwXQHlkm_vR0PTqw", 
"lNwReGEso2mMhzCr0TM-mw", "XOvQUSHUjE0KkUuwDUR5OA", "8Y5p2IQMLX6QjGPzxanexg", 
"jozuj1ySOk7DPs7OJloj3A", "_TGcRp4wyVbvvDsEHXf0Zw", "3Mwko7AsZaydBm6d4tWMhg", 
"uhdbvZ-yCIl_Yj_sU1OhRg", "ht9AOnxm0IfSoUDJTatS1g", "5P7zzVhWvO8nXGPdy7xqhw"
), categories = list(c("Dry Cleaning & Laundry", "Local Services", 
"Sewing & Alterations"), c("Beauty & Spas", "Skin Care", "Medical Spas", 
"Hair Removal", "Health & Medical", "Laser Hair Removal"), c("Food", 
"Grocery", "Specialty Food"), "Restaurants", c("Japanese", "Restaurants", 
"Korean", "Sushi Bars"), NULL, c("Financial Services", "Banks & Credit Unions"
), c("Nightlife", "Dance Clubs", "Bars"), c("Gyms", "Active Life", 
"Trainers", "Fitness & Instruction"), c("Event Planning & Services", 
"Hotels", "Hotels & Travel"), c("Donuts", "Breakfast & Brunch", 
"Restaurants", "Food", "Coffee & Tea"))), row.names = c(NA, -11L
), class = c("tbl_df", "tbl", "data.frame"))
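The error most likely comes from `map(unlist)`: it flattens each column separately, so the unlisted `categories` column ends up longer than `business_id` and `dummy`, and `as.data.frame()` cannot put vectors of unequal length side by side. The sketch below reproduces this in base R with three rows copied from the question's data (businesses 1, 4 and 6); `cols`, `flat` and `msg` are illustrative names, and `lapply()` stands in for `purrr::map()`.

```r
# Three rows from the question's data: businesses 1, 4 and 6
cols <- list(
  business_id = c("5-1qDFGHvYjBjBYe0B5oiQ", "XOvQUSHUjE0KkUuwDUR5OA",
                  "jozuj1ySOk7DPs7OJloj3A"),
  categories  = list(c("Dry Cleaning & Laundry", "Local Services",
                       "Sewing & Alterations"),
                     "Restaurants",
                     NULL),
  dummy       = rep("True", 3)
)

# Flatten each column on its own, as map(unlist) does
flat <- lapply(cols, unlist)
print(lengths(flat))  # business_id 3, categories 4, dummy 3

# Vectors of unequal length cannot be recombined into one data frame
msg <- tryCatch(as.data.frame(flat), error = function(e) conditionMessage(e))
print(msg)
```

Flattening the list column therefore has to happen row-wise (one output row per business/category pair), which is what `unnest()` does.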
Tags: r, dplyr, tidyverse, tidyr, purrr
1 Answer (2 votes)

We can unnest the list column first and then spread it.

Edit: first change the `NULL` entries to "False":

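library(dplyr)

# Replace NULL category entries with "False" so unnest() does not drop those businesses
df$categories[sapply(df$categories, is.null)] <- "False"

df %>%
  select(business_id, categories) %>%
  head(5) %>%
  tidyr::unnest(categories) %>%          # one row per business/category pair
  mutate(dummy = "True") %>%             # marker value to spread
  mutate_if(is.factor, as.character) %>%
  tidyr::spread(categories, dummy, fill = "False")  # missing combinations become "False"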

Alternatively:

library(dplyr)
df %>%
  select(business_id, categories) %>%
  head(5) %>%
  mutate(dummy = "True", New = purrr::map(categories, unlist)) %>%
  as.data.frame() %>%
  mutate_if(is.factor, as.character) %>%
  tidyr::spread(categories, dummy, fill = "False")
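In tidyr 1.0 and later, `spread()` is superseded by `pivot_wider()`. Below is a sketch of the same unnest-then-widen idea with the newer verbs, assuming tidyr 1.1.0 or later (where `values_fill` accepts a single value) and dplyr 1.0 or later (for `across()`). It uses three rows from the question's data; `yelp_small` and `wide` are illustrative names. Instead of turning `NULL` into a "False" category, it lets `unnest()` drop the empty business and adds it back with `right_join()`, so no spurious "False" column is created.

```r
library(dplyr)
library(tidyr)

# Three businesses from the question's data (rows 3, 4 and 6); row 6 has NULL categories
yelp_small <- tibble::tibble(
  business_id = c("lNwReGEso2mMhzCr0TM-mw", "XOvQUSHUjE0KkUuwDUR5OA",
                  "jozuj1ySOk7DPs7OJloj3A"),
  categories  = list(c("Food", "Grocery", "Specialty Food"),
                     "Restaurants",
                     NULL)
)

wide <- yelp_small %>%
  unnest(categories) %>%                  # one row per business/category; NULL rows are dropped
  mutate(dummy = "True") %>%
  pivot_wider(names_from = categories, values_from = dummy,
              values_fill = "False") %>%  # absent category -> "False"
  # bring back businesses with no categories, then mark them "False" in every column
  right_join(select(yelp_small, business_id), by = "business_id") %>%
  mutate(across(-business_id, ~ coalesce(.x, "False")))

print(wide)
```

If the empty businesses are not needed, the last two steps can be dropped.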
© www.soinside.com 2019 - 2024. All rights reserved.