Get the first location a person lived in

Question (votes: 0, answers: 4)

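Based on the year column, I want to create a column that marks the first place each person lived. My data are in the following format: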

year<- c(2008, 2009, 2010, 2009, 2010, 2011)
person<- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location<- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df<- data.frame(year, person, location)

I want to create a column called first place with values 0 and 1: 1 if the row is the person's first city, 0 otherwise.

Any suggestions?

r dplyr data.table tidyr
4 answers

0 votes

A dplyr solution:

library(dplyr)
df %>% 
    group_by(person) %>% 
    mutate(FirstPlace = +(location[which.min(year)] == location))

# A tibble: 6 x 4
# Groups:   person [3]
#   year person location FirstPlace
#  <dbl> <fctr>   <fctr>      <int>
#1  2008   John   London          1
#2  2009   John    Paris          0
#3  2010   John  Newyork          0
#4  2009  Brian    Paris          1
#5  2010  Brian    Paris          1
#6  2011 Vickey    Miami          1
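  • location[which.min(year)] finds each person's first location (the location in their earliest year); that value is compared with the location column, and the unary + converts the logical result to an integer. Because every row whose location matches the first one is flagged, Brian's 2010 row also gets 1.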
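For comparison, the same "compare with the location of the earliest year" logic can be sketched in base R without dplyr. This is an illustrative alternative, not part of the original answer; it assumes the data frame is built with stringsAsFactors = FALSE, and first_idx is a hypothetical helper name.

```r
# Data from the question; stringsAsFactors = FALSE is an assumption so that
# location is a plain character vector
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location, stringsAsFactors = FALSE)

# first_idx (hypothetical name): for every row, the row number of that
# person's earliest year, computed group-wise with ave()
first_idx <- ave(seq_len(nrow(df)), df$person,
                 FUN = function(i) i[which.min(df$year[i])])

# 1 wherever a row's location equals the person's first location, else 0
df$FirstPlace <- as.integer(df$location == df$location[first_idx])
df
```

As in the dplyr output, Brian's 2010 row gets 1 because Paris is also his first location.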

If you only want to flag the first year:

df %>% 
    group_by(person) %>% 
    mutate(FirstPlace = +(min(year) == year))

# A tibble: 6 x 4
# Groups:   person [3]
#   year person location FirstPlace
#  <dbl> <fctr>   <fctr>      <int>
#1  2008   John   London          1
#2  2009   John    Paris          0
#3  2010   John  Newyork          0
#4  2009  Brian    Paris          1
#5  2010  Brian    Paris          0
#6  2011 Vickey    Miami          1
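A minimal base R sketch of the first-year-only flag, offered as an alternative rather than the answer's own method and assuming stringsAsFactors = FALSE; min_year is a hypothetical name. ave() broadcasts each person's minimum year back onto that person's rows, which plays the role of min(year) inside group_by().

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location, stringsAsFactors = FALSE)

# min_year (hypothetical name): each person's earliest year, repeated on every row
min_year <- ave(df$year, df$person, FUN = min)

# 1 only on the row(s) holding that person's earliest year
df$FirstPlace <- as.integer(df$year == min_year)
df
```

As with min(year) == year, a person with two rows in the same earliest year would get 1 on both.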

3 votes

With data.table you can do:

library("data.table")
year<- c(2008, 2009, 2010, 2009, 2010, 2011)
person<- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location<- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df<- data.frame(year, person, location)
setDT(df)[, firstPlace:=as.integer(min(year)==year), person]
# > setDT(df)[, firstPlace:=as.integer(min(year)==year), person]
# > df
#    year person location firstPlace
# 1: 2008   John   London          1
# 2: 2009   John    Paris          0
# 3: 2010   John  Newyork          0
# 4: 2009  Brian    Paris          1
# 5: 2010  Brian    Paris          0
# 6: 2011 Vickey    Miami          1

Or, as @Frank mentioned, if your data are already sorted by person and year:

setDT(df)[, firstPlace:=+!duplicated(person)]

Or (a variant of this):

setDT(df)[, firstPlace:=+(rowidv(person)==1)]
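Both one-liners depend on row order: !duplicated(person) and rowidv(person) == 1 simply flag the first row seen for each person. A minimal base R sketch, assuming stringsAsFactors = FALSE and using a deliberately shuffled copy (the names shuffled, naive and sorted are hypothetical), shows why sorting by person and year has to come first:

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location, stringsAsFactors = FALSE)

# Hypothetical shuffled copy: rows are no longer in year order within person
shuffled <- df[c(3, 5, 1, 6, 4, 2), ]

# Without sorting, the first row seen for John is his 2010 row (Newyork)
naive <- as.integer(!duplicated(shuffled$person))

# Sort by person, then year, so each person's earliest year comes first
sorted <- shuffled[order(shuffled$person, shuffled$year), ]
sorted$firstPlace <- as.integer(!duplicated(sorted$person))
sorted
```

After sorting, the flagged rows are Brian/Paris (2009), John/London (2008) and Vickey/Miami (2011).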

2 votes

library(dplyr)
first_city <- df %>%
  group_by(person) %>%
  arrange(year) %>%
  slice(1)
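This pipeline returns one row per person rather than adding a column. A base R sketch of the same "earliest row per person" summary, assuming stringsAsFactors = FALSE; it is an illustrative alternative, and per_person is a hypothetical name:

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location, stringsAsFactors = FALSE)

# One data frame per person (groups come out in sorted order: Brian, John, Vickey)
per_person <- split(df, df$person)

# Keep the row with the earliest year from each group and stack them
first_city <- do.call(rbind, lapply(per_person, function(d) d[which.min(d$year), ]))
first_city
```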

0 votes
library(dplyr)
# top_n(1, year) would keep each person's latest year; a negative n keeps the smallest year
first_city <- df %>%
  group_by(person) %>%
  top_n(-1, year)

Or as an extra column in the data:

df %>% 
  group_by(person) %>% 
  arrange(year) %>% 
  mutate(first_city = head(location, 1))
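The arrange(year) step also reorders the whole data frame by year, as the output further down shows. If the original row order should be kept, one option is a small lookup table of each person's first city attached with match(). This is a base R sketch under the assumption stringsAsFactors = FALSE, not the answer's own code; ord and firsts are hypothetical names.

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location, stringsAsFactors = FALSE)

# Sort by person and year, then keep each person's first row as a lookup table
ord    <- df[order(df$person, df$year), ]
firsts <- ord[!duplicated(ord$person), c("person", "location")]

# Attach each person's first city without changing the row order of df
df$first_city <- firsts$location[match(df$person, firsts$person)]
df
```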

To flag the first city with 1 and everything else with 0 (first year only):

df %>% 
  group_by(person) %>% 
  arrange(year) %>% 
  mutate(first_city = as.integer(head(location, 1) == location & year == min(year))) 

# A tibble: 6 x 4
# Groups:   person [3]
#    year person location first_city
#   <dbl> <fct>  <fct>         <int>
# 1  2008 John   London            1
# 2  2009 John   Paris             0
# 3  2009 Brian  Paris             1
# 4  2010 John   Newyork           0
# 5  2010 Brian  Paris             0
# 6  2011 Vickey Miami             1
© www.soinside.com 2019 - 2024. All rights reserved.