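I want to create a column marking the first place each person lived, based on the year column. My data is in the following format: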
year<- c(2008, 2009, 2010, 2009, 2010, 2011)
person<- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location<- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df<- data.frame(year, person, location)
I want to create a column named first place with values 0/1: 1 if the row is the person's first city, 0 otherwise.
Any suggestions?
A solution with dplyr:
library(dplyr)
df %>%
  group_by(person) %>%
  mutate(FirstPlace = +(location[which.min(year)] == location))
# A tibble: 6 x 4
# Groups: person [3]
# year person location FirstPlace
# <dbl> <fctr> <fctr> <int>
#1 2008 John London 1
#2 2009 John Paris 0
#3 2010 John Newyork 0
#4 2009 Brian Paris 1
#5 2010 Brian Paris 1
#6 2011 Vickey Miami 1
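location[which.min(year)] finds each person's first location; it is then compared with the location column, and the unary + converts the logical result to integer. Because Brian stayed in Paris, both of his rows get 1. If you only want to flag the first year: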
df %>%
  group_by(person) %>%
  mutate(FirstPlace = +(min(year) == year))
# A tibble: 6 x 4
# Groups: person [3]
# year person location FirstPlace
# <dbl> <fctr> <fctr> <int>
#1 2008 John London 1
#2 2009 John Paris 0
#3 2010 John Newyork 0
#4 2009 Brian Paris 1
#5 2010 Brian Paris 0
#6 2011 Vickey Miami 1
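To make the location[which.min(year)] == location step of the first solution concrete, here is a small base R trace of John's group on its own, outside dplyr. The intermediate names i, first, same and flag are illustrative only; inside mutate() the same expression is evaluated once per person group.

```r
# John's rows from the example data, taken on their own
year     <- c(2008, 2009, 2010)
location <- c('London', 'Paris', 'Newyork')

i     <- which.min(year)     # position of the earliest year: 1
first <- location[i]         # city lived in that year: "London"
same  <- first == location   # TRUE FALSE FALSE
flag  <- +same               # unary + turns the logical vector into integer 1 0 0
print(flag)
```

Brian's group works the same way, except both of his rows are in Paris, so both compare equal to his first city; that is why the location-based version gives him 1 twice while the year-based version does not.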
With data.table you can do:
library("data.table")
year<- c(2008, 2009, 2010, 2009, 2010, 2011)
person<- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location<- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df<- data.frame(year, person, location)
setDT(df)[, firstPlace:=as.integer(min(year)==year), person]
# > setDT(df)[, firstPlace:=as.integer(min(year)==year), person]
# > df
# year person location firstPlace
# 1: 2008 John London 1
# 2: 2009 John Paris 0
# 3: 2010 John Newyork 0
# 4: 2009 Brian Paris 1
# 5: 2010 Brian Paris 0
# 6: 2011 Vickey Miami 1
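If data.table is not available, a base R sketch of the same per-person flag can use ave(), which applies a function within each group and returns a vector aligned with the original rows. This is a swapped-in technique, not part of the answer above; the column name firstPlace only mirrors it.

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location)

# FUN must be passed by name: ave() treats unnamed extra arguments as grouping variables.
# Within each person, mark the row(s) whose year equals that person's minimum year.
df$firstPlace <- as.integer(ave(df$year, df$person,
                                FUN = function(y) y == min(y)))
df
```

As with min(year) == year, a tie on a person's earliest year would flag every tied row.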
Or (as @Frank mentioned), if your data is sorted by person and year:
setDT(df)[, firstPlace:=+!duplicated(person)]
Or, as a variant:
setDT(df)[, firstPlace:=+(rowidv(person)==1)]
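Both duplicated() and rowidv() mark the first occurrence of each person in the current row order, so they only answer the question once rows are ordered by year within person. The base R sketch below uses a hypothetical shuffled copy of the data (not from the question) to show the difference; with data.table, setorder(df, person, year) would do the sorting step in place.

```r
# Same data as the question, but with John's rows in reverse year order
year     <- c(2010, 2009, 2008, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('Newyork', 'Paris', 'London', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location)

# Unsorted: John's first row is 2010/Newyork, so that row is wrongly flagged
unsorted_flag <- +!duplicated(df$person)   # 1 0 0 1 0 1

# Sort by person, then year, so each person's first row is their earliest year
df <- df[order(df$person, df$year), ]
df$firstPlace <- +!duplicated(df$person)
df
```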
To get one row per person with their first city:
first_city <- df %>%
  group_by(person) %>%
  arrange(year) %>%
  slice(1)
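Or, keeping the row with the smallest year per person:
library(dplyr)
first_city <- df %>%
  group_by(person) %>%
  top_n(-1, year)   # negative n keeps the lowest year; top_n(1, year) would keep the latest city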
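A base R sketch that also returns one row per person computes each person's earliest year with aggregate() and joins it back with merge() to recover the location. The names first_year and first_city are illustrative.

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location)

# Earliest year per person: one row each for Brian, John and Vickey
first_year <- aggregate(year ~ person, data = df, FUN = min)

# Inner join on person and year keeps only the rows from that earliest year
first_city <- merge(first_year, df, by = c("person", "year"))
first_city
```

Like top_n(), this keeps every row when a person has several entries in their earliest year.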
Or as an extra column in the data:
df %>%
  group_by(person) %>%
  arrange(year) %>%
  mutate(first_city = head(location, 1))
To flag the first city with 1 and 0 otherwise (first year only):
df %>%
  group_by(person) %>%
  arrange(year) %>%
  mutate(first_city = as.integer(head(location, 1) == location &
                                 year == min(year)))
# A tibble: 6 x 4
# Groups: person [3]
# year person location first_city
# <dbl> <fct> <fct> <int>
# 1 2008 John London 1
# 2 2009 John Paris 0
# 3 2009 Brian Paris 1
# 4 2010 John Newyork 0
# 5 2010 Brian Paris 0
# 6 2011 Vickey Miami 1
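Note that arrange(year) also reorders the rows in the output (Brian's 2009 row moves up). A base R sketch that keeps the original row order uses ave() on row indices to find each person's earliest row, then builds both a first-city location column and the 0/1 flag from it. Using ave() here, and the names first_row and first_flag, are assumptions of this sketch rather than part of the answers above.

```r
year     <- c(2008, 2009, 2010, 2009, 2010, 2011)
person   <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location, stringsAsFactors = FALSE)

# For every row, the index of that person's earliest-year row
# (which.min picks the first such row if the minimum year is tied)
first_row <- ave(seq_len(nrow(df)), df$person,
                 FUN = function(i) i[which.min(df$year[i])])

# City lived in during the first year, repeated on each of the person's rows
df$first_city <- df$location[first_row]

# 1 only on the row that is both the first city and the first year
df$first_flag <- as.integer(df$location == df$first_city &
                            df$year == df$year[first_row])
df
```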