I have a data frame that looks like this:
                            CEMETERY    SEX CONTEXT RaHD.L RaHD.R
 1          Medieval-St. Mary Graces FEMALE    7172   21.2   21.6
 2          Medieval-St. Mary Graces   MALE    6225   23.9   25.2
 3          Medieval-St. Mary Graces   MALE    9987   23.9   23.5
 4          Medieval-St. Mary Graces   MALE   11475   22.4   22.3
 5          Medieval-St. Mary Graces   MALE   12356   25.8   25.4
 6          Medieval-St. Mary Graces   MALE   12525   22.4   22.3
 7          Medieval-St. Mary Graces   MALE   12785   22.9   22.6
 8          Medieval-St. Mary Graces   MALE   13840   22.5   22.9
 9            Medieval-Spital Square FEMALE     383   21.5   22.0
10            Medieval-Spital Square   MALE      31   23.3   22.0
17  Post-Medieval-Chelsea Old Church FEMALE      19   20.0   20.6
18  Post-Medieval-Chelsea Old Church FEMALE      31   19.5   20.0
19  Post-Medieval-Chelsea Old Church FEMALE      39   19.6   19.2
41 Post-Medieval-St. Thomas Hospital FEMALE      60   21.8   22.6
43 Post-Medieval-St. Thomas Hospital   MALE      83   22.4   23.0
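I would like to change the values in the CEMETERY column to just "Medieval" or "Post-Medieval" instead of the full cemetery names, or alternatively create a new column that says "Medieval" or "Post-Medieval".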
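We can use sub to capture the substring up to and including "Medieval", and then use the backreference (\\1) in the replacement so that only the captured part is kept: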
# keep everything up to and including "Medieval" (or "medieval") and drop the rest
df1$CEMETERY <- sub("(.*(M|m)edieval).*", "\\1", df1$CEMETERY)
df1$CEMETERY
#[1] "Medieval" "Medieval" "Medieval" "Medieval"
#[5] "Medieval" "Medieval" "Medieval" "Medieval"
#[9] "Medieval" "Medieval" "Post-Medieval" "Post-Medieval"
#[13] "Post-Medieval" "Post-Medieval" "Post-Medieval"
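The question also mentions keeping the full name and adding a separate column instead. A minimal base R sketch of that variant follows, assuming a four-row subset of the question's data. The new column name PERIOD is a hypothetical choice, and the anchored pattern ^((Post-)?Medieval)- is swapped in for the answer's (.*(M|m)edieval).*; it assumes every value starts with "Medieval-" or "Post-Medieval-".

```r
# Hypothetical four-row subset of the question's data (CEMETERY column only)
df1 <- data.frame(
  CEMETERY = c("Medieval-St. Mary Graces", "Medieval-Spital Square",
               "Post-Medieval-Chelsea Old Church",
               "Post-Medieval-St. Thomas Hospital"),
  stringsAsFactors = FALSE
)

# Capture "Medieval" with an optional "Post-" prefix, drop the hyphen and the site name,
# and store the result in a new column so CEMETERY itself is left untouched
df1$PERIOD <- sub("^((Post-)?Medieval)-.*$", "\\1", df1$CEMETERY)

print(df1)
```

Because the pattern is anchored with ^, sub returns any value that starts with neither prefix unchanged, which makes unexpected cemetery names easy to spot in the new column.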
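If the information about the site should be kept, here is an alternative approach that splits the CEMETERY column at the first hyphen after "Medieval" (this also covers "Post-Medieval") and assigns the two parts to the two columns PERIOD and CEMETERY: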
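library(data.table)
# split at the hyphen directly after "Medieval": the first part becomes PERIOD,
# the second part overwrites CEMETERY; the trailing [] prints the result
setDT(DF)[, c("PERIOD", "CEMETERY") := tstrsplit(CEMETERY, "(?<=Medieval)-", perl = TRUE)][]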
               CEMETERY    SEX CONTEXT RaHD.L RaHD.R        PERIOD
 1:     St. Mary Graces FEMALE    7172   21.2   21.6      Medieval
 2:     St. Mary Graces   MALE    6225   23.9   25.2      Medieval
 3:     St. Mary Graces   MALE    9987   23.9   23.5      Medieval
 4:     St. Mary Graces   MALE   11475   22.4   22.3      Medieval
 5:     St. Mary Graces   MALE   12356   25.8   25.4      Medieval
 6:     St. Mary Graces   MALE   12525   22.4   22.3      Medieval
 7:     St. Mary Graces   MALE   12785   22.9   22.6      Medieval
 8:     St. Mary Graces   MALE   13840   22.5   22.9      Medieval
 9:       Spital Square FEMALE     383   21.5   22.0      Medieval
10:       Spital Square   MALE      31   23.3   22.0      Medieval
11:  Chelsea Old Church FEMALE      19   20.0   20.6 Post-Medieval
12:  Chelsea Old Church FEMALE      31   19.5   20.0 Post-Medieval
13:  Chelsea Old Church FEMALE      39   19.6   19.2 Post-Medieval
14: St. Thomas Hospital FEMALE      60   21.8   22.6 Post-Medieval
15: St. Thomas Hospital   MALE      83   22.4   23.0 Post-Medieval
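The regex feature that identifies the correct hyphen to split at is called a positive lookbehind: (?<=Medieval)- matches a hyphen only if it is immediately preceded by "Medieval", so the hyphen in "Post-" is not used as a split point.

Data: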
# Reproducible data: CEMETERY values are quoted because they contain spaces;
# the header has one name fewer than each data row, so the first column becomes the row names
DF <- read.table(text = '
CEMETERY SEX CONTEXT RaHD.L RaHD.R
1 "Medieval-St. Mary Graces" FEMALE 7172 21.2 21.6
2 "Medieval-St. Mary Graces" MALE 6225 23.9 25.2
3 "Medieval-St. Mary Graces" MALE 9987 23.9 23.5
4 "Medieval-St. Mary Graces" MALE 11475 22.4 22.3
5 "Medieval-St. Mary Graces" MALE 12356 25.8 25.4
6 "Medieval-St. Mary Graces" MALE 12525 22.4 22.3
7 "Medieval-St. Mary Graces" MALE 12785 22.9 22.6
8 "Medieval-St. Mary Graces" MALE 13840 22.5 22.9
9 "Medieval-Spital Square" FEMALE 383 21.5 22.0
10 "Medieval-Spital Square" MALE 31 23.3 22.0
17 "Post-Medieval-Chelsea Old Church" FEMALE 19 20.0 20.6
18 "Post-Medieval-Chelsea Old Church" FEMALE 31 19.5 20.0
19 "Post-Medieval-Chelsea Old Church" FEMALE 39 19.6 19.2
41 "Post-Medieval-St. Thomas Hospital" FEMALE 60 21.8 22.6
43 "Post-Medieval-St. Thomas Hospital" MALE 83 22.4 23.0
', header = TRUE, stringsAsFactors = FALSE)
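To see the positive lookbehind on its own, here is a minimal base R sketch that applies the same pattern with strsplit (tstrsplit is documented as equivalent to transpose(strsplit(...))) to two values from the question's data, contrasted with a plain split on every hyphen. The variable names x, parts and naive are illustrative only.

```r
# Two CEMETERY values taken from the question's data
x <- c("Medieval-St. Mary Graces", "Post-Medieval-Chelsea Old Church")

# (?<=Medieval)- matches a hyphen only if "Medieval" directly precedes it,
# so the hyphen after "Post" is not a split point; perl = TRUE enables lookbehind
parts <- strsplit(x, "(?<=Medieval)-", perl = TRUE)
print(parts)

# For contrast: splitting on every hyphen also breaks "Post-Medieval" apart
naive <- strsplit(x[2], "-", fixed = TRUE)[[1]]
print(naive)  # elements: "Post", "Medieval", "Chelsea Old Church"
```

Each element of parts has exactly two pieces, which is why tstrsplit can assign them directly to the two columns PERIOD and CEMETERY.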