library(tidyverse)
df <- structure(
  list(col1 = c(
    "6980265",
    "x (6969100)",
    "1,234.56",
    "euro1,000",
    "x. (6969100) ",
    "x (6969943)",
    "x y. (6977416)",
    "x-y (6923012) "
  )),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -8L)
)
# Directly using parse_number() on col1
df2 <- df |>
  mutate(col2 = parse_number(col1, trim_ws = TRUE))
df2
The output is:
# A tibble: 8 × 2
  col1                col2
  <chr>              <dbl>
1 "6980265"        6980265
2 "x (6969100)"    6969100
3 "1,234.56"          1235.
4 "euro1,000"         1000
5 "x. (6969100) "       NA
6 "x (6969943)"    6969943
7 "x y. (6977416)"      NA
8 "x-y (6923012) "      NA
Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `col2 = parse_number(col1, trim_ws = TRUE)`.
Caused by warning:
! 3 parsing failures.
row col expected actual
  5  -- a number x. (6969100)
  7  -- a number x y. (6977416)
  8  -- a number x-y (6923012)
After removing all non-alphanumeric characters (except spaces), the code works:
df |>
  mutate(col1 = str_replace_all(col1, "[^[:alnum:] ]", "")) |>
  mutate(col2 = parse_number(col1, trim_ws = TRUE))
# A tibble: 8 × 2
  col1             col2
  <chr>           <dbl>
1 "6980265"     6980265
2 "x 6969100"   6969100
3 "123456"       123456
4 "euro1000"       1000
5 "x 6969100 "  6969100
6 "x 6969943"   6969943
7 "x y 6977416" 6977416
8 "xy 6923012 " 6923012
Question: why does parse_number() fail on rows 5, 7, and 8 of the original strings?
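The characters "." and "-" can be part of a well-formed number, as can the ten digits, "+", "e", and maybe others. Presumably parse_number() looks for any of these characters, and when it finds one, it tries to parse that character and the ones following it as a number. In your NA cases, the "." and "-" are not part of the number; they sit by themselves earlier in the string, so the parse starts in the wrong place and fails.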
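If that explanation is right, one way to avoid the problem is to hand parse_number() only the text that starts at the first digit, so a stray "." or "-" earlier in the string never reaches it. The following is a minimal sketch, not the only possible fix. It assumes each string contains exactly one number, that the number begins at the first digit, and that it is not negative. The column name num_text and the regular expression are illustrative choices, not something taken from the question.

```r
library(dplyr)
library(stringr)
library(readr)

# Same strings as col1 in the question
df <- tibble(col1 = c(
  "6980265", "x (6969100)", "1,234.56", "euro1,000",
  "x. (6969100) ", "x (6969943)", "x y. (6977416)", "x-y (6923012) "
))

# Assumption: one non-negative number per string, starting at the first digit.
# str_extract() keeps the first digit plus any digits, commas, or dots that
# follow it; everything before the first digit is discarded.
out <- df |>
  mutate(
    num_text = str_extract(col1, "\\d[\\d,.]*"),  # hypothetical helper column
    col2     = parse_number(num_text)
  )

out
```

Compared with stripping every non-alphanumeric character, this keeps the decimal point: in the output above, the stripping approach turns "1,234.56" into 123456, whereas here the text passed to parse_number() is still "1,234.56". The trade-off is that a leading minus sign would be dropped, so the pattern would need adjusting for data with negative values.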