I have a column that usually holds a number but sometimes contains text wrapped around the number.
I managed to remove the text, but since I can't be sure that other kinds of text won't turn up, I thought it would be easier to just extract the numbers.
Strangely, the extraction works only as long as my pattern does not allow multiple digits after the decimal point. Am I missing something?
Thanks for your help!
library(stringr)
Area <- c("saturated( 790887469.345 )",
          "saturated( 790887469.3 )",
          "saturated( 790887469 )",
          "790887469.345",
          "790887469.3",
          "790887469")
str_extract(Area, "\\d*")
#[1] "" "" "" "790887469" "790887469" "790887469"
# misses the results with the additional text and brackets
# misses the digits after the . (as expected)
str_extract(Area, "\\d*\\.*\\d")
#[1] "790887469.3" "790887469.3" "790887469" "790887469.3" "790887469.3" "790887469"
# correctly extracts everything up to the first digit after the "."
# okay, so far so good, just allow more digits!
str_extract(Area, "\\d*\\.*\\d*")
# [1] "" "" "" "790887469.345" "790887469.3" "790887469"
# What?
# extracts all digits for the plain numbers, but returns "" for the values in brackets.
# with grouping?
str_extract(Area, "\\d*(\\.\\d*)*")
# [1] "" "" "" "790887469.345" "790887469.3" "790887469"
# nope!
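
A likely explanation for the empty strings: every part of the last two patterns is optional (`*` means "zero or more"), so the pattern can match the empty string. The regex engine returns the leftmost match, and for the bracketed entries that is an empty match at the very start of the string, before the "s" of "saturated". The second attempt worked only because its trailing `\\d` forced at least one digit. Below is a minimal sketch of one possible fix, assuming each entry contains at most one number and that the number always starts with a digit; the names `Area_num_chr` and `Area_num` are made up for illustration.

```r
library(stringr)

Area <- c("saturated( 790887469.345 )",
          "saturated( 790887469.3 )",
          "saturated( 790887469 )",
          "790887469.345",
          "790887469.3",
          "790887469")

# "\\d+" requires at least one digit, so the match can never be empty.
# "(\\.\\d+)?" optionally adds a decimal point followed by one or more digits.
Area_num_chr <- str_extract(Area, "\\d+(\\.\\d+)?")
Area_num_chr
# [1] "790887469.345" "790887469.3"   "790887469"     "790887469.345"
# [5] "790887469.3"   "790887469"

# Convert to numeric once the text has been stripped (assumption: this is the end goal).
Area_num <- as.numeric(Area_num_chr)
```

This sketch would not pick up a leading minus sign or scientific notation; the pattern would need extending if such values can occur in the column.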