I have a variable formatted like "V0497-V0508", and I would like to split it into two columns, one containing 497 and the other 508. I tried using separate() to no avail. Can anyone help with the regular expression? I can already do this with str_sub() because the values are fixed width, but I would like to learn more about regular expressions and how to account for consecutive delimiters, which I think may be the problem here.
library(tidyverse)
#> -- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 3.0.0 v purrr 0.2.5
#> v tibble 1.4.2 v dplyr 0.7.5
#> v tidyr 0.8.1 v stringr 1.3.1
#> v readr 1.1.1 v forcats 0.3.0
#> -- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
testdat <- tibble(VrangeOrig=c("V0497-V0508", "V0868-V0875", "V1010-V1024"))
testdat %>%
  separate(VrangeOrig, into = c("Voriglo", "Vorighi"), sep = "[V\\-]+", remove = FALSE, convert = TRUE)
#> Warning: Expected 2 pieces. Additional pieces discarded in 3 rows [1, 2,
#> 3].
#> # A tibble: 3 x 3
#> VrangeOrig Voriglo Vorighi
#> <chr> <lgl> <int>
#> 1 V0497-V0508 NA 497
#> 2 V0868-V0875 NA 868
#> 3 V1010-V1024 NA 1010
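To see where the extra piece might come from, here is a minimal sketch that splits a single value with base R's strsplit(). This is a stand-in, on the assumption that it splits the same way separate() does here, which is consistent with the warning and the NA column above.

```r
# Split one value on runs of "V" or "-", mirroring the sep pattern above.
x <- "V0497-V0508"
pieces <- strsplit(x, "[V-]+")[[1]]

# The leading "V" is itself a delimiter match, so the empty text in front
# of it counts as the first piece.
pieces
length(pieces)  # three pieces, one more than `into` has names for
```

If that assumption holds, the "-V" in the middle is not the problem, since the + already treats it as one delimiter. The leading "V" is: it yields an empty first piece, which convert turns into NA, and the third piece (508) is the one being discarded.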
testdat %>%
  mutate(
    Voriglo = str_sub(VrangeOrig, 2, 5) %>% as.numeric(),
    Vorighi = str_sub(VrangeOrig, 8, 11) %>% as.numeric()
  )
#> # A tibble: 3 x 3
#> VrangeOrig Voriglo Vorighi
#> <chr> <dbl> <dbl>
#> 1 V0497-V0508 497 508
#> 2 V0868-V0875 868 875
#> 3 V1010-V1024 1010 1024
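One possible regex-based alternative to the fixed-width version is tidyr's extract(), which captures the parts to keep instead of describing the delimiters to split on. This is only a sketch under the assumption that every value follows the "V<digits>-V<digits>" pattern; the variable name result is introduced here for illustration.

```r
library(tidyr)
library(tibble)

testdat <- tibble(VrangeOrig = c("V0497-V0508", "V0868-V0875", "V1010-V1024"))

# Each parenthesised group becomes one new column; the literal "V" and "-"
# sit outside the groups, so they are dropped.
result <- extract(
  testdat, VrangeOrig,
  into = c("Voriglo", "Vorighi"),
  regex = "V([0-9]+)-V([0-9]+)",
  remove = FALSE,  # keep the original column
  convert = TRUE   # turn "0497" into the integer 497
)
result
```

Rows that do not match the pattern should come back as NA in both new columns, so this version does not rely on the values being fixed width.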