Best way to extract distribution and parameters from character variable?

ignacio · July 20, 2019, 11:26am

Suppose I have a character variable with the value "normal(0,1)". I want to create three variables from it:

distribution <- "normal"
mu <- 0
sd <- 1

I can do this with stringr, but I'm wondering if there is a better way of writing this code:

library(stringr)

distribution_chr <- "normal(0,1)"

distribution <- str_split(string = distribution_chr, pattern = "\\(", simplify = TRUE)[1,1]

parameters <- distribution_chr %>% 
  str_extract_all("(?<=\\().+?(?=\\))") %>% 
  str_split(pattern = ",", simplify = TRUE)

mu <- as.numeric(parameters[1,1])
sd <- as.numeric(parameters[1,2])

Created on 2019-07-20 by the reprex package (v0.2.1)

Thanks!

pieterjanvc · July 20, 2019, 1:11pm

Hi,

I think that is looking rather good. I found one other way of doing it, but still using stringr, just a different function and extracting groups of interest:

library("stringr")

myString = "normal(0,1)"
parameters = str_match(myString, "(\\w+)\\((\\d*\\.?\\d*),\\s?(\\d*\\.?\\d*)\\)")

distribution = parameters[2]
mu = as.numeric(parameters[3])
sd = as.numeric(parameters[4])

This regex works for both integers and decimal numbers, and it allows one optional whitespace character after the comma. Note that it does not match negative numbers, because the pattern has no place for a sign.
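As a minimal sketch, the same capture-group idea can be extended to accept signed numbers and extra whitespace around the parameters. The helper name parse_dist() and the assumed number format (optional sign, optional decimal part) are illustrative choices, not something from the original question.

```r
library(stringr)

# Hypothetical helper: split a string such as "normal(0,1)" into its parts.
parse_dist <- function(x) {
  # Assumed number format: optional sign, digits, optional decimal point.
  num <- "[-+]?\\d*\\.?\\d+"
  pattern <- paste0("^(\\w+)\\(\\s*(", num, ")\\s*,\\s*(", num, ")\\s*\\)$")

  # str_match() returns a matrix: column 1 is the full match,
  # columns 2 to 4 are the three capture groups.
  m <- str_match(x, pattern)

  list(
    distribution = m[1, 2],
    mu = as.numeric(m[1, 3]),
    sd = as.numeric(m[1, 4])
  )
}

res <- parse_dist("normal(-0.5, 2)")
```

If the string does not fit the pattern, str_match() returns NA in every column, so all three elements come back as NA rather than raising an error.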

Grtz,
PJ

andresrcs · July 20, 2019, 1:46pm

This is not necessarily better, but I think that str_extract() feels more natural and human-readable in this situation.

library(stringr)

myString = "normal(0,1)"

distribution = str_extract(myString, "(\\w+)(?=\\()")
mu = str_extract(myString, "(?<=\\()\\d+(?=,)")
sd = str_extract(myString, "(?<=,\\s?)\\d+(?=\\))")

distribution
#> [1] "normal"
mu
#> [1] "0"
sd
#> [1] "1"

Created on 2019-07-20 by the reprex package (v0.3.0.9000)
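The str_extract() calls above return character strings and only match whole digits. A rough sketch of the same lookaround approach that also accepts decimals and a leading minus sign, and converts the parameters to numeric, might look like this. The num pattern and the example input are assumptions chosen for illustration.

```r
library(stringr)

myString <- "normal(0.5, 1.25)"

# Assumed number format: optional minus sign, digits, optional decimal part.
num <- "-?\\d+(?:\\.\\d+)?"

# Name: word characters immediately before the opening parenthesis.
distribution <- str_extract(myString, "\\w+(?=\\()")

# First parameter: a number right after "(" and before the comma.
mu <- as.numeric(str_extract(myString, paste0("(?<=\\()", num, "(?=\\s*,)")))

# Second parameter: a number after the comma (and at most one space), before ")".
sd <- as.numeric(str_extract(myString, paste0("(?<=,\\s?)", num, "(?=\\))")))
```

Each value is extracted independently here, so a malformed string can yield NA for one parameter while the others still parse.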

pieterjanvc · July 20, 2019, 3:34pm

I agree, I did not say my method was better, just different.

PJ
