Best way to extract distribution and parameters from character variable?

Suppose I have a character variable with the value "normal(0,1)". I want to create three variables from it:

  • distribution <- "normal"
  • mu <- 0
  • sd <- 1

I can do this with stringr, but I'm wondering if there is a better way of writing this code:

library(stringr)

distribution_chr <- "normal(0,1)"

distribution <- str_split(string = distribution_chr, pattern = "\\(", simplify = TRUE)[1,1]

parameters <- distribution_chr %>% 
  str_extract_all("(?<=\\().+?(?=\\))") %>% 
  str_split(pattern = ",", simplify = TRUE)

mu <- as.numeric(parameters[1,1])
sd <- as.numeric(parameters[1,2])

Created on 2019-07-20 by the reprex package (v0.2.1)

Thanks!

Hi,

I think that is looking rather good. Here is one other way of doing it, still using stringr, but with str_match() to extract the capture groups of interest:

library("stringr")

myString = "normal(0,1)"
parameters = str_match(myString, "(\\w+)\\((\\d*\\.?\\d*),\\s?(\\d*\\.?\\d*)\\)")

distribution = parameters[2]
mu = as.numeric(parameters[3])
sd = as.numeric(parameters[4])

This regex works for both integers and decimal numbers, and it tolerates a single optional whitespace character after the comma. It does not match negative numbers.
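To check that behaviour, here is a small sketch that runs the same pattern over a few inputs. The test strings and the names `inputs`, `pattern_signed`, and `sigma` are made up for illustration, and the signed variant is only one possible extension, assuming a leading minus sign is the only extra case you need:

```r
library(stringr)

# Same pattern as in the post above, stored once for reuse
pattern <- "(\\w+)\\((\\d*\\.?\\d*),\\s?(\\d*\\.?\\d*)\\)"

# Hypothetical test inputs (not from the original question)
inputs <- c("normal(0,1)", "normal(0.5, 2.25)", "normal(-1,1)")

# str_match() returns one row per input:
# column 1 is the full match, columns 2-4 are the capture groups
m <- str_match(inputs, pattern)

distribution <- m[, 2]
mu <- as.numeric(m[, 3])     # 0, 0.5, NA (the negative input does not match)
sigma <- as.numeric(m[, 4])  # 1, 2.25, NA

# Assumed extension: allow an optional leading minus sign in both groups
pattern_signed <- "(\\w+)\\((-?\\d*\\.?\\d*),\\s?(-?\\d*\\.?\\d*)\\)"
m_signed <- str_match(inputs, pattern_signed)
mu_signed <- as.numeric(m_signed[, 3])  # 0, 0.5, -1
```

Inputs that do not match produce a row of NA, so it is worth checking for NA before using the parsed values.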

Grtz,
PJ

This is not necessarily better, but I think str_extract() feels more natural and human-readable in this situation.

library(stringr)

myString = "normal(0,1)"

distribution = str_extract(myString, "(\\w+)(?=\\()")
mu = str_extract(myString, "(?<=\\()\\d+(?=,)")
sd = str_extract(myString, "(?<=,\\s?)\\d+(?=\\))")

distribution
#> [1] "normal"
mu
#> [1] "0"
sd
#> [1] "1"

Created on 2019-07-20 by the reprex package (v0.3.0.9000)
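Note that the lookaround version returns character strings, and `\\d+` matches integers only. A minimal sketch of how it could be wrapped up to return numeric values and accept decimals follows. The helper `parse_distribution()` and the `number` sub-pattern are hypothetical names, and the number format (optional minus sign, optional decimal part) is an assumption:

```r
library(stringr)

# Hypothetical helper, not from the thread: lookaround-based extraction
# that accepts decimals and returns the parameters as numerics.
parse_distribution <- function(x) {
  # Assumed number format: optional minus sign, digits, optional decimal part
  number <- "-?\\d+\\.?\\d*"
  list(
    # Word characters immediately before the opening parenthesis
    distribution = str_extract(x, "\\w+(?=\\()"),
    # Number between the opening parenthesis and the comma
    mu = as.numeric(str_extract(x, paste0("(?<=\\()", number, "(?=,)"))),
    # Number between the comma (plus optional space) and the closing parenthesis
    sd = as.numeric(str_extract(x, paste0("(?<=,\\s?)", number, "(?=\\))")))
  )
}

res <- parse_distribution("normal(0.5, 2)")
str(res)
```

Keeping the number pattern in one variable means the accepted format only has to be changed in one place.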

I agree, I did not say my method was better, just different.

PJ
