How to split a vector which has both characters and numbers?

Dear All,

I am having some issues with my code. Basically, I want to separate the names (characters) and the numbers in lines where the two are joined by a delimiter ("=").

When I read in data files using "readLines" and then use "strsplit", it splits the lines that contain the delimiter ("=").

But I don't know how to use just the numerical value that sits beside each name.

An example of the data I read using readLines:
myData

iVersion=2
startTime=1534343434655
isampleCnt=31457
fFmax=500
dF=0.08978
bOrderData=1
fRPMMean=18.856
sSystemId= GDG-N-76767

[specdata1]

These lines could appear anywhere in the data that was read. I want to find those particular lines and use their values in a formula.

Example:
formula = (fRPMMean * 4) / dF

mychar <- c("bOrderData=1","fRPMMean=18.856")
library(tidyverse)

mychar_split <- str_split(mychar,"=")

library(purrr)

walk(mychar_split,
     ~assign(x = .[1],
             value = parse_number(.[2]),
             envir = globalenv()))

fRPMMean
class(fRPMMean)

bOrderData

This particular data set also has lines with just numbers, like this:

Version=2
startTime=1534343434655
isampleCnt=31457
fFmax=500
dF=0.08978
bOrderData=1
fRPMMean=18.856
sSystemId= GDG-N-76767

[specdata1]
0.00453
8.474745
0.0009387
9.78789
#finish

Perhaps there are rows without equals signs, in which case variable names can't be made out of them.
Do you really want approximately 50,000 variables assigned?
How would you know which of the 50,000 variables you want to use in your program?
Maybe you should prepare a list of the variables you know you want to read, and then just get the numbers for those.
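
To illustrate the "list of wanted variables" idea, here is a minimal base R sketch. The input vector and the name `wanted` are made up for the example (in practice the lines would come from readLines), and the values are kept in one named vector rather than assigned as separate global variables.

```r
# Hypothetical input; in practice this would come from readLines()
lines <- c("iVersion=2", "dF=0.08978", "", "fRPMMean=18.856",
           "sSystemId= GDG-N-76767", "[specdata1]", "0.00453")

# Names we actually need for the formula (an assumption for this sketch)
wanted <- c("fRPMMean", "dF")

# Keep only lines that contain "="
kv <- lines[grepl("=", lines, fixed = TRUE)]

# Text before the first "=" is the key, text after it is the value
keys <- sub("=.*$", "", kv)
vals <- sub("^[^=]*=", "", kv)

# Convert only the wanted entries, so non-numeric values are never touched
keep <- keys %in% wanted
params <- setNames(as.numeric(vals[keep]), keys[keep])

# The formula from the question
result <- (params[["fRPMMean"]] * 4) / params[["dF"]]
result
```

Because only the wanted keys are converted, lines such as sSystemId or the bare numbers after [specdata1] never reach as.numeric().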

Anyway, here is an example of junk data that would interfere, being 'dealt with' by ignoring it.

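library(tidyverse)  # loads stringr, purrr and readr

# Test data with an empty line and a non-numeric value
mychar <- c("bOrderData=1","","fRPMMean=18.856","junk=junk")

mychar_split <- str_split(mychar,"=")

# Naive version: stops with an error at the empty line,
# because "" cannot be used as a variable name
walk(mychar_split,
     ~assign(x = .[1],
             value = parse_number(.[2]),
             envir = globalenv()))

fRPMMean

# Tolerant version: report the problem lines and carry on with the rest
walk(mychar_split,
     ~tryCatch(assign(x = .[1],
             value = parse_number(.[2]),
             envir = globalenv()),
             warning=function(w) cat(.[1],"problem ",w$message[[1]]),
             error=function(e) cat(.[1],"problem ",e$message[[1]]),
             finally = cat(.[1],"\n")))

fRPMMean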
fil <- tempfile(fileext = ".data")
cat("fine=123", "2 3 5 7", "", "also=999",
    file = fil,
    sep = "\n")

(readin <- readLines(fil, n = -1))

(filtered <- readin[grepl("\\=+",readin)])
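
Once only the key=value lines are left, one possible next step is to read them as a two-column table. This is just a sketch in base R; the column names `key` and `value` are my own choice, and `filtered` is redefined here so the snippet runs on its own.

```r
# Same content as `filtered` in the snippet above
filtered <- c("fine=123", "also=999")

# Treat "=" as the column separator
kv <- read.table(text = filtered, sep = "=",
                 col.names = c("key", "value"),
                 stringsAsFactors = FALSE)

# Named vector for lookup by name
values <- setNames(kv$value, kv$key)
values[["also"]]
```

If some values are not numeric (for example a system id), the value column will come back as character and would need converting afterwards.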

I'm not sure I completely understood your issue, but I suggest trying the readr::parse_number() function. Something like this:

library(tidyverse)

lines <- c('Version=2', 'startTime=1534343434655', 'isampleCnt=31457', 'fFmax=500', 'dF=0.08978', 'bOrderData=1', 'fRPMMean=18.856', 'sSystemId= GDG-N-76767', '', '[specdata1]', '0.00453', '8.474745', '0.0009387', '9.78789', '#finish')

set_names(parse_number(lines), str_replace(lines, "=[0-9]+", ""))
#> Warning: 2 parsing failures.
#> row col expected  actual
#>   8  -- a number -      
#>  15  -- a number #finish
#>                Version              startTime             isampleCnt 
#>           2.000000e+00           1.534343e+12           3.145700e+04 
#>                  fFmax               dF.08978             bOrderData 
#>           5.000000e+02           8.978000e-02           1.000000e+00 
#>           fRPMMean.856 sSystemId= GDG-N-76767                        
#>           1.885600e+01                     NA                     NA 
#>            [specdata1]                0.00453               8.474745 
#>           1.000000e+00           4.530000e-03           8.474745e+00 
#>              0.0009387                9.78789                #finish 
#>           9.387000e-04           9.787890e+00                     NA 
#> attr(,"problems")
#> # A tibble: 2 x 4
#>     row   col expected actual 
#>   <int> <int> <chr>    <chr>  
#> 1     8    NA a number -      
#> 2    15    NA a number #finish

From this point you could keep only the non-NA values.
Best,
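
A note on the output above: the pattern "=[0-9]+" only strips the digits up to the decimal point, which is why names such as dF.08978 and fRPMMean.856 appear, and [specdata1] gets parsed as 1. Below is a rough base R sketch of one way around both, using as.numeric() in place of parse_number() and a shortened copy of the example lines. It is a sketch under those assumptions, not the only way to do it.

```r
# Shortened copy of the example lines
lines <- c("Version=2", "dF=0.08978", "fRPMMean=18.856",
           "sSystemId= GDG-N-76767", "", "[specdata1]", "0.00453", "#finish")

# Only lines containing "=" are treated as name/value pairs
has_eq <- grepl("=", lines, fixed = TRUE)

# Name: everything before the first "="
keys <- sub("=.*$", "", lines[has_eq])

# Value: everything after the first "="; non-numeric text becomes NA
vals <- suppressWarnings(as.numeric(sub("^[^=]*=", "", lines[has_eq])))

named <- setNames(vals, keys)

# Keep only the non-NA values (drops sSystemId here)
named <- named[!is.na(named)]
named
```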



When I apply this code to my real data, I get the following error:

Error in assign(x = ., value = parse_number(.[2]), envir = globalenv()) :
attempt to use zero-length variable name

I have 50000 lines in that text file.
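
For what it's worth, that message is what assign() gives when the name is an empty string, which is what a blank line (or a line starting with "=") produces after splitting. Here is a small base R sketch that reproduces the message and then skips such lines. The environment `env` is my own addition, used so that the workspace is not filled with thousands of variables.

```r
lines <- c("bOrderData=1", "", "fRPMMean=18.856")

# Reproduce the error: an empty string is rejected as a variable name
msg <- tryCatch(assign("", 1), error = function(e) conditionMessage(e))
msg

# Keep only lines that have some text before an "="
ok <- grepl("^[^=]+=", lines)
parts <- strsplit(lines[ok], "=", fixed = TRUE)

# Assign into a dedicated environment instead of globalenv()
env <- new.env()
for (p in parts) assign(p[1], as.numeric(p[2]), envir = env)

ls(env)
get("fRPMMean", envir = env)
```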