How to split a vector which has both characters and numbers?

Shri1506 · May 11, 2020, 7:46am

Dear All,

I am having some issues with my code. Basically, I want to separate the characters and the numbers in lines that are delimited by "=".

When I read in the data file using "readLines" and then use "strsplit", it splits the lines that contain the delimiter "=".

But I don't know how to use just the numerical value that sits beside the character part.

My example of the data I read using readLines:
myData

iVersion=2
startTime=1534343434655
isampleCnt=31457
fFmax=500
dF=0.08978
bOrderData=1
fRPMMean=18.856
sSystemId= GDG-N-76767

[specdata1]

These lines could appear anywhere in the data that is read in. I want to find those particular lines and use a formula to calculate something.

Example:
formula = (fRPMMean * 4)/ dF

nirgrahamuk · May 11, 2020, 9:03am

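library(tidyverse)  # loads stringr (str_split), readr (parse_number) and purrr (walk)

mychar <- c("bOrderData=1","fRPMMean=18.856")

# split each "name=value" string at "="
mychar_split <- str_split(mychar,"=")

# create one global variable per line, named after the text before "="
walk(mychar_split,
     ~assign(x = .[1],
             value = parse_number(.[2]),
             envir = globalenv()))

fRPMMean
class(fRPMMean)

bOrderData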

Shri1506 · May 11, 2020, 9:45am

When I apply this code to my real data, I get the following error.

Error in assign(x = ., value = parse_number(.[2]), envir = globalenv()) :
attempt to use zero-length variable name

I have 50000 lines in that text file.

Shri1506 · May 11, 2020, 9:54am

This particular data also has lines with just numbers, like this:

Version=2
startTime=1534343434655
isampleCnt=31457
fFmax=500
dF=0.08978
bOrderData=1
fRPMMean=18.856
sSystemId= GDG-N-76767

[specdata1]
0.00453
8.474745
0.0009387
9.78789
#finish

nirgrahamuk · May 11, 2020, 9:57am

Perhaps there are rows without equal signs, in which case variable names can't be made out of them.
Do you really want approximately 50,000 variables assigned?
How would you know which of the 50,000 variables you want to use in your program?
Maybe you should prepare a list of the variables you know you want to read, and then just get the numbers for those.
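
A minimal sketch of that idea, assuming base R only. The vector `wanted` is a hypothetical list of keys chosen for illustration, and the example lines are borrowed from the question; nothing here is assigned into the global environment.

```r
# Example lines, as readLines() might return them (taken from the question).
lines <- c("dF=0.08978", "bOrderData=1", "fRPMMean=18.856",
           "sSystemId= GDG-N-76767", "", "[specdata1]", "0.00453")

# Hypothetical list of the keys we actually care about.
wanted <- c("fRPMMean", "dF")

# Keep only the "key=value" lines, then cut each one at the first "=".
kv   <- lines[grepl("=", lines, fixed = TRUE)]
keys <- trimws(sub("=.*$", "", kv))      # text before the first "="
vals <- trimws(sub("^[^=]*=", "", kv))   # text after the first "="

# Look up the wanted keys and convert only those values to numbers.
# A wanted key that is missing from the data would come back as NA.
params <- as.numeric(vals[match(wanted, keys)])
names(params) <- wanted
params
```

Because only the wanted keys are converted, non-numeric values such as `sSystemId` never reach `as.numeric()`.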

Anyway, here is an example of junk data that would interfere, being 'dealt with' by ignoring it.

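library(tidyverse)

# "" has no name before "=", and "junk=junk" has no number after it
mychar <- c("bOrderData=1","","fRPMMean=18.856","junk=junk")

mychar_split <- str_split(mychar,"=")

# without error handling, this stops at the empty string with
# "attempt to use zero-length variable name", so later lines are never reached
walk(mychar_split,
     ~assign(x = .[1],
             value = parse_number(.[2]),
             envir = globalenv()))

fRPMMean

# with tryCatch, a problem line is reported and skipped, and the loop carries on
walk(mychar_split,
     ~tryCatch(assign(x = .[1],
             value = parse_number(.[2]),
             envir = globalenv()),
             warning=function(w) cat(.[1],"problem ",w$message[[1]]),
             error=function(e) cat(.[1],"problem ",e$message[[1]]),
             finally = cat(.[1],"\n")))

fRPMMean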

nirgrahamuk · May 11, 2020, 11:12am

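# write a small example file: two "name=value" lines, one line of bare numbers, one empty line
fil <- tempfile(fileext = ".data")
cat("fine=123", "2 3 5 7", "", "also=999",
    file = fil,
    sep = "\n")

# read every line of the file
(readin <- readLines(fil, n = -1))

# keep only the lines that contain an equal sign
(filtered <- readin[grepl("=", readin, fixed = TRUE)])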
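
Building on the filtering step, here is a sketch of how the formula from the question, (fRPMMean * 4) / dF, might then be computed. It assumes base R only and that each kept line has the form name=value; the file contents are borrowed from the question, and the list name `params` is chosen here for illustration.

```r
# Write a small example file (contents borrowed from the question).
fil <- tempfile(fileext = ".data")
cat("dF=0.08978", "fRPMMean=18.856", "sSystemId= GDG-N-76767",
    "", "[specdata1]", "0.00453", "#finish",
    file = fil, sep = "\n")

readin <- readLines(fil)

# Keep only the lines that contain "=".
filtered <- readin[grepl("=", readin, fixed = TRUE)]

# Split each kept line at "=" and pull out the two pieces.
parts <- strsplit(filtered, "=", fixed = TRUE)
keys  <- trimws(vapply(parts, `[`, character(1), 1))
vals  <- trimws(vapply(parts, `[`, character(1), 2))

# Collect the values in one named list rather than in separate global variables.
# Non-numeric values (such as sSystemId) become NA; the coercion warning is suppressed.
params <- as.list(suppressWarnings(as.numeric(vals)))
names(params) <- keys

# Formula from the question.
result <- (params$fRPMMean * 4) / params$dF
result
```

Keeping the values in a list means a key that is missing shows up as `NULL` in `params` instead of silently reusing an old global variable of the same name.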

gavg712 · May 12, 2020, 3:39am

I'm not sure I completely understood your issue, but I would suggest trying the readr::parse_number() function. Something like this:

library(tidyverse)

lines <- c('Version=2', 'startTime=1534343434655', 'isampleCnt=31457', 'fFmax=500', 'dF=0.08978', 'bOrderData=1', 'fRPMMean=18.856', 'sSystemId= GDG-N-76767', '', '[specdata1]', '0.00453', '8.474745', '0.0009387', '9.78789', '#finish')

set_names(parse_number(lines), str_replace(lines, "=[0-9]+", ""))
#> Warning: 2 parsing failures.
#> row col expected  actual
#>   8  -- a number -      
#>  15  -- a number #finish
#>                Version              startTime             isampleCnt 
#>           2.000000e+00           1.534343e+12           3.145700e+04 
#>                  fFmax               dF.08978             bOrderData 
#>           5.000000e+02           8.978000e-02           1.000000e+00 
#>           fRPMMean.856 sSystemId= GDG-N-76767                        
#>           1.885600e+01                     NA                     NA 
#>            [specdata1]                0.00453               8.474745 
#>           1.000000e+00           4.530000e-03           8.474745e+00 
#>              0.0009387                9.78789                #finish 
#>           9.387000e-04           9.787890e+00                     NA 
#> attr(,"problems")
#> # A tibble: 2 x 4
#>     row   col expected actual 
#>   <int> <int> <chr>    <chr>  
#> 1     8    NA a number -      
#> 2    15    NA a number #finish

From this point you could keep only the non-NA values.
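
A minimal sketch of that filtering step, with two adjustments that are assumptions rather than part of the output above. In that output, the names `dF.08978` and `fRPMMean.856` keep part of the value because the pattern "=[0-9]+" stops at the decimal point, and `[specdata1]` is parsed as 1. The sketch therefore first keeps only lines containing "=", and strips everything from "=" onwards when building the names.

```r
library(readr)

lines <- c("dF=0.08978", "fRPMMean=18.856", "sSystemId= GDG-N-76767",
           "", "[specdata1]", "0.00453", "#finish")

# Restrict to "key=value" lines first, so "[specdata1]" is not parsed as 1.
kv <- lines[grepl("=", lines, fixed = TRUE)]

# Parse only the text after the first "=".
# Parsing failures (such as the sSystemId value) give NA; the warning is suppressed.
vals <- suppressWarnings(parse_number(sub("^[^=]*=", "", kv)))

# Name each value by the text before "="; "=.*$" removes the whole value,
# including the decimal part.
names(vals) <- sub("=.*$", "", kv)

# Keep only the values that parsed as numbers.
params <- vals[!is.na(vals)]
params
```

With the names cleaned up, the values can be looked up directly, for example `params[["fRPMMean"]]`.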
Best,
