How to extract numerical data from a file and create a dataframe in R

Below is a sample of my file, from which I would like to extract the numerical data and create a dataframe.

Contents of the file

*****************************************************************
 ******  option summary
 ******
 runlvl  = 3         bypass  = 2         
  Opening plot unit= 15
 file=new_run.pa0

 ******  
 

  ********  dc transfer curves tnom=  25.000 temp=  25.000 *****
x
        
    volt      current    
                    v0     
 -100.00000m      406.5220f  
 -200.00000m      806.6048f  
 -300.00000m      1.2066p  
 -400.00000m      1.6067p  
 -500.00000m      2.0067p  
 -600.00000m      2.4066p  
 -700.00000m      2.8066p  
 -800.00000m      3.2067p  
 -900.00000m      3.6067p  
   -1.00000       4.0067p  
   -1.10000       4.4067p  
   -1.20000       4.8068p  
   -1.30000       5.2069p  
   -1.40000       5.6068p  
   -1.50000       6.0070p  
   -1.60000       6.4069p  
   -1.70000       6.8070p  
   -1.80000       7.2069p  
   -1.90000       7.6070p  
   -2.00000       8.0069p  
   -2.10000       8.4071p  
   -2.20000       8.8070p  
   -2.30000       9.2071p  
   -2.40000       9.6070p  


          ***** job concluded
 ****** HSPICE -- H-2013.03-SP2 32-BIT (Aug 26 2013) RHEL32 ******              
 ******  
 

  ******  job statistics summary tnom=  25.000 temp=  25.000 *****
  
  
 ******  Machine Information  ******

I would like to extract the values below the v0 line and create a two-column dataframe:

Volt             Current
-100.00000m      406.5220f
"........................"

and so on for every row, up to the point where the numerical data ends. The difficulty is that the number of lines before the volt/current header varies, the number of lines after the numerical data varies, and the number of data rows varies as well. The one thing that can be relied on is that the numerical data always starts right after the following two lines of the file:

volt      current    
                        v0 

Below is the code I have tried, which specifies the line numbers of the data by hand:

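library(tidyr)  # provides separate()

DATA <- readLines(myfile)

# Hard-coded line range of the numeric block (this is what I want to avoid)
DataStartPos <- 314
DataEndPos <- 1062

# Separate the numeric data from the metadata and bind it to a data frame
tmp <- as.data.frame(DATA[DataStartPos:DataEndPos])
# The leading whitespace on each row produces an empty first field ("S.No")
tmp <- separate(tmp, col = 1, into = c("S.No", "Volts", "Amps"), sep = "\\s+")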

I can create the dataframe this way, but only by hard-coding the line numbers of the file. Is there any way to locate the numerical data automatically, given the conditions described above?

My suggestion would be:

  1. Use readr::read_lines or similar to read in each line as a record in a dataset.
  2. Filter out the lines you don't want: blank ones, etc.
  3. Filter out lines that don't have a "-" in position 1, or whatever position that turns out to be.
    3b) Use tidyr::separate or stringr::str_sub to isolate the values you want.
  4. That should do it; write the result out with a write_* function (fst, csv, etc.) and on you go.
  5. If you need to recover metadata, and I bet you do, build out additional variables/data frames/tibbles
    using this basic idea, with a separate data frame for each "concept" or record type.
  6. Merge the bits and pieces together with dplyr::bind_cols as needed.
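
The steps above can be sketched roughly as follows. This is a minimal sketch in base R only (readLines, grepl, strsplit) rather than readr/tidyr, so it runs without extra packages. The helper names extract_dc_table and si_to_numeric are hypothetical. It assumes the data rows start immediately after the "v0" line, that each row holds exactly two values, and that the block ends at the first line that does not look like a data row. The suffix conversion is an optional extra and assumes the usual SPICE scale letters (f, p, n, u, m, k, g, t); anything else becomes NA.

```r
# Hypothetical helper: pull the volt/current rows out of the file's lines.
# `lines` is a character vector, e.g. the result of readLines(myfile).
extract_dc_table <- function(lines) {
  # Find the "volt  current" header that is directly followed by a "v0" line.
  is_header <- grepl("^\\s*volt\\s+current\\s*$", lines)
  is_v0 <- grepl("^\\s*v0\\s*$", lines)
  start <- which(is_header & c(is_v0[-1], FALSE))
  if (length(start) == 0) stop("volt/current header not found")

  # Everything after the "v0" line (drop the first start + 1 lines).
  rest <- tail(lines, -(start[1] + 1))

  # A data row: two numbers, each with an optional alphabetic scale suffix.
  num <- "[-+]?[0-9]+\\.?[0-9]*[a-zA-Z]*"
  row_pattern <- paste0("^\\s*", num, "\\s+", num, "\\s*$")
  is_row <- grepl(row_pattern, rest)

  # Keep the leading run of data rows; stop at the first non-matching line.
  n <- if (all(is_row)) length(rest) else which(!is_row)[1] - 1
  parts <- strsplit(trimws(rest[seq_len(n)]), "\\s+")

  data.frame(
    Volt = vapply(parts, `[`, character(1), 1),
    Current = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: turn strings such as "406.5220f" into plain numbers.
si_to_numeric <- function(x) {
  scale <- c(f = 1e-15, p = 1e-12, n = 1e-9, u = 1e-6,
             m = 1e-3, k = 1e3, g = 1e9, t = 1e12)
  suffix <- tolower(sub("^[-+0-9.]+", "", x))    # trailing letters, if any
  value <- as.numeric(sub("[a-zA-Z]+$", "", x))  # numeric part
  mult <- ifelse(suffix == "", 1, scale[suffix]) # unknown suffix gives NA
  unname(value * mult)
}

# Usage on a shortened copy of the sample file from the question.
sample_lines <- c(
  " ******  option summary",
  "x",
  "    volt      current    ",
  "                    v0     ",
  " -100.00000m      406.5220f  ",
  "   -1.00000       4.0067p  ",
  "",
  "          ***** job concluded"
)
df <- extract_dc_table(sample_lines)
df$Volt_num <- si_to_numeric(df$Volt)
df$Current_num <- si_to_numeric(df$Current)
```

Because the header is found by pattern rather than by line number, the amount of text before and after the table no longer matters. For a real file, replace sample_lines with readLines(myfile).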
