Messy data. All vectors in one column separated by text.

Hello,

My data look like this:

Vector 1
Description 1
Description 2
6570 0.23
6900 0.54
.
.
nnnn 0.53
Vector 2
Description 1
Description 2
6570 1.23
6900 1.67
.
.
nnnn 1.65
etc

The description has the same number of lines, and there is the same number of values, for each vector.

I want to group it into a table like:
Len V1 V2 ...
6570 0.23 1.23 ...
6900 0.54 1.67 ...
.
.
nnnn 0.53 1.65 ....

I did some research but didn't find how to do it. Could you please help me?

Can you share sample data in a copy/paste friendly format? You can do it by using the dput() function.

dput(object_name_goes_here)
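As a minimal sketch of what that looks like in practice: the data frame below is a hypothetical stand-in (the names dat, dose and vol are assumptions, not the poster's actual object). dput() prints text that anyone can paste into their own session to rebuild the object.

```r
# Hypothetical stand-in for the imported data; replace with your own object.
dat <- data.frame(dose = c(0, 10, 20), vol = c(0.23, 0.54, 0.53))

# Print a copy/paste friendly text representation of the first rows only,
# which keeps the post short when the real object is large.
dput(head(dat, 2))

# Evaluating the printed text rebuilds the object on the reader's side.
txt <- capture.output(dput(dat))
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, dat)
```

Posting the output of dput(head(...)) is usually enough for others to reproduce the structure of the problem.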

Thank you for your reply
Data sample: https://drive.google.com/file/d/1k84xAe-tG78wkluld6VwhB2hjcmN0YAz/view?usp=drivesdk

Does this imply you didn't get as far as making any R object that represents your data?

The path of least resistance.

  1. Edit sample.dat with a plain text editor to remove the header lines
1a
PTV
1.00	mm resolution
631	bins
Thu May 20 17:53:57 2021
Min. Bin Dose (cGy),  Bin Volume (cc)

and replace the last of those lines (the column header) with

dose,vol

for more convenient variable names. They can be replaced with more descriptive names for presentation purposes.

  2. In R
readr::read_csv("sample.dat")

── Column specification ────────────────────────────────────────────────────────
cols(
  dose = col_character(),
  vol = col_character()
)

Warning: 30 parsing failures.
row col  expected    actual         file
632  -- 2 columns 1 columns 'sample.dat'
633  -- 2 columns 1 columns 'sample.dat'
634  -- 2 columns 1 columns 'sample.dat'
635  -- 2 columns 1 columns 'sample.dat'
636  -- 2 columns 1 columns 'sample.dat'
... ... ......... ......... ............
See problems(...) for more details.

# A tibble: 4,453 x 2
   dose  vol   
   <chr> <chr> 
 1 0     0.0000
 2 10    0.0000
 3 20    0.0000
 4 30    0.0000
 5 40    0.0000
 6 50    0.0000
 7 60    0.0000
 8 70    0.0000
 9 80    0.0000
10 90    0.0000
# … with 4,443 more rows

Note that 30 rows fail to parse because the header lines are repeated before each vector. Assign the imported object to dat and then remove these problem rows with

dat[!is.na(dat[,2]),]
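The steps above leave the data in long form, with both columns still stored as character, and any repeated header line that happens to contain a comma may survive the NA filter. As a rough sketch of how to get from there to the wide table asked for in the question, the base R code below works on the raw lines instead. It assumes every data row is two numbers separated by a comma and that each run of header lines starts a new vector. The miniature input and the names raw, long and wide are hypothetical, not taken from the actual file.

```r
# Hypothetical miniature of the export: two blocks, each with header lines
# followed by "dose, volume" rows.
raw <- c(
  "1a", "PTV", "3 bins", "Min. Bin Dose (cGy),  Bin Volume (cc)",
  "0, 0.23", "10, 0.54", "20, 0.53",
  "1b", "PTV", "3 bins", "Min. Bin Dose (cGy),  Bin Volume (cc)",
  "0, 1.23", "10, 1.67", "20, 1.65"
)
# With the real file, something like: raw <- readLines("sample.dat")

# Assumption: a data row is exactly two numbers separated by a comma;
# every other line is treated as header text.
is_data <- grepl("^\\s*[0-9.]+\\s*,\\s*[0-9.]+\\s*$", raw)

# A new vector starts wherever a data row follows a non-data row.
starts <- is_data & !c(FALSE, head(is_data, -1))
block <- cumsum(starts)[is_data]

# Split the data rows and convert both fields to numeric (long form).
parts <- strsplit(raw[is_data], ",")
long <- data.frame(
  vector = block,
  dose = as.numeric(sapply(parts, `[`, 1)),
  vol = as.numeric(sapply(parts, `[`, 2))
)

# One row per dose, one column per vector.
wide <- reshape(long, idvar = "dose", timevar = "vector",
                direction = "wide", sep = "_")
names(wide) <- sub("^vol_", "V", names(wide))
wide
```

The result has a dose column followed by V1, V2, and so on, one per block in the file. If the real rows are not comma-separated, or contain negative numbers or scientific notation, the pattern in grepl() would need adjusting.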
