read .dat and separate data with header (;)

Hi,
I have a .dat file with a header in a semicolon ;
I am trying to separate the column BUT, not able to separate or generate the data in different columns.

The data looks like

AETHALOMETER
Serial number = AE33-S09-01153
Application version = 1.6.9.0
Number of channels = 7

Date(yyyy/MM/dd); Time(hh:mm:ss); Timebase; RefCh1; Sen1Ch1; Sen2Ch1; RefCh2; Sen1Ch2; Sen2Ch2; RefCh3; Sen1Ch3; Sen2Ch3; RefCh4; Sen1Ch4; Sen2Ch4; RefCh5; Sen1Ch5; Sen2Ch5; RefCh6; Sen1Ch6; Sen2Ch6; RefCh7; Sen1Ch7; Sen2Ch7; Flow1; Flow2; FlowC; Pressure(Pa); Temperature(°C); BB(%); ContTemp; SupplyTemp; Status; ContStatus; DetectStatus; LedStatus; ValveStatus; LedTemp; BC11; BC12; BC1; BC21; BC22; BC2; BC31; BC32; BC3; BC41; BC42; BC4; BC51; BC52; BC5; BC61; BC62; BC6; BC71; BC72; BC7; K1; K2; K3; K4; K5; K6; K7; TapeAdvCount;


2022/12/31 23:59:00 60 811116 577812 823547 850583 641487 829046 798686 596629 761489 829038 716092 863203 810620 731773 869954 603360 692720 793744 679065 719890 821893 3764 1218 4982 101325 25.00 6.6 29 39 0 0 10 10 00000 30 1272 1439 1505 1361 1460 1586 1384 1483 1592 1361 1418 1546 1339 1409 1496 1376 1533 1495 1375 1376 1487 0.003483477 0.004248582 0.004457909 0.004786885 0.004835292 0.004988594 0.005106227 238 5 0 0
2023/01/01 00:00:00 60 810986 576637 822850 850431 640345 828421 798532 595675 760977 828830 715004 862604 810425 730788 869410 603154 691896 793247 678799 719026 821338 3764 1218 4982 101325 25.00 5.5 29 39 0 0 10 10 00000 30 1362 1541 1613 1479 1641 1725 1437 1514 1654 1471 1587 1672 1434 1542 1604 1466 1520 1593 1510 1636 1635 0.003485507 0.004251502 0.004458257 0.004790065 0.004839524 0.004991514 0.005117336 238 5 0 0
2023/01/01 00:01:00 60 811045 575628 822370 850457 639352 827975 798581 594861 760634 828904 714195 862314 810468 730028 869136 603230 691404 793110 678855 718524 821194 3764 1218 4982 101325 25.00 7.9 29 40 0 0 10 10 00000 29 1325 1473 1571 1460 1623 1705 1459 1616 1681 1415 1524 1610 1418 1475 1586 1446 1594 1573 1457 1487 1578 0.003486758 0.004254406 0.004462147 0.004792959 0.004840215 0.005004052 0.005118252 238 5 0 0

I am trying to separate the data from by

read.table("AE33_AE33-S09-01153_20220102.dat", header = T, skip=5, sep = "\t")

Please let me know how to do it.

Thanks

The data you posted seem to be delimited with a space character and the header row has three fewer columns than the data. I was able to read it with

DF <- read.table("~/R/Play/Dummy.csv", header = TRUE, sep = " ", skip = 5)

after I added three new column names after TapeAdvCount;. The semicolons in the header names are replaced with periods.
It is possible that your actual data are delimited with tabs but I don't have access to that.

the data can be found here!

https://www.dropbox.com/t/Eb4r0wcZxUsj8hy8
library(data.table)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:data.table':
#> 
#>     hour, isoweek, mday, minute, month, quarter, second, wday, week,
#>     yday, year
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union


# download https://www.dropbox.com/transfer/AAAAAOSEiadRmrG9U5qE05XZbso7ZhLZDqdcHvPHsFtUA6y6dURH4ps and remove first 8 lines and save to a filename for the next line
dat <- fread("~/Desktop/grist.csv")

colnames(dat) <- c("Date",
  "Time",
  "Timebase",
  "RefCh1",
  "Sen1Ch1",
  "Sen2Ch1",
  "RefCh2",
  "Sen1Ch2",
  "Sen2Ch2",
  "RefCh3",
  "Sen1Ch3",
  "Sen2Ch3",
  "RefCh4",
  "Sen1Ch4",
  "Sen2Ch4",
  "RefCh5",
  "Sen1Ch5",
  "Sen2Ch5",
  "RefCh6",
  "Sen1Ch6",
  "Sen2Ch6",
  "RefCh7",
  "Sen1Ch7",
  "Sen2Ch7",
  "Flow1",
  "Flow2",
  "FlowC",
  "Pressure_PA",
  "Temperature_C",
  "BB_pct",
  "ContTemp",
  "SupplyTemp",
  "Status",
  "ContStatus",
  "DetectStatus",
  "LedStatus",
  "ValveStatus",
  "LedTemp",
  "BC11",
  "BC12",
  "BC1",
  "BC21",
  "BC22",
  "BC2",
  "BC31",
  "BC32",
  "BC3",
  "BC41",
  "BC42",
  "BC4",
  "BC51",
  "BC52",
  "BC5",
  "BC61",
  "BC62",
  "BC6",
  "BC71",
  "BC72",
  "BC7",
  "K1",
  "K2",
  "K3",
  "K4",
  "K5",
  "K6",
  "K7",
  "TapeAdvCount",
  "blank1",
  "blank2",
  "blank3")

head(dat)
#>          Date     Time Timebase RefCh1 Sen1Ch1 Sen2Ch1 RefCh2 Sen1Ch2 Sen2Ch2
#> 1: 2022/12/31 23:59:00       60 811116  577812  823547 850583  641487  829046
#> 2: 2023/01/01 00:00:00       60 810986  576637  822850 850431  640345  828421
#> 3: 2023/01/01 00:01:00       60 811045  575628  822370 850457  639352  827975
#> 4: 2023/01/01 00:02:00       60 811045  574539  821794 850470  638301  827491
#> 5: 2023/01/01 00:03:00       60 811091  573062  821051 850500  636860  826851
#> 6: 2023/01/01 00:04:00       60 811103  571059  820006 850542  634941  825969
#>    RefCh3 Sen1Ch3 Sen2Ch3 RefCh4 Sen1Ch4 Sen2Ch4 RefCh5 Sen1Ch5 Sen2Ch5 RefCh6
#> 1: 798686  596629  761489 829038  716092  863203 810620  731773  869954 603360
#> 2: 798532  595675  760977 828830  715004  862604 810425  730788  869410 603154
#> 3: 798581  594861  760634 828904  714195  862314 810468  730028  869136 603230
#> 4: 798573  593973  760225 828912  713290  861941 810479  729186  868794 603243
#> 5: 798649  592781  759721 829039  712089  861505 810573  728079  868405 603335
#> 6: 798643  591137  758974 829086  710387  860812 810579  726512  867742 603354
#>    Sen1Ch6 Sen2Ch6 RefCh7 Sen1Ch7 Sen2Ch7 Flow1 Flow2 FlowC Pressure_PA
#> 1:  692720  793744 679065  719890  821893  3764  1218  4982      101325
#> 2:  691896  793247 678799  719026  821338  3764  1218  4982      101325
#> 3:  691404  793110 678855  718524  821194  3764  1218  4982      101325
#> 4:  690814  792904 678857  717934  820964  3764  1218  4982      101325
#> 5:  690029  792679 678978  717221  820787  3764  1217  4981      101325
#> 6:  688904  792259 678996  716102  820362  3764  1218  4982      101325
#>    Temperature_C BB_pct ContTemp SupplyTemp Status ContStatus DetectStatus
#> 1:            25    6.6       29         39      0          0           10
#> 2:            25    5.5       29         39      0          0           10
#> 3:            25    7.9       29         40      0          0           10
#> 4:            25    7.3       29         40      0          0           10
#> 5:            25    4.4       29         41      0          0           10
#> 6:            25    3.1       29         41      0          0           10
#>    LedStatus ValveStatus LedTemp BC11 BC12  BC1 BC21 BC22  BC2 BC31 BC32  BC3
#> 1:        10           0      30 1272 1439 1505 1361 1460 1586 1384 1483 1592
#> 2:        10           0      30 1362 1541 1613 1479 1641 1725 1437 1514 1654
#> 3:        10           0      29 1325 1473 1571 1460 1623 1705 1459 1616 1681
#> 4:        10           0      30 1376 1573 1632 1532 1711 1790 1515 1666 1747
#> 5:        10           0      30 1911 2160 2270 2118 2309 2478 2148 2395 2480
#> 6:        10           0      30 2554 2892 3039 2830 3184 3317 2828 3081 3270
#>    BC41 BC42  BC4 BC51 BC52  BC5 BC61 BC62  BC6 BC71 BC72  BC7          K1
#> 1: 1361 1418 1546 1339 1409 1496 1376 1533 1495 1375 1376 1487 0.003483477
#> 2: 1471 1587 1672 1434 1542 1604 1466 1520 1593 1510 1636 1635 0.003485507
#> 3: 1415 1524 1610 1418 1475 1586 1446 1594 1573 1457 1487 1578 0.003486758
#> 4: 1480 1584 1685 1514 1631 1695 1511 1501 1645 1539 1632 1667 0.003489242
#> 5: 2130 2362 2427 2120 2261 2376 2227 2330 2425 2187 2273 2372 0.003491971
#> 6: 2838 3085 3240 2803 3090 3146 2872 2996 3131 2963 3140 3217 0.003495757
#>             K2          K3          K4          K5          K6          K7
#> 1: 0.004248582 0.004457909 0.004786885 0.004835292 0.004988594 0.005106227
#> 2: 0.004251502 0.004458257 0.004790065 0.004839524 0.004991514 0.005117336
#> 3: 0.004254406 0.004462147 0.004792959 0.004840215 0.005004052 0.005118252
#> 4: 0.004257730 0.004465526 0.004795263 0.004844725 0.005000356 0.005125851
#> 5: 0.004260046 0.004471658 0.004803153 0.004849235 0.005006588 0.005131291
#> 6: 0.004266948 0.004476431 0.004809991 0.004862204 0.005013719 0.005145287
#>    TapeAdvCount blank1 blank2 blank3
#> 1:          238      5      0      0
#> 2:          238      5      0      0
#> 3:          238      5      0      0
#> 4:          238      5      0      0
#> 5:          238      5      0      0
#> 6:          238      5      0      0

# create a single datetime variable if desired

dat$DateTime <- ymd_hms(paste(dat$Date,dat$Time))
class(dat$DateTime)
#> [1] "POSIXct" "POSIXt"

# remove data table class (leaving data frame) if desired
dat <- as.data.frame(dat)
class(dat)
#> [1] "data.frame"

Thanks for your time. I tried to follow your code but I am not getting the same result as you got!

As my file is in .dat format, so I used,

dat <- fread("AE33_AE33-S09-01153_20230101.dat")

for that, I am facing some warnings as:

Warning messages:
1: In fread("AE33_AE33-S09-01153_20230101.dat") :
  Detected 1 column names but the data has 4 columns (i.e. invalid file). Added 3 extra default column names at the end.
2: In fread("AE33_AE33-S09-01153_20230101.dat") :
  Stopped early on line 4. Expected 4 fields but found 5. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<Number of channels = 7>>

then, I used

dat <- read.delim("AE33_AE33-S09-01153_20230101.dat")

not getting a warning this time, so further I used

colnames(dat) <- c("Date",
                   "Time",
                   "Timebase",
                   "RefCh1",
                   "Sen1Ch1",
                   "Sen2Ch1",
                   "RefCh2",
                   "Sen1Ch2",
                   "Sen2Ch2",
                   "RefCh3",
                   "Sen1Ch3",
                   "Sen2Ch3",
                   "RefCh4",
                   "Sen1Ch4",
                   "Sen2Ch4",
                   "RefCh5",
                   "Sen1Ch5",
                   "Sen2Ch5",
                   "RefCh6",
                   "Sen1Ch6",
                   "Sen2Ch6",
                   "RefCh7",
                   "Sen1Ch7",
                   "Sen2Ch7",
                   "Flow1",
                   "Flow2",
                   "FlowC",
                   "Pressure_PA",
                   "Temperature_C",
                   "BB_pct",
                   "ContTemp",
                   "SupplyTemp",
                   "Status",
                   "ContStatus",
                   "DetectStatus",
                   "LedStatus",
                   "ValveStatus",
                   "LedTemp",
                   "BC11",
                   "BC12",
                   "BC1",
                   "BC21",
                   "BC22",
                   "BC2",
                   "BC31",
                   "BC32",
                   "BC3",
                   "BC41",
                   "BC42",
                   "BC4",
                   "BC51",
                   "BC52",
                   "BC5",
                   "BC61",
                   "BC62",
                   "BC6",
                   "BC71",
                   "BC72",
                   "BC7",
                   "K1",
                   "K2",
                   "K3",
                   "K4",
                   "K5",
                   "K6",
                   "K7",
                   "TapeAdvCount",
                   "blank1",
                   "blank2",
                   "blank3")

But again got the error as

Error in names(x) <- value : 
  'names' attribute [70] must be the same length as the vector [1]

Did you edit the file to remove the first 8 lines?

Got it!
Thanks for the suggestion and your time.

If I want to extract the particular columns from one file, then I can use

dat %>% 
  select(Date,Time,BC1,BC6) -> dat1

and if I want to write a new file then

write.table(dat1, file="~/Desktop/dat.txt", sep ="\t", row.names = TRUE, col.names = NA)

So the desired data would be like,

||Date|Time|BC1|BC6|
|---|---|---|---|---|
|1|2022/12/31|23:59:00|1505|1495|
|2|2023/01/01|00:00:00|1613|1593|
|3|2023/01/01|00:01:00|1571|1573|
|4|2023/01/01|00:02:00|1632|1645|
|5|2023/01/01|00:03:00|2270|2425|

I can do it easily for one file.

But, I have multiple files

AE33_AE33-S09-01153_20230101.dat
to
AE33_AE33-S09-01153_20230121.dat

within a folder, so how to write multiple files from

dat1 to dat21

any idea?
Thanks

It would be along these lines

inputs <- dir()
outputs <- vector()
outputs <- unlist(for(i in inputs) outputs[i] = strsplit(i,".pdf"))

for(i in inputs) {
  for(k in outputs) {
    your_function(i) # read and process source file
    your_function(k) # write to output
  }
}

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.