col_datetime() missing argument

Hi there!
I am receiving the following error message (translated from German):

"Error in cols(`Study Date rounded` = col_datetime(format = "%Y-%m-%d %H:%M:%S")
    argument missing without default."

Since the data and a lot of the code I am allowed to work with are not mine, I try to show as little code as possible in the following excerpt; I believe it should be sufficient:

DoseTrack <- read_delim(
  "C:/[Censored]",
  delim = ";",
  escape_double = TRUE,
  col_types = cols(
    `Study Date rounded` = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
    [...]

"Study Date rounded" does exist as a column in my data. Also the declaration of the format seems to be correct, at least the data uses "-" and ":" at the corresponding locations as well. So what is the issue? Are there still key parts of the code missing to solve it?

Thank you in advance. :slight_smile:

I assume Study Date rounded is a column.

Try:

col_datetime(Study Date rounded, format = "%Y-%m-%d %H:%M:%S")

Study Date rounded is a column, yes. The translated result of the code suggested above is this:

Error in source("[Censored]: unexpected symbol
11: col_types = cols(
12: Study Date rounded = col_datetime(Study Date
^
Indeed, the "rounded" in "Study Date rounded" is cut off in the end, the arrow head faces the D in the second Date of the line above it. Unfortunately, the proposal created a different error. Thank you for the attempt, however. :slight_smile: Putting Study Date rounded in these apostrophies didn't work either, the result is

unused argument (Study Date rounded )

Did you maybe put them into something else?

Maybe you could show a couple of rows of your data.

Besides the above-mentioned issue, namely that the data is technically not mine, there is another problem with uploading the data specifically: its entries are (even though anonymous) data from patients. I myself needed a positive ethics vote even to look at and work with the anonymous data.

Therefore I will try to just describe it:
The whole data set consists of about 50,000 rows and 70 columns. "Study Date rounded", without the quotation marks, is the 68th one. A fictional entry looks like this:
2015-03-15 14:00:00
All entries in Study Date rounded are rounded to a full hour.

I do find it very unfortunate that I cannot provide the data set; having said that, I think I have now provided all the relevant information. The precise names of the other columns (which are all different; there is only one called Study Date rounded) and the exact values of the entries should hopefully not matter.

Just modify it. Change some numbers, names, etc.

Why not omit the explicit column type settings and let readr guess the types? Then manipulate the types of your data frame once it's a data frame...
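
For instance, something along these lines should do it (just a rough sketch, reusing the placeholder path and the Study Date rounded column name from your snippet):

library(readr)
library(dplyr)

# let readr guess every column type; no cols() specification at all
DoseTrack <- read_delim(
  "C:/[Censored]",
  delim = ";",
  escape_double = TRUE,
  locale = locale(decimal_mark = ",", grouping_mark = "."),
  trim_ws = TRUE
)

# then fix the types afterwards; as.POSIXct() is a no-op if readr
# already guessed the column as a datetime
DoseTrack <- DoseTrack %>%
  mutate(`Study Date rounded` = as.POSIXct(`Study Date rounded`,
                                           format = "%Y-%m-%d %H:%M:%S"))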

Otherwise, I asked this question a while ago. It might apply to what you are doing:

Apologies for not answering for so long, I was very busy with work, Christmas and moving house. Thank you very much for the responses in the meantime. :slight_smile:

I have prepared 5 rows of random, made-up data with the exact same syntax as the original data. I have also erased the entire col_types command and typed it again; interestingly enough, the col_datetime command now works, even for my original data, although I am not sure what changed. This brings me back to the original problem I had before that error suddenly appeared:

"Something is wrong, all the MRSE metric values are missing."

There are a few warnings complaining about zero variances in the random data (even though I made sure the values are not identical, so I have no idea why that happens); otherwise, the issues are the same as with the original data. The main one now is "Something is wrong; all the RMSE metric values are missing."
Typing warnings() with either data set essentially yields the same results, something like:

"model fit failed for Resample01: size= 5, decay=7.113e-02 Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels"

Is there a connection between the warnings and the missing values? What am I supposed to do with the warnings?
The code is this:

DoseTrack <- read_delim(
  "C:[Censored]",
  delim = ";",
  escape_double = TRUE,
  col_types = cols(
   `Study Date rounded` = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
   `Exam Code` = col_character(),
   `SSDE Effective Diameter Source` = col_character(),
   `SSDE Effective Diameter (cm)` = col_double(),
   `SSDE Coefficient` = col_double(),
   `SSDE Max (mGy)` = col_double(),
   `Effective Dose 103 (mSv)` = col_double(),
   Pitch = col_double(),
   `SSDE (mGy)` = col_double()
 ),
  locale = locale(
    decimal_mark = ",",
    grouping_mark = "."
  ),
  trim_ws = TRUE
)
DoseTrack=select(DoseTrack,-"Phantom Code", -"CTDIPhantomTypeCodeValue", -"CTDIPhantomTypeCodeMeaning")
DoseTrack.clean <- DoseTrack 
model_nnet <- train(
  x = as.data.frame(DoseTrack.clean ),
  y = DoseTrack.clean$`CTDIVol (mGy)`,
  method = "nnet",
  preProc = c("center", "scale"),
  trControl = trainControl(
    search = "random",
    allowParallel = TRUE,
    savePredictions = "final"
  ),
  tuneLength = 5,
  maxit = 500,
  MaxNWts = 5000,
  linout = TRUE,
  trace = TRUE
) 

I wanted to upload the random data; however, csv files are not supported, so I just copy the text below. I can open and edit the data set with (I assume) any editor, so simply copying the data into an editor should be enough. Is there a way to actually upload the csv file next time? I haven't found anything on the Internet. Thank you also in advance for any suggestions concerning the missing RMSE values. :slight_smile:

"Age (Years)";"Sex";"Height (cm)";"Weight (kg)";"Exam Code";"Exam Description";"Location";"Hospital";"Modality Room";"Modality Type";"Equipment Name";"Station Name";"AET";"Dose Alarm";"Investigation Status";"Investigation Comment";"Dose Alert Reason";"Habitus";"Dose Trigger Value";"Dose Trigger Description";"Dose Trigger Type";"Exposure Count";"Protocol Name";"Protocol Code";"Protocol Description";"DLP Total (mGy*cm)";"DLP Max (mGy*cm)";"DLP Spiral Max (mGy*cm)";"CTDIVol Max (mGy)";"CTDIVol Spiral Max (mGy)";"SSDE Effective Diameter Source";"SSDE Effective Diameter (cm)";"SSDE Coefficient";"SSDE Max (mGy)";"Ordinal";"Acquisition Protocol Name";"Acquisition Type";"Exposure Time (ms)";"mAs (mAs)";"Tube Current (uA)";"Tube Voltage Peak (kV)";"Reject Reason Code";"Target Region";"Acquisition Protocol";"CTDIVol (mGy)";"DLP (mGy*cm)";"Effective Dose 103 (mSv)";"Exposure Time Per Rotation (ms)";"Nominal Total Collimation Width (mm)";"Phantom Code";"Phantom Description";"Pitch";"Scanning Length (mm)";"X-Ray Source Identifier";"SSDE (mGy)";"CTDIMean";"Dw_Mean";"SSDE_Dw";"Deff_Mean";"SSDE_Deff";"Dw_MidSliceexp";"SSDE_Dw_MidSlice";"Deff_MidSlice";"SSDE_Deff_MidSlice";"Mittlere Anzahl Pixel Rand pro Bild";"Pixelspacing";"CTDIPhantomTypeCodeValue";"CTDIPhantomTypeCodeMeaning";"Patient ID hashed";"Study Date rounded";"Accession Number hashed"
12;"M";0;33;"1";"CT";"UFK";"UFK";"CT";"CT";"Bang";"CT";"CT";NA;NA;NA;NA;"c";NA;NA;NA;2;"Th";"Th";"Th";138,45;130,57;130,57;4,22;4,22;"Lat";31,7;1,36;3,567;2;"CT";"Spiral Acquisition";3450;4364,78;100000;140;NA;"ABDOMEN";"CT Abdomen und Becken";5,32;456,89;1,63;355;110;1234;"IEC Body Dosimetry Phantom";1,1;234;"A";2,756;6,354567;756,345232;2,878909;243,098765;3,332122;223,97864;2,857908;243,873456;5,523778;556,978534;0,678953;12345;"Shade";"fthu78j6g6fzjim90l7g5de3s45g6h78";2013-11-22 19:00:00;"d8ei9o05lgirkf87940p0ej568rk4u23"
33;"M";159;59;"2";"CT";"UFK";"UFK";"CT";"CT";"Bang";"CT";"CT";NA;NA;NA;NA;"b";NA;NA;NA;5;"Cu";"Cu";"Cu";234,12;200;200;4,33;4,33;"Lat";12,3;1,09;2,345;2;"CT";"Spiral Acquisition";7470;2125,49;90000;130;NA;"ABDOMEN";"CT Abdomen und Becken";1,67;345,67;8,74;360;100;1234;"IEC Body Dosimetry Phantom";1,2;534;"A";6,231;2,967856;612,453456;2,756432;544,356875;2,867213;254,98765;4,965302;139,967843;7,054321;890,867555;0,765987;12345;"Shade";"s3f5ghumo09lkz76g5ftzhunmj456f2s";2011-07-25 13:00:00;"d9ek5itlgio70zphlzm5472js8rk90gt"
34;"F";139;0;"289";"CT";"UFK";"UFK";"CT";"CT";"Bang";"CT";"CT";NA;NA;NA;NA;"a";NA;NA;NA;3;"Ol";"Ol";"Ol";123,45;122,45;122,45;5,23;5,23;"Lat";27,8;0,67;5,678;3;"CT";"Spiral Acquisition";5880;2648,65;102000;110;NA;"ABDOMEN";"CT Abdomen und Becken";7,56;234,56;4,23;350;120;1234;"IEC Body Dosimetry Phantom";1,0;342;"A";2,778;6,756345;365,987654;3,989898;645,978532;7,967543;245,86754;7,064321;323,845312;7,645333;789,087098;0,998589;12345;"Shade";"3f4gzhki90lopö0ß987jznbtrf4e32w";2017-10-10 16:00:00;"d9li83irke459odlri0ofu85kriej734"
29;"M";0;129;"25";"CT";"UFK";"UFK";"CT";"CT";"Bang";"CT";"CT";NA;NA;NA;NA;"c";NA;NA;NA;2;"Pa";"Pa";"Pa";567,89;500,12;500,12;5,67;5,67;"Lat";29,8;1,2;2,346;3;"CT";"Spiral Acquisition";9790;3245,43;135000;100;NA;"ABDOMEN";"CT Abdomen und Becken";4,78;123,56;6,45;350;100;1234;"IEC Body Dosimetry Phantom";1,0;465;"A";7,243;5,756898;326,956432;4,957643;534,986431;3,234567;246,09872;3,078987;254,756890;8,656678;65,908744;0,367543;12345;"Shade";"v6gh78jikuzhgtrf5r4e5gnmklopöl09";2016-10-05 07:00:00;"f9l504oel0fp47eh23s4ujkloi98u456"
68;"F";136;65;"12";"CT";"UFK";"UFK";"CT";"CT";"Bang";"CT";"CT";NA;NA;NA;NA;"c";NA;NA;NA;1;"Ha";"Ha";"Ha";345,56;234,56;234,56;7,34;7,34;"Lat";17,3;1;7,890;3;"CT";"Spiral Acquisition";6440;2312,65;89000;110;NA;"ABDOMEN";"CT Abdomen und Becken";5,67;125,89;3,45;320;100;1234;"IEC Body Dosimetry Phantom";1,0;723;"A";2,354;3,554552;223,876543;3,333433;354,867543;5,867544;765,94754;4,97865;354,234111;8,00908;45,908749;0,985267;12345;"Shade";"b5t6h7j8kloujznbtrf54d35h9lk0l98";2019-05-08 23:00:00;"d0l45umxnc64hdnsbcte39fömruelk23"

nnet's preprocessing routines have the effect of omitting NAs, so effectively

DoseTrack.clean <- na.omit(DoseTrack )

this leaves 0 observations, and so no variation.
I recommend spending some time looking at where NAs are present in your data, deciding to eliminate any afflicted columns that you think will not be useful anyway, or thinking of a way to replace the NAs with other values...
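
For example, a quick per-column NA count along these lines (just a sketch, assuming DoseTrack is the imported data frame) shows which columns are affected:

# number of missing values per column, worst offenders first
na_counts <- colSums(is.na(DoseTrack))
sort(na_counts[na_counts > 0], decreasing = TRUE)
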
Are you actually fitting many more than 5 observations? 5 would be an extremely small sample to fit such a model...

Thank you for your answer. :slight_smile:

There are a few NAs in there, yes. I assumed that just the columns with NAs are omitted, not everything. Therefore I do not understand why

DoseTrack.clean <- DoseTrack 

leaves 0 observations and no variation. Even the columns that are complained about in the random data for having no variance, namely Ordinal, Pitch and Nominal Total Collimation Width, do contain differing values and have no NAs.

In actuality I use 50,000 data rows. :slight_smile: Changing each one by hand would have been too much work, however.

Hi Daniel,
na.omit(), as well as stats::complete.cases() which the caret/nnet code you use relies upon under the hood, both evaluate a data frame of input at the row level and test whether any column is missing as the criterion for excluding the entire row/observation. I.e. the NA-exhibiting columns are poisoning your data, and the fact that the vast majority of your data is not NA is almost irrelevant.
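
A tiny made-up example (nothing to do with your data, just to illustrate the row-wise behaviour):

# rows 1 and 3 each have an NA in just *one* column, so both rows are dropped
df <- data.frame(a = c(1, 2, NA), b = c(NA, 5, 6), c = c(7, 8, 9))
complete.cases(df)   # FALSE TRUE FALSE - only row 2 is complete
na.omit(df)          # keeps just that single complete row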

I apologise if you already understood this from my previous comment and think I'm perhaps repeating myself, but I thought it best to be explicit about my message on this.

All the best.

Hi nirgrahamuk,

thank you very much for your elaboration on the mechanics of the nnet function. Since I am very new to R, I in fact did not know that. Now I understand, however. :slight_smile:
I have eliminated every NA-entry in the random data and the old error message disappears. The code is

library(tidyverse)
library(lubridate)
library(readr)
library(caret)

DoseTrack <- read_delim(
  "[Censored]",
  delim = ";",
  escape_double = TRUE,
  col_types = cols(
   `Study Date rounded` = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
   `Exam Code` = col_character(),
   `SSDE Effective Diameter Source` = col_character(),
   `SSDE Effective Diameter (cm)` = col_double(),
   `SSDE Coefficient` = col_double(),
   `SSDE Max (mGy)` = col_double(),
   `Effective Dose 103 (mSv)` = col_double(),
   Pitch = col_double(),
   `SSDE (mGy)` = col_double()
 ),
  locale = locale(
    decimal_mark = ",",
    grouping_mark = "."
  ),
  trim_ws = TRUE
)
DoseTrack=select(DoseTrack,-"Phantom Code", -"CTDIPhantomTypeCodeValue", -"CTDIPhantomTypeCodeMeaning", -"Dose Alarm",
                 -"Investigation Status", -"Investigation Comment", -"Dose Alert Reason", -"Dose Trigger Value",
                 -"Dose Trigger Description", -"Dose Trigger Type", -"Reject Reason Code")
DoseTrack.clean <- DoseTrack #     %>%
  filter(
    # Keine Telemedizinischen Bilder
    ! str_detect(`Station Name`, "TM_") &
    # Keine Localizer
    `Acquisition Type` != "Constant Angle Acquisition" &
    `Acquisition Type` != "Stationary Acquisition" &
    # Keine Interventionen
    ! str_detect(`Exam Description`, "(Drainage|Punktion)") &
    ! str_detect(`Exam Code`, "(Punktion|Intervention)") &
    ! str_detect(`Protocol Name`, "Intervention")
  ) %>%

model_nnet <- train(
  x = as.data.frame(DoseTrack.clean ),
  y = DoseTrack.clean$`CTDIVol (mGy)`,
  method = "nnet",
  preProc = c("center", "scale"),
  trControl = trainControl(
    search = "random",
    allowParallel = TRUE,
    savePredictions = "final"
  ),
  tuneLength = 5,
  maxit = 500,
  MaxNWts = 5000,
  linout = TRUE,
  trace = TRUE
)
warnings()

Now I have the error message:

Error in { : task 1 failed - "Replacement has 1 row, data has 0"

translated from German. I assume this has to do with the functionality of the nnet function as well, even though I don't know what to do with it.
Also, R still complains about no variation in Ordinal, Pitch and Nominal Total Collimation Width (mm). Why does that happen given that the values are different?

I have eliminated the NA columns from the original data as well; however, it did not work there. I did not check all 50,000 data rows for potential NAs (some NAs could still exist somewhere, whereas the random data is completely clean), but the first rows already contain no NA. Therefore, according to what I understand from your statements, the model should have at least some data to work with; still, it complains about the missing RMSE values. Do I therefore have to get rid of every NA? That could be the difference between the original and the random data. If so, is there an efficient way to search for them? 50000*60 entries are tough to check by hand.

All the best as well. :slight_smile:

I would use the skimr::skim() function to report on my data in a summarised way, which includes reporting the incidence of NA values.
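
Roughly like this (a sketch, assuming DoseTrack is the imported data frame):

# install.packages("skimr")   # once, if not installed yet
library(skimr)
library(dplyr)

skim(DoseTrack)                              # per-column summary incl. n_missing
skim(DoseTrack) %>% filter(n_missing > 0)    # only the columns that contain NAs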
