Persistent "Unknown or uninitialised column" warnings

I've been receiving these warnings off and on for about a year. It's the strangest thing: they appear even after executing a command that doesn't involve the offending variable, and they persist across R sessions. The warnings appear randomly, so I haven't been able to reproduce them reliably.

The problem has come to a head in the past two weeks because I'm automatically generating markdown reports, and many of them get spoiled by these warnings appearing in the chunk output.

The warning seems related to initializing a new column in a tibble. I have tried initializing columns both with mutate() and with direct assignment via $, but the warnings eventually start to appear either way.

My problem seems to be the same as the one documented in these two threads.

Is anyone else plagued by these? Anything I can do to fix it? I am using RStudio Version 1.3.878. Many thanks!

Yes I get these as well. Very annoying.

I get these very annoying warning messages as well.

I read data in with read_csv() and the error occurred; it disappeared when I saved the file from Excel as CSV UTF-8 rather than plain .csv.

I get these as well. As you said it's so random that I cannot reliably reproduce them, hence no solution yet. tidyverse devs, please address this.

Well, at least I'm glad to learn that I'm not the only one.

I wish I could give the devs a better lead for tracking down the issue, such as a reproducible example, but as I said, the warnings appear randomly. My only observation is that the warnings always seem to be associated with a column I initialized myself with a single value, like

DF$Col <- NA

or

DF <- mutate(DF, Col = NA)

Unfortunately, without a reproducible example there is little we can do to help. The only hint I have is that this could be triggered by code that tries to modify values in a column that does not yet exist; e.g.

library(tibble)
tbl <- tibble::tibble(x = 1)
tbl$y[1] <- 2

gives

> library(tibble)
> tbl <- tibble::tibble(x = 1)
> tbl$y[1] <- 2
Warning: Unknown or uninitialised column: `y`.
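If that hint is right, one way to avoid the warning is to create the column as a whole before assigning into individual elements of it. A minimal sketch reusing the toy tbl from the example; the extra column z is purely illustrative:

```r
library(tibble)

tbl <- tibble::tibble(x = 1)

# Create the whole column first (here as a typed NA), so the element-wise
# assignment below targets a column that already exists
tbl$y <- NA_real_
tbl$y[1] <- 2

# Or check for the column with names(), which never warns, before modifying it
if (!"z" %in% names(tbl)) {
  tbl$z <- NA_real_
}
tbl$z[1] <- 3
```

The original `tbl$y[1] <- 2` warns because R first reads `tbl$y` (a missing column, so NULL plus the warning) and only then assigns the modified value back.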

I had the same warning after using pivot_wider():

myq <- myq %>%
  select(CIF, QUESTION_ID, RESPONSE) %>%
  pivot_wider(names_from = QUESTION_ID, values_from = RESPONSE)

and now the warnings keep appearing after any new code I run. These are the warnings:

Warning messages:
1: Unknown or uninitialised column: RESPONSE.
2: Unknown or uninitialised column: RESPONSE.
3: Unknown or uninitialised column: RESPONSE.
4: Unknown or uninitialised column: RESPONSE.

I got the same warning message when running table():

(table(customer_enhance$EMPLOYMENT_STATUS,customer_enhance$ANNUAL_INCOME))
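One plausible reading of the pivot_wider() case: pivot_wider() consumes the names_from and values_from columns, so after the pipeline above RESPONSE no longer exists in myq, and any later `myq$RESPONSE` lookup would produce exactly this warning. A small sketch with made-up data that uses the same column names as the post:

```r
library(dplyr)
library(tidyr)

# Made-up data with the same column names as in the post
myq <- tibble(
  CIF = c(1, 1, 2),
  QUESTION_ID = c("Q1", "Q2", "Q1"),
  RESPONSE = c("yes", "no", "yes")
)

myq <- myq %>%
  select(CIF, QUESTION_ID, RESPONSE) %>%
  pivot_wider(names_from = QUESTION_ID, values_from = RESPONSE)

names(myq)    # "CIF" "Q1" "Q2": RESPONSE is gone after the pivot
myq$RESPONSE  # NULL, with an "Unknown or uninitialised column" warning
```

This doesn't explain why the warning is reported after unrelated commands, but it does suggest looking for code that still refers to the pre-pivot columns.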

I suggest that this is a non-trivial problem.
I observe it too, but it arises seemingly unpredictably.
"Unknown or uninitialised column: XXX"
appears after commands that make no reference to a variable with the name/column XXX.
POSSIBLY the warnings were generated earlier, but the notification of them comes after a different REPL command?

The lack of replicable examples reflects the seeming randomness of the error message, alas. I worry it has something to do with a low-level memory violation, because weird things like this happen in the RStudio REPL:

> warnings()       # no warnings yet
> # expect q_i to be just a numeric vector:
> dim(q_i)
NULL
> warnings()       # still no warnings
> head(q_i)        # as expected
[1] 1 1 1 1 1 1
> length(q_i)      # spawns warnings naming a column 'pooled' that is in a different variable/df
[1] 400
Warning messages:
1: Unknown or uninitialised column: 'pooled'.
2: Unknown or uninitialised column: 'pooled'.
3: Unknown or uninitialised column: 'pooled'.
4: Unknown or uninitialised column: 'pooled'.
5: Unknown or uninitialised column: 'pooled'.
> # confirm q_i is just a numeric vector:
> str(q_i)
 num [1:400] 1 1 1 1 1 1 1 1 1 1 ...
> warnings()       # 5 warnings still there
Warning messages:
1: Unknown or uninitialised column: 'pooled'.
2: Unknown or uninitialised column: 'pooled'.
3: Unknown or uninitialised column: 'pooled'.
4: Unknown or uninitialised column: 'pooled'.
5: Unknown or uninitialised column: 'pooled'.

I wish I could provide a reproducible example, but there are sometimes cases where a bug is real but not reproducible. I realize this makes it difficult to address.

Is there some way I can export a detailed warnings log to help diagnose the issue?
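One way to get more detail, at least for warnings raised by your own code, is to wrap the suspect code in withCallingHandlers() and write each warning's message and the call stack at that moment to a file. This is only a sketch: log_warnings() and the default file name warnings_log.txt are made up for illustration, and it cannot see warnings generated outside the wrapped expression.

```r
# Hypothetical helper: evaluate `expr` and append every warning it raises,
# together with the call stack at that moment, to a log file. The warning
# is not muffled, so it still reaches the console as usual.
log_warnings <- function(expr, file = "warnings_log.txt") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      # One entry per active call, keeping only the first deparsed line
      stack <- vapply(as.list(sys.calls()), function(cl) deparse(cl)[1],
                      character(1))
      entry <- c(
        paste("Warning:", conditionMessage(w)),
        paste("Call:", deparse(conditionCall(w))[1]),
        "Stack:",
        paste("  ", stack),
        ""
      )
      cat(entry, file = file, sep = "\n", append = TRUE)
    }
  )
}
```

For example, `log_warnings(source('TI_model.R'))`. If the console still shows the warnings but the log stays empty, that suggests they are not coming from the sourced code. Another standard option is `options(warn = 2)`, which turns warnings into errors so that `traceback()` shows where they were raised.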

This is an example of what my console looks like when RStudio gets into the mode of throwing the uninitialised column warnings. It starts throwing them after almost every line of code I execute, even if the code doesn't reference the offending data frame, or any data frame at all. In this code, the Date object is a list and doesn't have an element named Sample; Sample is a column of an unrelated data frame in memory.

> FirstTrainVector <- c(year(Date$FirstTrain), month(Date$FirstTrain))
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: Unknown or uninitialised column: `Sample`.
2: Unknown or uninitialised column: `Sample`.
3: Unknown or uninitialised column: `Sample`.
4: Unknown or uninitialised column: `Sample`.
5: Unknown or uninitialised column: `Sample`.
6: Unknown or uninitialised column: `Sample`.
7: Unknown or uninitialised column: `Sample`.
8: Unknown or uninitialised column: `Sample`.
9: Unknown or uninitialised column: `Sample`.
10: Unknown or uninitialised column: `Sample`.
11: Unknown or uninitialised column: `Sample`.
12: Unknown or uninitialised column: `Sample`.
13: Unknown or uninitialised column: `Sample`.
14: Unknown or uninitialised column: `Sample`.
15: Unknown or uninitialised column: `Sample`.
16: Unknown or uninitialised column: `Sample`.
17: Unknown or uninitialised column: `Sample`.
18: Unknown or uninitialised column: `Sample`.
19: Unknown or uninitialised column: `Sample`.
20: Unknown or uninitialised column: `Sample`.
21: Unknown or uninitialised column: `Sample`.
22: Unknown or uninitialised column: `Sample`.
23: Unknown or uninitialised column: `Sample`.
24: Unknown or uninitialised column: `Sample`.
25: Unknown or uninitialised column: `Sample`.
26: Unknown or uninitialised column: `Sample`.
27: Unknown or uninitialised column: `Sample`.
28: Unknown or uninitialised column: `Sample`.
29: Unknown or uninitialised column: `Sample`.
30: Unknown or uninitialised column: `Sample`.
31: Unknown or uninitialised column: `Sample`.
32: Unknown or uninitialised column: `Sample`.
33: Unknown or uninitialised column: `Sample`.
34: Unknown or uninitialised column: `Sample`.
35: Unknown or uninitialised column: `Sample`.
36: Unknown or uninitialised column: `Sample`.
37: Unknown or uninitialised column: `Sample`.
38: Unknown or uninitialised column: `Sample`.
39: Unknown or uninitialised column: `Sample`.
40: Unknown or uninitialised column: `Sample`.
41: Unknown or uninitialised column: `Sample`.
42: Unknown or uninitialised column: `Sample`.
43: Unknown or uninitialised column: `Sample`.
44: Unknown or uninitialised column: `Sample`.
45: Unknown or uninitialised column: `Sample`.
46: Unknown or uninitialised column: `Sample`.
47: Unknown or uninitialised column: `Sample`.
48: Unknown or uninitialised column: `Sample`.
49: Unknown or uninitialised column: `Sample`.
50: Unknown or uninitialised column: `Sample`.
>  FirstTestVector <- c(year(Date$FirstTest), month(Date$FirstTest))
Error in as.POSIXlt.default(x, tz = tz(x)) : 
  do not know how to convert 'x' to class “POSIXlt”
> Date$FirstTest
NULL
>  FirstTrainVector <- c(year(Date$FirstTrain), month(Date$FirstTrain))
Warning messages:
1: Unknown or uninitialised column: `Sample`. 
2: Unknown or uninitialised column: `Sample`. 
3: Unknown or uninitialised column: `Sample`. 
4: Unknown or uninitialised column: `Sample`. 
5: Unknown or uninitialised column: `Sample`. 
6: Unknown or uninitialised column: `Sample`. 
7: Unknown or uninitialised column: `Sample`. 
8: Unknown or uninitialised column: `Sample`. 
9: Unknown or uninitialised column: `Sample`. 
10: Unknown or uninitialised column: `Sample`. 
> FirstTestVector <- c(year(Date$LastTrain %m+% months(1)), month(Date$LastTrain %m+% months(1)))
> source('TI_model.R')
Warning messages:
1: Unknown or uninitialised column: `Sample`. 
2: Unknown or uninitialised column: `Sample`. 
3: Unknown or uninitialised column: `Sample`. 
4: Unknown or uninitialised column: `Sample`. 
5: Unknown or uninitialised column: `Sample`. 
6: Unknown or uninitialised column: `Sample`. 

Unfortunately, without a reproducible example there is little we can do to help.

I have also just started randomly getting this issue. Nothing major has changed in my project to prompt it. Unable to reproduce reliably.

RStudio 1.3.959, R 4.0.2 on macOS

I have occasionally been successful clearing the warnings through this procedure:

  1. From the column names in the warning message, try to guess which tibble in your environment is causing the warnings
  2. Delete the tibble with the rm() function
  3. Look in your source code (the .R file you are sourcing) for lines that modify the columns named in the warning. Comment these lines out, then uncomment them. I don't know the logic behind this, but I found this tip somewhere online (unfortunately, I can't find the source again).
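Step 1 can be made a little less of a guess by listing the data frames in your environment and checking which ones have the column named in the warning. A sketch; find_column() is a hypothetical helper, not part of any package. It uses names() rather than $, so the check itself doesn't raise the warning.

```r
# Hypothetical helper: report which data frames (tibbles included) in an
# environment have a column with the given name.
find_column <- function(col, env = globalenv()) {
  objs <- ls(env)
  is_df <- vapply(objs, function(nm) is.data.frame(get(nm, envir = env)),
                  logical(1), USE.NAMES = FALSE)
  dfs <- objs[is_df]
  # names() never warns, unlike `$` on a tibble
  has_col <- vapply(dfs, function(nm) col %in% names(get(nm, envir = env)),
                    logical(1), USE.NAMES = FALSE)
  data.frame(object = dfs, has_column = has_col, stringsAsFactors = FALSE)
}
```

For example, `find_column("Sample")` lists every data frame in the global environment with a TRUE/FALSE has_column flag; likely candidates can then be removed with rm() as in step 2.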

Good luck!

I don't have a reproducible example, but I do have some debugging information that might help @kevinushey figure out where the issue is coming from.

I have an R Markdown document where the first chunk is:

library(readr)
metadata <- read_csv("data/pannets_metadata.csv")
metadata

First, I terminate the R session to clear everything. When I run the chunk with Opt-Cmd-N (on macOS), I don't get a warning. When I then select the last line of the chunk (i.e. metadata) and run it with Cmd-Enter to preview the data frame, I get the warnings below. As you can see, there is no ATRX_Expr column in the metadata data frame. However, the column now called ATRX_rnaseq used to be named ATRX_Expr, which makes me think it's a caching issue.

> metadata
Warning messages:
1: Unknown or uninitialised column: 'ATRX_Expr'. 
2: Unknown or uninitialised column: 'ATRX_Expr'. 

> colnames(metadata)
 [1] "Tumour"        "Age"           "Sex"           "Metastasis"   
 [5] "Tumour_purity" "Immune_score"  "Subtype"       "DAXX_mutant"  
 [9] "MEN1_mutant"   "ATRX_mutant"   "ATRX_rnaseq"   "DAXX_rnaseq"  
[13] "MEN1_rnaseq"   "ATRX_array"    "DAXX_array"    "MEN1_array"

The above behaviour can be reproduced if I terminate R, clear all outputs, run that chunk with Opt-Cmd-N, and run metadata with Cmd-Enter. If I terminate R but don't clear all outputs first, I don't get the warnings. Strangely, if I terminate R and clear all outputs but run the chunk with Opt-Cmd-C (rather than Opt-Cmd-N), I don't get the warnings most of the time. I've tried a few times and each outcome is reproducible, even after I completely quit and re-open RStudio.

Sometimes I have to press Cmd-Enter on metadata a few times before getting the warnings. Subsequent runs of metadata with Cmd-Enter don't cause the warnings.

It's worth noting that if I run the same three lines of code at the console directly within RStudio, I don't get any warnings regardless of how many times I re-run metadata to preview the data frame. This suggests that it's an R Markdown issue.

I tried deleting the .Rproj.user folder associated with the RStudio project in question. I deleted the folder while RStudio wasn't running. After re-opening RStudio, I could reproduce the warnings using the steps described above. I even tried deleting all of RStudio's preferences in ~/Library and ~/.config/, but the warnings persisted.

All of these tests were done with RStudio version 1.3 (not the latest version). I couldn't get the warnings to manifest with version 1.2.5042. When I installed the latest version of RStudio (version 1.3.1056), I started getting the warnings again. The issue seems specific to RStudio version 1.3.

I believe this issue may be fixed in the latest daily builds of RStudio. If you get a chance, would you be able to check and verify if this does indeed appear to be the case?

I ran into this problem rather extensively just now. I installed an updated version of RStudio, but that didn't help. My problem is reproducible, but I can't post a nice short example since the script brings in all sorts of files. However, I did find a fix: At one point, I read in an Excel file using 'read_excel' from 'library(readxl)':

peakdf <- read_excel(path = fl)

If I change that to the following, the problem (which doesn't in any obvious way relate to my 'peakdf') goes away:

peakdf <- as.data.frame(read_excel(path = fl),
                        stringsAsFactors = FALSE)

So perhaps there is some sort of tibble-related problem?
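A likely reason this workaround helps: base data frames and tibbles behave differently when $ is used with a column that doesn't exist. Both return NULL, but only the tibble signals the "Unknown or uninitialised column" warning. Converting to a data.frame therefore silences the warning rather than removing whatever is looking up the missing column. A minimal comparison:

```r
library(tibble)

tb <- tibble(a = 1)
df <- as.data.frame(tb, stringsAsFactors = FALSE)

# Base data frame: a missing column silently gives NULL
is.null(df$missing)  # TRUE, no warning

# Tibble: also NULL, but with a warning
is.null(tb$missing)  # TRUE, plus an "Unknown or uninitialised column" warning
```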

I hope this ends up being helpful in tracking down the problem.

Eric

I got it after pivot_longer() too. The message was "Unknown or uninitialised column: Country.", but there is no such column in the script I am working on. In the past I have used tables with a column called that. Restarting RStudio with Shift/F10 squashed it.

I found that I hadn't called ungroup() after grouping; once I added it at the end of a piped statement, the problem went away.
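For reference, the pattern being described, with a made-up sales tibble: ungroup() at the end of the pipe returns an ordinary ungrouped tibble, which is_grouped_df() can confirm. That the missing ungroup() was behind the warning is this poster's observation, not something the sketch demonstrates.

```r
library(dplyr)

# Made-up example data
sales <- tibble(region = c("N", "N", "S"), amount = c(10, 20, 5))

sales_share <- sales %>%
  group_by(region) %>%
  mutate(share = amount / sum(amount)) %>%  # computed within each region
  ungroup()                                 # drop the grouping at the end of the pipe

is_grouped_df(sales_share)  # FALSE
```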

Thanks, as.data.frame( read_excel( ... ) ) fixed this issue for me.
