Bug report: unknown column warning when using tibbles

I found the exact problem posted by Balázs Szappanos on July 12, 2017 07:40, but I cannot find a response. The link to his post is https://support.rstudio.com/hc/en-us/community/posts/115007064927-Bug-report-unknown-column-warning-when-using-tibbles

His summary: Here is an annoying bug when tibbles are used in a script in RStudio. RStudio gives a lot of "Warning: Unknown or uninitialised column..." warnings when I use tibbles and I save, modify or do anything else with my code. Not gonna lie, it is quite annoying.

Here is my code to illustrate the problem:

# Create a data frame first ####
# You won't get warnings
options(warn = 1) # makes the warnings appear immediately
c1 <- 1:10
c2 <- letters[1:10]
df1 <- as.data.frame(cbind(c1,c2))
# Add columns but not in a for loop
df1$c3 <- 5:14
df1$c4 <- letters[5:14]

# Add column in a for loop but do not allocate memory
for (i in 1:nrow(df1)) {
  df1$c5[i] <- LETTERS[i]
}
# Result: no warning

# Add column in a for loop but allocate memory
df1$c6 <- NA
for (i in 1:nrow(df1)) {
  df1$c6[i] <- LETTERS[i+3]
}
# Result: no warning

# Create the same but make it a tibble
# You will get the warning
library(tibble)
options(warn = 1) # makes the warnings appear immediately
tb1 <- as_tibble(cbind(c1,c2)) # as.tibble() is deprecated in newer tibble releases

tb1$c3 <- 5:14
tb1$c4 <- letters[5:14]

# Add column in a for loop but do not allocate memory
for (i in 1:nrow(tb1)) {
  tb1$c5[i] <- LETTERS[i]
}
# Result - Warning: Unknown or uninitialised column: 'c5'.

# Add column in a for loop but allocate memory
tb1$c6 <- NA

for (i in 1:nrow(tb1)) {
  tb1$c6[i] <- LETTERS[i+3]
}
# Result: no warning.
# However, in more complicated code,
# I often get the same warning as when I added c5 to the tibble,
# even if I have allocated memory space as with c6.
# This bug is not always around. Yesterday, Monday, August 6, 2018,
# I ran code all day without the annoying warnings.
# Today, Tuesday, August 7, 2018, the very same code
# gives all kinds of annoying warnings. I will sometimes get 4 warnings
# on simple functions like dir(pattern = ".csv").
# The problem does not appear to be predictable, which is even more annoying!
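
For anyone reading later, here is a minimal sketch of why the c5 loop warns on a tibble but not on a data frame. An assignment like x$c5[i] <- value first reads x$c5, and reading a missing column with $ is silent on a data frame but signals a warning on a tibble. The helper collect_warnings() and the toy objects df and tb are made up for illustration; they are not part of tibble or of the script above. This only covers the c5 case, not the unpredictable warnings described in the comments.

```r
library(tibble)

# Toy objects for illustration only
df <- data.frame(c1 = 1:3)
tb <- tibble(c1 = 1:3)

# Hypothetical helper: evaluate an expression and return the messages
# of any warnings it signals, instead of printing them
collect_warnings <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

# Reading a column that does not exist yet
df_msgs <- collect_warnings(df$c5)  # data frame: NULL, nothing signalled
tb_msgs <- collect_warnings(tb$c5)  # tibble: NULL, plus a warning

length(df_msgs)
tb_msgs
```

Creating the column before the loop, as in the c6 example, means the read inside the loop finds an existing column, so this particular warning does not fire.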

I have seen this a lot in front of clients and it is horrible to try and tell them to "ignore it, it is a spurious warning."

Great job sharing a clean reproducible example. I just ran it and it misbehaves exactly as the comments describe.

I guess you could file it as a tibble issue.

I've filed this at https://github.com/tidyverse/tibble/issues/450 -- thanks for the report.


One thing worth asking -- what version of RStudio are you using? I can reproduce some of these warn-on-save issues with RStudio v1.0.153, but not with RStudio v1.1.453.

Sorry, I should have specified. I am using R version 3.5.1 with RStudio version 1.1.456. I just updated both last week. The issue was not as obvious before upgrading.

That's surprising to me as we actually attempted to work around warnings of this form in our update from RStudio v1.0 to RStudio v1.1!

Can you by any chance share some other code you're running that gives these warnings, alongside the exact error messages you're seeing? Do you only see the warnings on save, or do they seem to occur during other times / after other actions in the IDE?

The data that I am using is company private information so I cannot share the exact code. I found discussions that suggested applying the as.data.frame() function to the data frame to eliminate the reference to tibble. Got this idea from this discussion: https://stackoverflow.com/questions/39041115/fixing-a-multiple-warning-unknown-column

Applying as.data.frame() seems to have eliminated the warnings. That is the data I am using today and I have not had any problems. The class() of all my data is now "data.frame".
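
A minimal sketch of that workaround, using toy data rather than the private data mentioned above (the names tb, df and not_there are made up for illustration). It assumes the warnings come from tibble's stricter $ and not from something else in the session:

```r
library(tibble)

# Toy tibble standing in for the real data
tb <- tibble(c1 = 1:3, c2 = letters[1:3])

# Drop the tibble classes; the result is a plain data frame
df <- as.data.frame(tb)

class(tb)
class(df)

# On a plain data frame, reading a missing column returns NULL silently
is.null(df$not_there)
```

The trade-off is that the object no longer prints or subsets like a tibble, so this hides the symptom rather than explaining it.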

I did not get warnings on save. I got warnings when doing some simple tasks, and some when adding new fields. I even got a warning with dir(pattern = ".csv"). That one blew my mind. But don't act on this information until I can reproduce it.

When I get a good break point in my work, I will go back to the script where I had the problem and see if I can create a generic example.

There may be multiple reasons for the annoying warnings described above. On Nov. 2, 2018, I ran into the problem of getting the warning message that columns don't exist or that they are not initialized. The warning messages were caused by the functions names(df) and str(df) as I was trying to examine df.

After messing around with many of the suggestions above, as well as others, without success, I restarted RStudio for the nth time and went back to the start of my code to see when the warnings might pop up. It is a long story and a long process of checking before I had an epiphany. I knew that the columns named in the warnings had not been created yet; they were created in code below the point where I was working. After removing all of the code that created those columns, the warnings went away. I pasted the code back in and the warnings came back. My problem seems to be related to RStudio.

The relevant data frames in the RStudio environment are all_matched_source and j2clm_clp_blh.

To try to clarify what is going on, I decided to paste back into the code one line at a time. All the code pasted back in is below two lines of code:

names(all_matched_source)
str(all_matched_source)

The first line pasted back in was j2clm_clp_blh$prin_amt <- NA. After that, names(all_matched_source) ran with no problem, and so did str(all_matched_source). The second piece of code pasted back in was:

for (i in 1:nrow(j2clm_clp_blh)){
  if (j2clm_clp_blh$mode__x[i] == "1"){
    j2clm_clp_blh$prin_amt[i] <- 0
  }
  if (j2clm_clp_blh$mode__x[i] %in% c("2", "4", "5")){
    if (j2clm_clp_blh$ln_bal[i] < j2clm_clp_blh$payment[i]){
      j2clm_clp_blh$prin_amt[i] <- j2clm_clp_blh$ln_bal[i]
    } else {
      j2clm_clp_blh$prin_amt[i] <- j2clm_clp_blh$payment[i]
    }
  }
  if (((j2clm_clp_blh$mode__x[i] == "6") | (!is.na(j2clm_clp_blh$l_cr_score_appl_cd[i])))
      & ((!is.na(j2clm_clp_blh$l_cr_score_appl_cd[i]))
      & ((j2clm_clp_blh$ln[i] != j2clm_clp_blh$l_cr_score_appl_cd[i])))
      ) {
    j2clm_clp_blh$prin_amt[i] <- j2clm_clp_blh$ln_bal[i]
  }
}

Now names(all_matched_source) lists all of the names, followed by these warnings:

Warning messages:
1: Unknown or uninitialised column: 'prin_amt'. 
2: Unknown or uninitialised column: 'prin_amt'. 
3: Unknown or uninitialised column: 'prin_amt'. 
4: Unknown or uninitialised column: 'prin_amt'.

str(all_matched_source) does its job and gives the same warnings. Now for the real confusion: I go back to names(all_matched_source) and execute it. NO WARNINGS!!! So I execute str(all_matched_source). NO WARNINGS!!! What the heck????

Now I put my cursor at the very end of the code just pasted and hit Enter to add a new line below the code. Then I go back up to names(all_matched_source) and execute it. I get the warnings back. Next I execute str(all_matched_source): no warnings. I go to the bottom of the code, add another blank line, go back up to str(all_matched_source), execute it and, you guessed it, I get the warnings again. By this time I am ROFL (rolling on the floor laughing).

I hope this helps someone someday.
