Cannot add_column to empty tibble


#1

A column cannot be added to an empty tibble. Is this by design or maybe the use case doesn’t make sense?

As it stands, building a tibble one column at a time requires the first column to be treated differently from the subsequent ones, i.e. a special step in the code to build the tibble.

I know it probably isn’t efficient to build a tibble a column at a time…

t1 <- tibble::tibble()
t2 <- tibble::add_column(t1, c1 = c(1,2,3))

Error: n > 0 is not TRUE

Thanks,
Dan


#2

From the documentation, you cannot add observations through add_column, so since t1 has 0 observations, I think that’s why it doesn’t work.
And beware that it could fail silently if you provide only one value:

> t2 <- tibble::add_column(t1, c1 = 1)
> t2
# A tibble: 0 x 1
# ... with 1 variables: c1 <dbl>

t2 in this case is probably not what you want.

Can you start your tibble directly at the second step?

> t1 <- tibble::tibble(c1 = c(1, 2, 3))
> t1
# A tibble: 3 x 1
     c1
  <dbl>
1     1
2     2
3     3

#3

Thanks Florian.

I know I could initialize the tibble with the first column, but then that column ends up being handled differently than the rest of the columns which could be added in a walk or loop.

I am doing an initialization like that now but that inconsistency makes the code more verbose and complicated.

BTW the “can’t add observations” in the docs means that you can’t add to an existing column; you can only add a brand new column with a name that is different from those of the other columns in the tibble.

Maybe I’m just being too picky but it seems like add_column should work with an empty tibble… but there may be some underlying reason to not allow that that I am missing.

Dan


#4

A tibble is a data.frame, and in a data.frame, all columns have the same number of observations. So, I guess that if you create a data frame with 0 rows, all the other columns must have 0 rows.

Maybe you can make a list and, once you have all your data, convert it to a tibble with as_tibble?
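
A minimal sketch of that list-first approach, assuming the tibble package is installed. The names col_names, col_values and cols are made up for illustration and stand in for whatever the real code derives from its inputs:

```r
library(tibble)

# Hypothetical inputs: column names and values arrive separately,
# e.g. derived from input files.
col_names  <- c("c1", "c2", "c3")
col_values <- list(1:3, c("a", "b", "c"), c(TRUE, FALSE, TRUE))

# Accumulate the columns in a plain named list. An empty list needs no
# special handling for the first column.
cols <- list()
for (i in seq_along(col_names)) {
  cols[[col_names[i]]] <- col_values[[i]]
}

# Convert once at the end, after all columns have been collected.
tib <- as_tibble(cols)
```

Because the list is only converted at the end, the first column is handled by the same loop as all the others.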


#5

You could create a fake one-column tibble and remove that column, though it would require knowing the number of rows prior to starting your tibble creation:

tibble(a = 1:10) %>% select()
# A tibble: 10 x 0

#6

Thanks, I appreciate the pointers, but what I am interested in is why a column cannot be added to an empty tibble.

It may be due to an oversight in the implementation, or maybe it is deliberate and there is something about the use of tibble::add_column I don’t understand. Maybe the documentation needs to be updated (for completeness): nowhere does it mention that you cannot add_column to an empty tibble.

My example was an abstraction of what I am trying to do to keep the example as simple as possible. The actual task I’m trying to do is more involved than just adding a column to a tibble… maybe because I am looking at it the wrong way :slight_smile: … and for the moment building a tibble by adding one column at a time simplifies things.

But I’ve bumped into a function that does not seem to behave in the way that either its name or its documentation implies, so I would like to find out if it is just an unimportant detail or something that needs to be fixed or documented.

Nick, I do know the number of rows that will be in the tibble before I create it.

FlorianGB, the error says that n > 0 is not TRUE.

t1 <- tibble::tibble()
t2 <- tibble::add_column(t1, c1 = c(1,2,3))
#> Error: n > 0 is not TRUE

If add_column were expecting 0 rows to be added, I would expect something like

Error: .data must have 0 rows, not 3

and adding a column with no rows in it produces a different error.

t1 <- tibble::tibble()
t2 <- tibble::add_column(t1, c1 = c())
#> Error: Column c1 must be a 1d atomic vector or a list

Here is an expanded example of what I am trying to do, again abstracted to keep the code as simple as possible.

tib_example1 does not work because it tries to add a column to an empty tibble

tib_example2 works as expected but it requires the tibble to be initialized with a dummy column.

tib_example1 <- function(l) {
  e <- environment()
  e$tib <- tibble::tibble()
  purrr::walk(l, function(c, e) {
    e$tib <- tibble::add_column(e$tib, c)
  }, e)
  return(e$tib)
}

tib_example2 <- function(l) {
  e <- environment()
  e$tib <- tibble::tibble(1:3)
  purrr::walk(l, function(c, e) {
    e$tib <- tibble::add_column(e$tib, c)
  }, e)
  return(e$tib)
}

tib_example1(list(4:6))
#> Error: n > 0 is not TRUE

tib_example2(list(4:6))
#> # A tibble: 3 x 2
#>   `1:3`     c
#>   <int> <int>
#> 1     1     4
#> 2     2     5
#> 3     3     6

Thanks again,
Dan


#7

Hi,
I’m sorry I don’t get what you expect the function to do. I know that you have abstracted your real life problem. But in your expected output from tib_example2, the column is named ‘c’, which is never defined except inside your function as an argument.
Do you need to be inside the environment of the function each time? Because, as you are resetting the tibble each time, this doesn’t seem necessary to me.
Do you need to have a named list, so you can have a column name for your tibble? If so, maybe tibble::as_data_frame would be sufficient?

tibble::as_data_frame(list(c = 4:6))
# A tibble: 3 x 1
      c
  <int>
1     4
2     5
3     6

#8

Yes, resetting the tibble is probably inefficient, but for now building the tibble one column at a time is easier to do. That may or may not be a practical issue depending on the amount of data that will be put into the tibble. If it is a practical issue there are a number of ways to fix it. The app is not building the tibble literally; it is building it dynamically, i.e. deriving the data, including the column names, from a number of input files, which almost always leads to time/space/complexity tradeoffs.

You don’t actually need a named list here to make a tibble but it is almost always the best thing to do and in some cases necessary… but I left that out to simplify the example and it wasn’t part of the issue I was looking at.

In the actual code the column names are passed in as variables, which complicates things. There are a number of ways to handle that issue but again it would complicate the example.

In the end the app will build a list of tibbles from an arbitrary list of files (which can have different formats), with each file contributing some data to each tibble. It’s not a simple “rectangular file” so the simple ways of building a tibble won’t work.

Dan


#9

I’ve come to the conclusion that add_column is producing an undocumented and unexpected result when .data (the input tibble) has 0 columns.

Here is why I think this and a proposed fix…

I checked the source for add_column and it checks for a number of edge cases, for example where the number of added columns is 0, but the case where .data (i.e. the input tibble) has 0 columns is not checked.

This produces the confusing error “Error: n > 0 is not TRUE”, which I would call a programming surprise because it does not look to be intentional :slight_smile: .

This happens because later in the code pluralise_n is used with nrow(.data), i.e. pluralise_n is used with 0, to make an error message. pluralise_n produces the low-level message “Error: n > 0 is not TRUE” when it is asked to pluralise 0.

However, after a bit of checking, setup, and fixup, all that add_column does is concatenate the added columns with the ones already in the tibble. This makes sense because a tibble stores its content as a sequence of columns.

So… it would make sense (IMHO of course) for add_column to check whether .data (the input tibble) contains 0 columns and, if it does, just return the input columns (i.e. the ... arguments) as a new tibble. I think that this kind of behavior would be more consistent with the name and implied behavior of a function named add_column.

Supporting a 0 column .data could be added with a simple test at the beginning of the code.

# existing code
if (ncol(df) == 0L) {
  return(.data)
}
# add this to support a 0-column .data
if (ncol(.data) == 0L) {
  return(tibble(...))
}
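
The same check can be tried outside the package with a small wrapper. This is only a sketch: add_column0 is a hypothetical name, not part of tibble, and it additionally requires 0 rows so that a 0-column tibble that already has rows still goes through add_column's normal row checks. It assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical wrapper (not part of tibble): treat a tibble with no columns
# and no rows as "nothing yet" and start the tibble from the new columns.
add_column0 <- function(.data, ...) {
  if (ncol(.data) == 0L && nrow(.data) == 0L) {
    return(tibble(...))
  }
  add_column(.data, ...)
}

# Build a tibble one column at a time without special-casing the first column.
# The column names come from variables, so they are injected with !! and :=.
cols <- list(c1 = 1:3, c2 = c("a", "b", "c"))
tib <- tibble()
for (nm in names(cols)) {
  tib <- add_column0(tib, !!nm := cols[[nm]])
}
```

Only the truly empty case bypasses add_column, so a length mismatch on any later column is still reported by add_column itself.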

Dan


#10

As a counterpoint, preventing add_column from changing the number of rows on an empty tibble could be considered a feature. The error in the following related case is arguably a good thing:

suppressPackageStartupMessages(library(tidyverse))
tibble(a = 1:4) %>%
  select(-a) %>%
  add_column(b = 1:4)
#> # A tibble: 4 x 1
#>       b
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4     4
tibble(a = 1:4) %>%
  select(-a) %>%
  add_column(b = 1:5)
#> Error: `.data` must have 4 rows, not 5

That said, it’s a reasonable argument that a tibble with zero rows should be a special case (which is how your fix treats it). At the very least, if it is behaving as intended, it makes sense to have the same error as for other cases where row numbers are mismatched.


#11

The code in the tibble package has been fixed so that add_column to an empty tibble produces an appropriate error that tells you that you cannot add a row to an empty tibble.

@krlmlr: fix off-by-one error, closes #319

  • add_column() to an empty zero-row tibble with a variable of nonzero length now produces a correct error message (#319).

This is the workaround I made so that I can add_column to an empty tibble. Of course now that I have this workaround I don’t need it anymore :frowning: but I’m putting it here in case anyone is interested.

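hd_add_column <- function(data, ..., before = NULL, after = NULL) {
    # An empty tibble has nothing to append to, so build a new one instead.
    if (nrow(data) == 0L) {
        return(tibble::tibble(...))
    }
    # Pass the positions by name; unnamed they would be absorbed into `...`.
    return(tibble::add_column(data, ..., .before = before, .after = after))
}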