bind_cols coerces new names to ... plus an index number when mapped over; what causes this?

Hi All,

I have two lists of tibbles (let's call them df1 and df2). df1 holds the original tibbles, while df2 holds an imputation of one column (call it y in df1 and y2 in df2) made with recipes::step_knnimpute. As there is information in df1 that I need to keep, my preference would be to replace the column in df1 with the one from df2, or at least to add the imputed column to df1. For the first option, there does not appear to be a shortcut for moving the imputed data into df1 after checking the imputation. So I created df2 and checked that the non-imputed values match between y in df1 and y2 in df2. I therefore know the data match row by row, and when I bind a single tibble from df1 with the column from df2 like this, bind_cols(list(df,df2[5])), everything is fine. However, as these are lists of tibbles I need to map over them, so I use the following code...

df_list <- pmap(list(df1, df2), function(first, second) {
 bind_cols(list(first, second[[1]]))
  })

After doing this y2 is bound to df1 but has its name changed to "...15".

My first question is: how can I prevent the ...# from being added in the column bind, and why does this happen when the names are different? My second question is: is there a more efficient way of transferring the imputed data into the original data frame than creating df2 and binding the desired column?

df <- structure(list(Item = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("ITEM76222", "ITEM78454"), class = "factor"), 
    Promotion = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("0", "1"), class = "factor"), ds = structure(c(1546300800, 
    1546387200, 1546473600, 1546560000, 1546646400, 1546732800, 
    1546819200, 1546905600, 1546992000, 1547078400), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), rowname = c("1", "2", "3", "4", "5", "6", "7", 
    "8", "9", "10"), y = c(374L, NA, 447L, 403L, 409L, 554L, 
    409L, 469L, 556L, 585L)), row.names = c(NA, -10L), index_quo = ~ds, index_time_zone = "UTC", class = c("tbl_time", 
"tbl_df", "tbl", "data.frame"))
df2 <- structure(list(y2 = c(374L, 450L, 447L, 403L, 409L, 554L, 409L, 
469L, 556L, 585L)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

This is how I created the imputed data df2:

immpute_anomalies <- function(ti) {
  ti %>%
    recipes::recipe(y ~ Item + Promotion + season + trend) %>%
    recipes::step_knnimpute(y) %>%
    recipes::prep() %>%
    recipes::juice() %>%
    dplyr::rename(y2 = "y") %>%
    dplyr::select(y2)
}

list_ts_imps <- purrr::map(list_ts_anom, immpute_anomalies)

Thank you for any aid in helping me to understand this, your efforts are truly appreciated.

Why wrap them in a list and not work with the data frames directly?
bind_cols(df,df2[5])

That is just what I saw in the dplyr documentation; it doesn't appear to make a difference with regard to the renaming.
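For what it's worth, the renaming looks like it comes from the subsetting rather than from the list wrapper. The mapped code uses second[[1]], which returns a bare vector with no name, so bind_cols has to invent one ("..." plus the column position), whereas df2[5] in the single-tibble call uses single brackets and stays a named one-column tibble. Below is a minimal sketch of that difference, assuming dplyr 1.0 or later; the two small lists are made-up stand-ins for the real data, not the poster's tibbles.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the real lists of tibbles
df1 <- list(
  tibble(id = 1:3, y = c(10L, NA, 30L)),
  tibble(id = 4:6, y = c(NA, 50L, 60L))
)
df2 <- list(
  tibble(y2 = c(10L, 20L, 30L)),
  tibble(y2 = c(40L, 50L, 60L))
)

# second[[1]] extracts an unnamed vector, so the name y2 is lost and
# bind_cols repairs the empty name (it also prints a "New names" message)
renamed <- map2(df1, df2, function(first, second) {
  suppressMessages(bind_cols(first, second[[1]]))
})

# second[1] (or second["y2"]) is still a one-column tibble, so y2 is kept
kept <- map2(df1, df2, function(first, second) {
  bind_cols(first, second[1])
})

names(renamed[[1]])  # third name is a repaired placeholder, not y2
names(kept[[1]])     # id, y, y2
```

So in the original pmap call, changing second[[1]] to second[1] should be enough to keep the y2 name.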

The answer is to use the augment function from broom:: in order to rejoin the old and new data frames.
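On the second question, here is one possible sketch that stays within dplyr and purrr, offered as an alternative and not as the approach described above. It assumes, as checked in the original post, that each pair of tibbles has the same rows in the same order. The lists below are made-up stand-ins, and the argument names orig and imp are just illustrative.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the real lists of tibbles
df1 <- list(
  tibble(id = 1:3, y = c(10L, NA, 30L)),
  tibble(id = 4:6, y = c(NA, 50L, 60L))
)
df2 <- list(
  tibble(y2 = c(10L, 20L, 30L)),
  tibble(y2 = c(40L, 50L, 60L))
)

# Option 1: add the imputed column next to the original one
added <- map2(df1, df2, function(orig, imp) {
  mutate(orig, y2 = imp$y2)
})

# Option 2: overwrite y, filling only the missing values from y2
# (relies on the rows of orig and imp being aligned)
replaced <- map2(df1, df2, function(orig, imp) {
  mutate(orig, y = coalesce(y, imp$y2))
})

names(added[[1]])
replaced[[1]]$y
```

Because mutate assigns the column by name, no name repair is involved, and coalesce leaves the observed values of y untouched.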
