convert such tibble to data frame

tjcnnl1 · December 18, 2020, 2:17am

I need to reshape a data frame from long to wide. After reshaping, the result is a tibble [1 x 7] in which each column is a list. I want to convert this tibble to a data frame. With the code below, only the first 3 columns (lists) can be converted; the rest fail because the lists have different lengths. Is there a way to convert such a tibble to a data frame?

library(tidyverse)
test <- data.frame(
  stringsAsFactors = FALSE,
  cat = c("15918-OP-18d","15918-OP-18d",
          "15918-OP-18d","15918-OP-18d","15918-OP-18d",
          "15918-OP-18d","15918-OP-18d","15918-OP-18d","15918-OP-18d",
          "15918-OP-18d","15918-OP-18d","15918-OP-18d",
          "15918-OP-18d","15918-OP-18d","15918-OP-18d","15918-OP-18d",
          "15918-OP-18d","15918-OP-18d","15918-OP-18d",
          "15918-OP-18d","15923-OP-23","15923-OP-23","15923-OP-23",
          "15923-OP-23","15923-OP-23","15923-OP-23","15923-OP-23",
          "15923-OP-23","15923-OP-23","15923-OP-23","15923-OP-23",
          "15923-OP-23","15923-OP-23","15923-OP-23","15923-OP-23",
          "15923-OP-23","15923-OP-23","15923-OP-23",
          "15923-OP-23","15923-OP-23","15976-VE-6","15976-VE-6",
          "15976-VE-6","15976-VE-6","15976-VE-6","15976-VE-6",
          "15976-VE-6","15976-VE-6","15976-VE-6","15976-VE-6",
          "16055-PC-05","16055-PC-05","16055-PC-05","16055-PC-05",
          "16055-PC-05","16055-PC-05","16055-PC-05",
          "16055-PC-05","16055-PC-05","16055-PC-05","16055-PC-05",
          "16055-PC-05","16055-PC-05","16055-PC-05","16055-PC-05",
          "16055-PC-05","16055-PC-05","16055-PC-05","16055-PC-05",
          "16055-PC-05","14854-HB-5e","14854-HB-5e",
          "14854-HB-5e","14854-HB-5e","14854-HB-5e",
          "14854-HB-5e","14854-HB-5e","14854-HB-5e",
          "14854-HB-5e","14854-HB-5e","14854-HB-5e","14854-HB-5e",
          "14854-HB-5e","14854-HB-5e","14854-HB-5e",
          "16215-EE-1a","16215-EE-1a","16215-EE-1a","16215-EE-1a",
          "16215-EE-1a","16215-EE-1a","16215-EE-1a",
          "16215-EE-1a","16215-EE-1a","16215-EE-1a","16215-EE-1a",
          "16215-EE-1a","16215-EE-1a","16215-EE-1a","16215-EE-1a",
          "16580-eC-3","16580-eC-3","16580-eC-3","16580-eC-3",
          "16580-eC-3","16580-eC-3","16580-eC-3",
          "16580-eC-3","16580-eC-3","16580-eC-3","16580-eC-3",
          "16580-eC-3","16580-eC-3","16580-eC-3",
          "16580-eC-3"),
  score = c(250.667,408,226.118,192,
           364.833,371.167,290.6,398.857,437.6,467.6,423.8,
           402.875,264.667,302.833,137,251.265,299.421,370,578.5,
           343.833,0,59.1,100,92.3,40,100,0,50,87.5,38.5,
           84,84.2,66.7,40,85.7,50,70.6,100,100,83.3,0,0,
           9.1,0,7.1,12.5,7.7,11.1,0,16.7,69.8,57.3,55.6,
           50.9,42.2,87.4,84.9,31.8,73.6,69.7,75.3,80,66,
           53,66,70.7,69.3,61,45.5,22.3,0,50,50,0,80,0,
           75,100,67.7,66.7,50,100,50,0,100,710.902,
           352.768,266.309,375.199,352.045,346.25,387.726,298.459,
           252.964,288.243,395.552,329.736,279.374,394.579,
           322.825,100,0,50,40,57.8947368421053,93.3333333333333,
           66.6666666666667,21.0526315789474,100,69.2307692307692,
           95.6521739130435,16.6666666666667,76,
           79.4871794871795,100))
wkdat <- test %>% pivot_wider(names_from = cat, values_from = score)
str(wkdat)
datok = data.frame(op18 = wkdat[[1]],
                   op23 = wkdat[[2]],
                   ve6  = wkdat[[3]])
                  #  pc5=wkdat[[4]],
                  #  hb5e=wkdat[[5]],
                  # ee1a=wkdat[[6]],
                  # ec3=wkdat[[7]]
                  #  )

names(datok)[1] <- "op18"
names(datok)[2] <- "op23"
names(datok)[3] <- "ve6"
# names(datok)[4] <- "pc5"
# names(datok)[5] <- "hb5e"
# names(datok)[6] <- "ee1a"
# names(datok)[7] <- "ec3"

williaml · December 18, 2020, 2:30am

Have a look at this: converting efficiently between data.table, data.frame and tibble

nirgrahamuk · December 18, 2020, 11:19am

The point of a data frame is to hold related data.
Objects share a row because they relate to the same thing,
like facts about a person, where each row is a person.
It's not clear how your cats and scores relate to each other, if they do at all...
If you just want to smash them all together, with NAs hanging off the end, you could construct that like so. But this treats the row alignment as arbitrary, since it does not respect any relationships between the values...


(my_groups <- group_split(test,cat))

(largest_group_length <- max(purrr::map_int(my_groups,nrow)))

# pad every smaller group with NA scores up to the largest group's row count
(extend_groups <- purrr::map(my_groups, ~{
  rows_to_add <- largest_group_length - nrow(.x)
  if (rows_to_add > 0) {
    cat_to_use <- unique(.x$cat)
    added_rows <- data.frame(
      stringsAsFactors = FALSE,
      cat = cat_to_use,
      score = rep(NA_real_, rows_to_add))
    return(bind_rows(.x, added_rows))
  }
  return(.x)
}))

#smash together
dplyr::bind_cols(extend_groups)
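
For comparison, the same NA-padding idea can be sketched in base R. This is only an illustration, not the method used above: `test_small` is a made-up stand-in for the `test` data frame, and `scores_by_cat`, `max_len`, `padded` and `wide` are hypothetical names. It gives one column per category rather than a cat/score pair per category.

```r
# Hypothetical stand-in for the `test` data frame from the question.
test_small <- data.frame(
  cat   = c("a", "a", "a", "b", "b", "c"),
  score = c(1, 2, 3, 10, 20, 100),
  stringsAsFactors = FALSE
)

# One numeric vector of scores per category.
scores_by_cat <- split(test_small$score, test_small$cat)

# Length of the longest group.
max_len <- max(lengths(scores_by_cat))

# `length<-` extends each vector to max_len, filling the tail with NA.
padded <- lapply(scores_by_cat, `length<-`, max_len)

# Equal-length vectors can now go into an ordinary data frame.
# check.names = FALSE keeps column names such as "15918-OP-18d" unchanged.
wide <- data.frame(padded, check.names = FALSE)
print(wide)
```

As with the approach above, values that end up in the same row are unrelated; the rows only exist so the columns have equal length.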

tjcnnl1 · December 18, 2020, 2:38pm

Thanks for your answer to “call a function multiple times”. All I am trying to do is prepare the data to generate a histogram for each category (cat). To follow the approach described in “Automating exploratory plots with ggplot2 and purrr”, I want a named vector expl based on the test data frame, and then use map(). I don’t know how to prepare the data to meet that requirement. The way R handles loops seems very weird to me. Thank you very much.

draw_hist <- function(x) {
   ggplot(test,aes(x=.data[[x]], y=..count..)) + 
    geom_histogram(bins=50, fill="steelblue", color="white") 
}
# don't know how to get expl from test???
expl = set_names(expl)
hist_all <- map(expl, ~ draw_hist (.x) )
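
One possible way to get a histogram per category without reshaping to wide is to split the long data by cat and map over the pieces. The sketch below is an assumption about what is wanted, not a confirmed solution: `test_small` is a made-up stand-in for `test`, `by_cat` is a hypothetical name, and `draw_hist` has been changed to take a data frame and a title instead of a column name.

```r
library(ggplot2)
library(purrr)

# Hypothetical stand-in for the long `test` data frame (columns cat and score).
test_small <- data.frame(
  cat   = rep(c("a", "b"), each = 5),
  score = c(1, 2, 2, 3, 5, 10, 20, 20, 30, 50),
  stringsAsFactors = FALSE
)

# Histogram of the score column for one category's rows.
# geom_histogram() plots counts on the y axis by default.
draw_hist <- function(dat, title) {
  ggplot(dat, aes(x = score)) +
    geom_histogram(bins = 50, fill = "steelblue", color = "white") +
    ggtitle(title)
}

# Named list with one data frame per category.
by_cat <- split(test_small, test_small$cat)

# imap() passes each element and its name to draw_hist().
hist_all <- imap(by_cat, draw_hist)
```

Each element of `hist_all` is a plot that can be printed by name, for example `hist_all[["a"]]`. If a single figure with one panel per category is enough, adding `facet_wrap(~ cat)` to one histogram of the long data avoids the splitting step altogether.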

tjcnnl1 · December 18, 2020, 2:43pm

I come from the SAS and C world and am a new R user. There is a big gap in my understanding of each line of your code. The data types (data frame, tibble, list) are very confusing, and the way R handles loops is strange to me. Thank you very much for your code, I'll try it out.
