Manipulating a data frame

Hi everyone, just one question that has taken me a long time.

I have the data frame created by the code below and would like to keep only one row per unique value of the column char. However, I do not want to lose the information in the columns x and y when I remove the duplicates.

L3 <- LETTERS[1:3]
char <- sample(L3, 10, replace = TRUE)
print(data.frame(x = rep(c("w", "z", "x", "g", "h"), c(2, 2, 2, 3, 1)), y = 1:10, char = char))

Is there a way to restructure the data frame so that duplicates in char are removed, while the x and y values belonging to each duplicated item are kept?
I would like to get a data frame like the one below:

    char  x        y
1   A     x,g      5,6
2   B     w,z,x,g  1,2,3,4,5,6,7,8,9
3   C     z,n      3,4

Best,
Amare

A tidyverse solution would be:

library(tidyverse)

L3 <- LETTERS[1:3]
char <- sample(L3, 10, replace = TRUE)
df <- data.frame(x = rep(c("w","z","x","g","h"), c(2,2,2,3,1)), y = 1:10, char = char)
df |>
  group_by(char) |>
  summarise(across(c(x, y), ~ str_flatten(unique(.x), collapse = ", ")))
#> # A tibble: 3 × 3
#>   char  x       y         
#>   <chr> <chr>   <chr>     
#> 1 A     w, z, g 2, 4, 7, 8
#> 2 B     w, g, h 1, 9, 10  
#> 3 C     z, x    3, 5, 6

Created on 2022-08-01 by the reprex package (v2.0.1)
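
For anyone who would rather not load the tidyverse, here is a rough base R sketch of the same idea. It assumes a fixed copy of the data (so the result is reproducible, unlike the `sample()` call above), and `collapse_unique` is just a hypothetical helper name I made up for illustration.

```r
# Fixed example data (an assumption: hard-coded values instead of sample())
df_fixed <- data.frame(
  x = c("w", "w", "z", "z", "x", "x", "g", "g", "g", "h"),
  y = 1:10,
  char = c("C", "A", "C", "A", "A", "B", "C", "C", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique values in order of first appearance, comma separated
collapse_unique <- function(v) paste(unique(v), collapse = ",")

# One row per value of char; x and y are collapsed within each group
res <- aggregate(
  df_fixed[c("x", "y")],
  by = list(char = df_fixed$char),
  FUN = collapse_unique
)

print(res)
# Row A: x = "w,z,x", y = "2,4,5"
# Row B: x = "x,g,h", y = "6,9,10"
# Row C: x = "w,z,g", y = "1,3,7,8"
```

Note that y ends up as a character column here, just as in the tidyverse version.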


Consider nesting your data as an alternative:

(mydata <- data.frame(
  stringsAsFactors = FALSE,
  x = c("w", "w", "z", "z", "x", "x", "g", "g", "g", "h"),
  y = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L),
  char = c("C", "A", "C", "A", "A", "B", "C", "C", "B", "B")
))

library(tidyverse)

(mydata_nested <- nest(mydata, data = c(x, y)))

mydata_nested$data

This approach is now more convenient than ever as there is a new nplyr package on CRAN.
A Grammar of Nested Data Manipulation • nplyr (markjrieke.github.io)
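
If the comma-separated strings are still wanted in the end, the nested column can be collapsed afterwards. This is only a sketch with plain dplyr and purrr (it does not use nplyr), and the name `mydata_collapsed` is my own choice for illustration.

```r
library(tidyverse)

mydata <- data.frame(
  stringsAsFactors = FALSE,
  x = c("w", "w", "z", "z", "x", "x", "g", "g", "g", "h"),
  y = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L),
  char = c("C", "A", "C", "A", "A", "B", "C", "C", "B", "B")
)

# One row per char, with the matching x/y rows kept in a list-column
mydata_nested <- nest(mydata, data = c(x, y))

# Collapse each nested data frame into one string per column
mydata_collapsed <- mydata_nested |>
  mutate(
    x = map_chr(data, ~ paste(unique(.x$x), collapse = ",")),
    y = map_chr(data, ~ paste(unique(.x$y), collapse = ","))
  ) |>
  select(-data) |>
  arrange(char)

mydata_collapsed
```

Keeping the list-column around until the last step means the original values and types of x and y stay available for other summaries.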


I may have misunderstood, but I find my function SortedUniqueList useful in situations like this:

SortedUniqueList <- function(vectorin, sep = "/") {
  paste(unique(sort(vectorin, na.last = TRUE)), collapse = sep)
}

outdata <- mydata %>%
  group_by(char) %>%
  summarise(x = SortedUniqueList(x, sep = ","), y = SortedUniqueList(y, sep = ","))
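
One thing to be aware of is that this sorts the values before collapsing them, so the order can differ from the order of first appearance. A small sketch on made-up vectors (the definition is repeated only so the snippet runs on its own; `x_example` and `y_example` are illustrative names):

```r
SortedUniqueList <- function(vectorin, sep = "/") {
  paste(unique(sort(vectorin, na.last = TRUE)), collapse = sep)
}

# Made-up input with a duplicate, not taken from the data above
x_example <- c("w", "z", "x", "z")

# Unique values in order of first appearance
first_seen <- paste(unique(x_example), collapse = ",")   # "w,z,x"

# Unique values after sorting
sorted <- SortedUniqueList(x_example, sep = ",")         # "w,x,z"

# Because of na.last = TRUE, a missing value is kept and placed last
y_example <- c(2, NA, 1, 2)
with_na <- SortedUniqueList(y_example, sep = ",")        # "1,2,NA"
```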