tidyr::separate() to split multiple columns ending with the same pattern?

I have a data frame that looks like this:

x <- data.frame(A = c("a", "b", "c", "d", "e"), 
>               B_x = c("1,0", "1,0", "1,2", "2,2", "2,1"), 
>               C_x = c("0,1", "1,0", "1,2", "2,2", "2,0"),
>               D_x = c("1,1", "1,2", "2,0", "1,0", "0,1"))
> 
> x
>    A     B_x     C_x     D_x
> 1  a     1,0     0,1     1,1
> 2  b     1,0     1,0     1,2
> 3  c     1,2     1,2     2,0
> 4  d     2,2     2,2     1,0
> 5  e     2,1     2,0     0,1

I want to separate the columns ending with "_x" , so the resulting data frame will be:

>    A  B_x_y  B_x_z  C_x_y  C_x_z  D_x_y  D_x_z
> 1  a    1      0      0      1      1      1
> 2  b    1      0      1      0      1      2
> 3  c    1      2      1      2      2      0
> 4  d    2      2      2      2      1      0
> 5  e    2      1      2      0      0      1

I tried to use both tidyr::separate() and select() to specify the columns I would like to split

x %>% separate(select(ends_with("_x")), into = c("_y", "_z"), sep = ",")

But got the following error message:

Error in separate():
! Problem while evaluating select(ends_with(".x")).
Caused by error in UseMethod():
! no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"

I also tried separate_wider_delim() but not didn't work as well. Not sure if separate() is the correct way to go...

Any suggestions are appreciated! Thank you!

# let this object represent a set of data frames
# all of which have column names which may end in
# _SOME_SYMBOL
# renamed d to reserve x for other uses
d <- data.frame(A = c("a", "b", "c", "d", "e"), 
                B_x = c("1,0", "1,0", "1,2", "2,2", "2,1"), 
                C_x = c("0,1", "1,0", "1,2", "2,2", "2,0"),
                D_x = c("1,1", "1,2", "2,0", "1,0", "0,1"))

# returns a vector of colunmn index positions
find_cols <- function(d,y) grep(paste0("_",y),colnames(d))

# example of using it to subset
d[find_cols(d,"x")]
#>   B_x C_x D_x
#> 1 1,0 0,1 1,1
#> 2 1,0 1,0 1,2
#> 3 1,2 1,2 2,0
#> 4 2,2 2,2 1,0
#> 5 2,1 2,0 0,1


# R programming language composed from AI prompt
process_columns <- function(data_frame, column_indexes) {
  for (index in column_indexes) {
    # Split the character string in the column by ","
    split_values <- strsplit(as.character(data_frame[[index]]), split = ",")
    
    # Create two new columns with numeric variables
    data_frame[[paste0("Column_", index, "_A")]] <- as.numeric(sapply(split_values, "[[", 1))
    data_frame[[paste0("Column_", index, "_B")]] <- as.numeric(sapply(split_values, "[[", 2))
  }
  return(data_frame)
}

# Example usage
data_frame <- data.frame(A = c("1,0", "2,3", "4,5"), B = c("6,7", "8,9", "10,11"))
column_indexes <- c(1, 2)
new_data_frame <- process_columns(data_frame, column_indexes)
new_data_frame
#>     A     B Column_1_A Column_1_B Column_2_A Column_2_B
#> 1 1,0   6,7          1          0          6          7
#> 2 2,3   8,9          2          3          8          9
#> 3 4,5 10,11          4          5         10         11

# resume human intervention
# application to OP data frame
d <- process_columns(d,find_cols(d,"x"))[-c(2:4)]

# manually fix column names

new_cols <- c("A",
              paste0("B_",letters[24:25]),
              paste0("C_",letters[24:25]),
              paste0("D_",letters[24:25]))
colnames(d) <- new_cols
d
#>   A B_x B_y C_x C_y D_x D_y
#> 1 a   1   0   0   1   1   1
#> 2 b   1   0   1   0   1   2
#> 3 c   1   2   1   2   2   0
#> 4 d   2   2   2   2   1   0
#> 5 e   2   1   2   0   0   1

Created on 2023-06-06 with reprex v2.0.2

Explanation for prompt and motivation.

1 Like
library(tidyverse)
x <- data.frame(
  A = c("a", "b", "c", "d", "e"),
  B_x = c("1,0", "1,0", "1,2", "2,2", "2,1"),
  C_x = c("0,1", "1,0", "1,2", "2,2", "2,0"),
  D_x = c("1,1", "1,2", "2,0", "1,0", "0,1")
)


tidyr::separate_wider_delim(
  data = x,
  cols = -A,
  delim = ",",
  names_sep = "_"
) |>
  rename_with(
    .fn = \(x){
      str_replace_all(
        string = x,
        pattern = fixed("_1"),
        replacement = "_y"
      ) |>
        str_replace_all(
          pattern = fixed("_2"),
          replacement = "_z"
        )
    },
    .cols = -A
  )
3 Likes

Is this the first Forum post with an AI-assisted answer? Start of a new era!

1 Like

offtopic - I did tickle the forum with chatgpt back in december ... Create m Fibonacci numbers - #3 by nirgrahamuk :sweat_smile:

2 Likes

This can be accomplished using the separate() function which turns a single character column into multiple columns.

watch stories
cc generator

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.