I have tidyr code to get me a table as shown below.on the left. I am trying to get to output on the right which can split/separate columns and use first column from the pair as column name for the 2nd column.

# Tidyr separate paired multiple columns and use first column value as column name

Hi!

To help us help you, could you please prepare a **repr**oducible **ex**ample (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

I dont really have any code for it yet.

But this is what I would think of starting with...

```
library(tidyr)
dt <- data.frame(Group = c(rep("ABC1", 10),rep("ABC2", 10)),
SubGroup = c(rep("XYZ1", 5), rep("XYZ2", 5), rep("XYZ1", 5), rep("XYZ2", 5)),
A1 = c(rep("LMN_PQR1", 5), rep("LMN_PQR2", 10), rep("LMN_PQR1", 5)),
A2 = c(rep("TUV_WXY2", 5), rep("TUV_WXY4", 5), rep("TUV_JKL3", 10)))
dt %>%
separate(A1, c("Col1", "Col2"), sep = "_", extra = "merge") %>%
separate(A2, c("Col3", "Col4"), sep = "_", extra = "merge")
```

The columns to be separated are variable and can range from several 10s to 100s. The first two are fixed.

So I am looking for a generalized loop that can take dt[,-1:-2] as input, separate them and for each separation make unique value from first column as header for 2nd col.

I'm sure there is a more elegant way but in the mean while this works

```
library(tidyverse)
dt <- data.frame(Group = c(rep("ABC1", 10),rep("ABC2", 10)),
SubGroup = c(rep("XYZ1", 5), rep("XYZ2", 5), rep("XYZ1", 5), rep("XYZ2", 5)),
A1 = c(rep("LMN_PQR1", 5), rep("LMN_PQR2", 10), rep("LMN_PQR1", 5)),
A2 = c(rep("TUV_WXY2", 5), rep("TUV_WXY4", 5), rep("TUV_JKL3", 10)))
names <- dt %>%
select(-Group, -SubGroup) %>%
head(1) %>%
unlist() %>%
str_extract(".+(?=_)")
dt %>%
transmute_at(vars(-Group, -SubGroup), ~str_extract(., "(?<=_).+"), ) %>%
rename_all(~names) %>%
bind_cols(dt[,1:2]) %>%
select(Group, SubGroup, everything())
#> Group SubGroup LMN TUV
#> 1 ABC1 XYZ1 PQR1 WXY2
#> 2 ABC1 XYZ1 PQR1 WXY2
#> 3 ABC1 XYZ1 PQR1 WXY2
#> 4 ABC1 XYZ1 PQR1 WXY2
#> 5 ABC1 XYZ1 PQR1 WXY2
#> 6 ABC1 XYZ2 PQR2 WXY4
#> 7 ABC1 XYZ2 PQR2 WXY4
#> 8 ABC1 XYZ2 PQR2 WXY4
#> 9 ABC1 XYZ2 PQR2 WXY4
#> 10 ABC1 XYZ2 PQR2 WXY4
#> 11 ABC2 XYZ1 PQR2 JKL3
#> 12 ABC2 XYZ1 PQR2 JKL3
#> 13 ABC2 XYZ1 PQR2 JKL3
#> 14 ABC2 XYZ1 PQR2 JKL3
#> 15 ABC2 XYZ1 PQR2 JKL3
#> 16 ABC2 XYZ2 PQR1 JKL3
#> 17 ABC2 XYZ2 PQR1 JKL3
#> 18 ABC2 XYZ2 PQR1 JKL3
#> 19 ABC2 XYZ2 PQR1 JKL3
#> 20 ABC2 XYZ2 PQR1 JKL3
```

^{Created on 2019-08-30 by the reprex package (v0.3.0.9000)}

Here's a way you can do that:

The script is more wordy than it needs to be to make it easier to follow.

Since I qualified the *tidyverse* functions you may not need to load them, or you can replace the 4 library calls with a single *library(tidyverse)* to load all of them at once.

```
library(tidyr)
library(dplyr)
library(stringr)
library(purr)
```

```
dt <- data.frame(
Group = c(rep("ABC1", 10), rep("ABC2", 10)),
SubGroup = c(rep("XYZ1", 5), rep("XYZ2", 5), rep("XYZ1", 5), rep("XYZ2", 5)),
A1 = c(rep("LMN_PQR1", 5), rep("LMN_PQR2", 10), rep("LMN_PQR1", 5)),
A2 = c(rep("TUV_WXY2", 5), rep("TUV_WXY4", 5), rep("TUV_JKL3", 10)),
stringsAsFactors = FALSE
)
```

- Define a function that will take a column name from a data.frame, get the content of the first row, split by "_", take the first character vector result as new column name, and then proceed to
*separate*the column (default is splitting by non_alpha character, as such it will split at the "_").*NA*for the first column name will skip creating the column. [If you're in the same environment, you may be able to remove the second parameter,*dframe*and use*dt*directly. In that case also lose the corresponding argument to*map_dfc*.]

```
split_column <- function(col, dframe) {
row1 <- dframe[1, col]
new_col_name <- stringr::str_split_fixed(row1, "_", n = 2)[, 1]
new_col <- dframe %>%
dplyr::select(col) %>%
tidyr::separate(col, c(NA, new_col_name))
}
```

- Build a list of column names without the first 2:

`cols <- colnames(dt)[-1:-2]`

- Then run the
*split_column*function over all remaining columns with*map_dfc*, which returns a data frame with the combined columns:

`new_cols <- purrr::map_dfc(cols, split_column, dt)`

- Add in the original first 2 columns to create the modified dt:

`new_dt <- dplyr::bind_cols(dt[1:2], new_cols)`

And here's the result:

```
Group SubGroup LMN TUV
1 ABC1 XYZ1 PQR1 WXY2
2 ABC1 XYZ1 PQR1 WXY2
3 ABC1 XYZ1 PQR1 WXY2
4 ABC1 XYZ1 PQR1 WXY2
5 ABC1 XYZ1 PQR1 WXY2
6 ABC1 XYZ2 PQR2 WXY4
7 ABC1 XYZ2 PQR2 WXY4
8 ABC1 XYZ2 PQR2 WXY4
9 ABC1 XYZ2 PQR2 WXY4
10 ABC1 XYZ2 PQR2 WXY4
11 ABC2 XYZ1 PQR2 JKL3
12 ABC2 XYZ1 PQR2 JKL3
13 ABC2 XYZ1 PQR2 JKL3
14 ABC2 XYZ1 PQR2 JKL3
15 ABC2 XYZ1 PQR2 JKL3
16 ABC2 XYZ2 PQR1 JKL3
17 ABC2 XYZ2 PQR1 JKL3
18 ABC2 XYZ2 PQR1 JKL3
19 ABC2 XYZ2 PQR1 JKL3
20 ABC2 XYZ2 PQR1 JKL3
```

Voilà

Here's the script that I suggest that use rename_at function:

```
library(tidyverse)
dt <- data.frame(Group = c(rep("ABC1", 10),rep("ABC2", 10)),
SubGroup = c(rep("XYZ1", 5), rep("XYZ2", 5), rep("XYZ1", 5), rep("XYZ2", 5)),
A1 = c(rep("LMN_PQR1", 5), rep("LMN_PQR2", 10), rep("LMN_PQR1", 5)),
A2 = c(rep("TUV_WXY2", 5), rep("TUV_WXY4", 5), rep("TUV_JKL3", 10)))
dt %>%
rename_at(vars(-Group, -SubGroup), funs(map_chr(~str_split(.x[1], "_")[[1]][1]), .args = list(.x = select(., -Group, -SubGroup)))) %>%
mutate_at(vars(-Group, -SubGroup), ~str_split(., "_", simplify = TRUE)[, 2])
```

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.