Can not cut paste data after repeat values of one column.

Hi,
I have data like below:

structure(list(X1 = c(2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 
6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 
2L, 3L, 4L, 5L, 6L), X10 = c(20L, 30L, 40L, 50L, 60L, 70L, 80L, 
90L, 100L, 110L, 120L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 
90L, 100L, 110L, 120L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 
90L, 100L, 110L, 120L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 
90L, 100L, 110L, 120L)), class = "data.frame", row.names = c(NA, 
-47L))

I want them to separate and bring in the next column based on when the pattern is being repeated in the first column, arranged data has to look something like below:

structure(list(X1.10.1.70.1.10.1.70.1.10.1.70 = structure(1:5, .Label = c("2 20 2 80 2 20 2 80 2 20 2 80", 
"3 30 3 90 3 30 3 90 3 30 3 90", "4 40 4 100 4 40 4 100 4 40 4 100", 
"5 50 5 110 5 50 5 110 5 50 5 110", "6 60 6 120 6 60 6 120 6 60 6 120"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

I have been trying with awk, but problems keep on recurring. Need help.

Hi @orijitghosh, could you post your data so it's easier for folks for copy and paste? For example, if your data is in an object called data, then you could run dput(data), and then paste the output between a pair of triple backticks (```), like this:

```
<--- paste output of dput(data) here
```
2 Likes

Done that and edited the question accordingly.

Thanks, @orijitghosh -- could you change the quotes (") around the structure commands to triple backticks? The backtick key is usually just below the 'esc' key in the upper left of the keyboard, and you need three of them -- as in my previous post -- in order to for the code to be properly quoted.

I'm not sure how representative the data you posted is, but if you know that the structure of the first column of your original table has a repeating structure, then you could you group every six rows, and then use pivot_wider() to do something similar to what you want -- do you expect your data to always have that repeating structure?

Hi @dromano, someone already edited in the format you prescribed, sorry for me not doing that earlier. Coming to my problem, NO, the repeating elements won't always be 6 in number. But it will always start with "1". So, whenever the number "1" comes, it should cut and paste both columns to the next one. Hope it's clear now.

Thanks for clarifying @orijitghosh. And that raises another question, since you may end up with columns of different lengths: If you have one column with six values to store, and one column with sixteen, what would you want to do with the extra ten spaces in the first column?

Here's code that works for the simpler case you presented, but not for your general case, although a tweak might make it work.

library(tidyverse)
df <- 
  structure(list(col1 = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 
                          5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 
                          3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
                          1L, 2L, 3L, 4L, 5L, 6L), col2 = c(10L, 20L, 30L, 40L, 50L, 60L, 
                                                            70L, 80L, 90L, 100L, 110L, 120L, 10L, 20L, 30L, 40L, 50L, 60L, 
                                                            70L, 80L, 90L, 100L, 110L, 120L, 10L, 20L, 30L, 40L, 50L, 60L, 
                                                            70L, 80L, 90L, 100L, 110L, 120L, 10L, 20L, 30L, 40L, 50L, 60L, 
                                                            70L, 80L, 90L, 100L, 110L, 120L)), row.names = c(NA, -48L), class = c("tbl_df", 
                                                                                                                                  "tbl", "data.frame"))
#### end of 'structure()' call

# inspect df
df
#> # A tibble: 48 x 2
#>     col1  col2
#>    <int> <int>
#>  1     1    10
#>  2     2    20
#>  3     3    30
#>  4     4    40
#>  5     5    50
#>  6     6    60
#>  7     1    70
#>  8     2    80
#>  9     3    90
#> 10     4   100
#> # … with 38 more rows

# extract groups and move to separate columns
df %>% 
  # add rows numbers so can create groups
  mutate(row = row_number()) %>% 
  # define group by largest multiple of six, then remove row numbers
  mutate( group = ceiling(row / 6)) %>% select(-row) %>% 
  # make groups into column names, and move data to those new columns
  pivot_wider(names_from = group, values_from = col2)
#> # A tibble: 6 x 9
#>    col1   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`
#>   <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1     1    10    70    10    70    10    70    10    70
#> 2     2    20    80    20    80    20    80    20    80
#> 3     3    30    90    30    90    30    90    30    90
#> 4     4    40   100    40   100    40   100    40   100
#> 5     5    50   110    50   110    50   110    50   110
#> 6     6    60   120    60   120    60   120    60   120

Created on 2020-03-17 by the reprex package (v0.3.0)

(Note this code is in the form of a 'reproducible example', which means it includes all the code and data necessary for anyone to run it.)

Thank you @dromano, this kind of solves my problem. Coming to your question, if I have one column with six values and one with sixteen, I will keep the extra ten rows in the first column as NA or blank.

Did you still need the general solution?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.