Transpose a dataframe when rows contain two same variables (with double output)

eugenio.alladio · February 20, 2020, 1:47pm

I have a dataframe that contains a variable named "Marker" that present two values for each of the samples tested.
The dataframe (as an example) is, as follows:

Sample.File Sample.Name Marker value
1            a         a_1    xxx    16
2            a         a_1    xxx    18
3            a         a_1    yyy    16
4            a         a_1    yyy    20
5            a         a_1    zzz     9
6            a         a_1    zzz    13
7            b         b_1    xxx    10
8            b         b_1    xxx    10
9            b         b_1    yyy     6
10           b         b_1    yyy    12
11           b         b_1    zzz    14
12           b         b_1    zzz    14

provided by the following code:

data <- data.frame(
   Sample.File = as.factor(c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
                             "b", "b")),
   Sample.Name = as.factor(c("a_1", "a_1", "a_1", "a_1", "a_1", "a_1", "b_1",
                             "b_1", "b_1", "b_1", "b_1", "b_1")),
        Marker = as.factor(c("xxx", "xxx", "yyy", "yyy", "zzz", "zzz", "xxx",
                             "xxx", "yyy", "yyy", "zzz", "zzz")),
   value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

I'd like to transpose my dataframe maintening the columns Sample.File and Sample.Name for all the collected samples, and obtaining new variables (e.g. xxx & xxx.1, yyy & yyy.1, zzz & zzz.1) for the column labelled as "value".

The table I'd like to achieve looks like the following:

  Sample.File Sample.Name xxx xxx.1 yyy yyy.1 zzz zzz.1
1           a         a_1  16    18  16    20   9    13
2           b         b_1  10    10   6    12  14    14

I'd like to use a code without writing the name of the labels reported into "Marker" column (since I could obtain up to 100 different labels).

I tried to use the following code but I couldn't achieve my goal:

library(dplyr)
library(tidyr)
data %>%
  gather(Sample.File, Sample.Name) %>%
  spread(value)

since I obtained the following error:

Error: `var` must evaluate to a single number or a column name, not a double vector
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
attributes are not identical across measure variables;
they will be dropped

I'd be very grateful if anybody could attend to this matter!

nirgrahamuk · February 20, 2020, 2:40pm

my solution would require at least version 1.0.0 as it relies on pivot_* functions which have become preffered over gather/spread

library(tidyverse)
packageVersion("tidyr")  # to verify your version

data <- data.frame(
  Sample.File = as.factor(c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
                            "b", "b")),
  Sample.Name = as.factor(c("a_1", "a_1", "a_1", "a_1", "a_1", "a_1", "b_1",
                            "b_1", "b_1", "b_1", "b_1", "b_1")),
  Marker = as.factor(c("xxx", "xxx", "yyy", "yyy", "zzz", "zzz", "xxx",
                       "xxx", "yyy", "yyy", "zzz", "zzz")),
  value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

data2 <- group_by(data,
                  Sample.File,
                  Sample.Name,
                  Marker) %>%
  mutate(rn=row_number() -1 ,
         rn2 = ifelse(rn == 0,"",as.character(rn)),
         truemarker=paste0(Marker,rn2)) %>% 
  ungroup %>% 
  select(-Marker,-rn,-rn2)

data2 %>% pivot_wider(names_from = truemarker,
                      values_from = value)

eugenio.alladio · February 20, 2020, 2:46pm

Dear @nirgrahamuk, thank you for your kind reply.
I tried to run your example (and my packageVersion is the following):

packageVersion("tidyr") 
[1] ‘1.0.0’

but I keep on obtaining the following error:

Error: row_number() should only be called in a data context
Run `rlang::last_error()` to see where the error occurred.

Could you please help me with this?

nirgrahamuk · February 20, 2020, 2:53pm

that's strange, as I my example runs for me in a fresh session no issues. maybe restart your R session if you havent already ?

anyway, usually when you get an error and are encouraged to try rlang::last_error() in R Studio at least there is often a blue arrow marking to the right of that text, that if you click would expand to give you a traceback

eugenio.alladio · February 20, 2020, 2:59pm

Dear @nirgrahamuk, I performed conflicts() and actually now it works. Thank you for your help.
However, I'd like to see the variable names to be written as follows:

xxx xxx.1 yyy yyy.1 zzz zzz.1

instead of

xxx xxx1 yyy yyy1 zzz zzz.1

would it be possible?

nirgrahamuk · February 20, 2020, 3:03pm

if you wish to add full stop symbol then the line

truemarker=paste0(Marker,rn2)) %>%

should change to

truemarker=paste(Marker,rn2,sep=".")) %>%

eugenio.alladio · February 20, 2020, 3:09pm

Dear @nirgrahamuk, thank you very much again, but now I obtain

`

xxx. xxx.1 yyy. yyy.1 zzz. zzz.1

`

while I'd like to see

xxx xxx1 yyy yyy1 zzz zzz.1

I'm sorry to bother you with this

nirgrahamuk · February 20, 2020, 3:11pm

so, you want anything with z in the marker name to be seperated with a dot, when combining with a number, but for this not to happen for markers without z ?

eugenio.alladio · February 20, 2020, 3:13pm

No, I'd like to see the plain variable name for the first columns (e.g. xxx, yyy, zzz), while I'd like to see the name.1 for the second columns (e.g. xxx.1, yyy.1, zzz.1)

nirgrahamuk · February 20, 2020, 3:16pm

data <- data.frame(
  Sample.File = as.factor(c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
                            "b", "b")),
  Sample.Name = as.factor(c("a_1", "a_1", "a_1", "a_1", "a_1", "a_1", "b_1",
                            "b_1", "b_1", "b_1", "b_1", "b_1")),
  Marker = as.factor(c("xxx", "xxx", "yyy", "yyy", "zzz", "zzz", "xxx",
                       "xxx", "yyy", "yyy", "zzz", "zzz")),
  value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

data2 <- group_by(data,
                  Sample.File,
                  Sample.Name,
                  Marker) %>%
  mutate(rn=row_number() -1 ,
         rn2 = ifelse(rn == 0,"",paste0(".",as.character(rn))),
         truemarker=paste0(Marker,rn2)) %>% 
  ungroup %>% 
  select(-Marker,-rn,-rn2)

data2 %>% pivot_wider(names_from = truemarker,
                      values_from = value)

# A tibble: 2 x 8
  Sample.File Sample.Name   xxx xxx.1   yyy yyy.1   zzz zzz.1
  <fct>       <fct>       <int> <int> <int> <int> <int> <int>
1 a           a_1            16    18    16    20     9    13
2 b           b_1            10    10     6    12    14    14

eugenio.alladio · February 20, 2020, 4:09pm

Dear @nirgrahamuk,
I'd like to sincerely thank you for your help, the code works actually as I needed!
Thank you so much!

system · February 27, 2020, 4:09pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.