Transform variable sorted within a group

Sorry for the slightly unhelpful title; my question is a little difficult to explain. I have some data with the following general format:

y <- c(0,10,10,4,8,9,2,10,7,1,3,1)
node <- c(rep(1,6), rep(2,6))

d <- data.frame(
  X = X,
  Y = Y,
  node = node
)

Here Y is the "outcome", node is a grouping variable and X is a timecode. For each node individually, I need to transform the 6 corresponding values in the X column to 1-6, where 1 is assigned to the smallest value of X and 6 to the largest. Basically, label them 1-6 in ascending order.

To be clear, simply assigning 1-12 based on the order of the X column and ignoring the node column would not achieve what I need. Likewise, it's vital that the pairings between node, X and Y remain intact after the transformation.

Many thanks in advance.

Hello. Your reprex is incomplete. Can you please provide sample values for the X vector?

Also, your description is somewhat confusing. What role does the Y vector play in this transformation? From your explanation, it doesn't appear to be used anywhere.

Hey, thanks for your reply. Apologies: I left out the x vector in my first copy and also had X and Y capitalised when they shouldn't have been. Please find an updated reprex below:

x <- c(660,66,188,67,886,906,628,818,351,905,97,661)
y <- c(0,10,10,4,8,9,2,10,7,1,3,1)
node <- c(rep(1,6), rep(2,6))

d <- data.frame(
  X = x,
  Y = y,
  node = node
)

Regarding the Y variable, you're right that it isn't involved in the transformation; however, it is my variable of interest. I could provide a little more context if you would like, but the basic gist is that Y represents some outcome measured in an experiment, node represents different participants, and X is the time at which they made their decision. So all I need to do is label the decisions from 1 to 6 per individual so that I can say "Individual 1 chose 6 on their first round and 3 on their third round", etc.

Of course I have simplified this significantly for my reprex so I could avoid having people go through my entire dataset.

I hope this clears things up?

I put the new label in a new column, xNew, in case the X values are of use later. You could just overwrite the X column instead.

library(dplyr)
x <- c(660,66,188,67,886,906,628,818,351,905,97,661)
y <- c(0,10,10,4,8,9,2,10,7,1,3,1)
node <- c(rep(1,6), rep(2,6))

d <- data.frame(
  X = x,
  Y = y,
  node = node
)
d2 <- d %>% group_by(node) %>% 
  arrange(node, X) %>% 
  mutate(xNew = row_number())
d2
#> # A tibble: 12 x 4
#> # Groups:   node [2]
#>        X     Y  node  xNew
#>    <dbl> <dbl> <dbl> <int>
#>  1    66    10     1     1
#>  2    67     4     1     2
#>  3   188    10     1     3
#>  4   660     0     1     4
#>  5   886     8     1     5
#>  6   906     9     1     6
#>  7    97     3     2     1
#>  8   351     7     2     2
#>  9   628     2     2     3
#> 10   661     1     2     4
#> 11   818    10     2     5
#> 12   905     1     2     6

Created on 2020-01-28 by the reprex package (v0.3.0)
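
If the original row order matters, one possible alternative is to rank X within each node instead of sorting. This is only a sketch in base R under a couple of assumptions: the column name xNew is reused from the answer above, and ties.method = "first" is my own choice so that equal timecodes (if any ever occur) still get distinct labels.

```r
# Same sample data as in the reprex above
x <- c(660,66,188,67,886,906,628,818,351,905,97,661)
y <- c(0,10,10,4,8,9,2,10,7,1,3,1)
node <- c(rep(1,6), rep(2,6))
d <- data.frame(X = x, Y = y, node = node)

# Rank X within each node; the rows themselves are not reordered,
# so each X, Y, node pairing stays on its original row.
# ties.method = "first" is an assumption, not part of the original answer.
d$xNew <- ave(d$X, d$node, FUN = function(v) rank(v, ties.method = "first"))

# Example lookup: the Y value for node 1 on its first round
first_choice <- d$Y[d$node == 1 & d$xNew == 1]
```

With dplyr loaded, the same idea should work as a grouped mutate using rank(X, ties.method = "first") in place of row_number(), without the arrange() step.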


That's perfect, thank you so much!
