Asking for help, R for dummies ;-)

Hello everybody! This is my first post on the RStudio Community forum. I am a PhD student in plant biology in France, and I am helping somebody who needs to handle a lot of data. My problem is this: is it possible to create a vector containing a sequence of numbers (for example 10 to 420 by 10: 10, 20, 30, 40, and so on up to 420) and to restart this sequence each time the value in another, character vector changes? Writing this question, I can see that it is really not clear :wink:

So this is my data.frame (an extract only)

elements value conditions distance
1 Mg24 4.340197e+00 pl1_CT01 10
2 Mg24 5.372596e+00 pl1_CT01 20
3 Mg24 9.665040e+00 pl1_CT01 30
4 Mg24 8.851774e+00 pl1_CT01 40
5 Mg24 1.564440e+01 pl1_CT01 50
6 Mg24 1.004873e+01 pl1_CT01 60
7 Mg24 1.254677e+01 pl1_CT01 70
8 Mg24 8.017576e+00 pl1_CT01 80
9 Mg24 5.748200e+00 pl1_CT01 90
10 Mg24 1.573673e+00 pl1_CT01 100
11 Mg24 3.798806e-01 pl1_CT01 110
12 Mg24 1.101723e-01 pl1_CT01 120
13 Mg24 4.499268e-02 pl1_CT01 130
14 Mg24 2.784069e-02 pl1_CT01 140
15 Mg24 2.554249e-02 pl1_CT01 150
16 Mg24 3.770222e-02 pl1_CT01 160
17 Mg24 3.508327e-02 pl1_CT02 10
18 Mg24 4.937052e-01 pl1_CT02 20
19 Mg24 1.276152e+00 pl1_CT02 30
20 Mg24 1.159849e+00 pl1_CT02 40
21 Mg24 2.762671e+00 pl1_CT02 50
22 Mg24 3.404331e+00 pl1_CT02 60
23 Mg24 3.641507e+00 pl1_CT02 70
24 Mg24 5.367379e+00 pl1_CT02 80
25 Mg24 5.399442e+00 pl1_CT02 90
26 Mg24 3.708117e+00 pl1_CT02 100
27 Mg24 2.098231e+00 pl1_CT02 110

So I have 3 columns: elements, value, and conditions :slight_smile: I would like to add a new column called "distance_in_µm" containing that sequence of numbers from 10 to 420. The trick is to count in steps of 10 until the condition changes; when it changes, I would like to start counting again from 10. The ideal column is shown as the last column (distance) of the data.frame above. The big issue is that the sequence has to restart for each condition, but the number of values differs from one condition to the next: I will let you appreciate the problem.

If anybody can save my life, they will have my eternal gratitude!

Thanks to all of you, best wishes!

Alexis

I would use group_by() and mutate() from the dplyr package. I used as.data.frame() in the last step only to force the display of the entire data set.

library(dplyr)
DF <- data.frame(Condition = c(rep("A", 7), rep("B", 9), rep("C", 5), rep("D", 6)))
DF
#>    Condition
#> 1          A
#> 2          A
#> 3          A
#> 4          A
#> 5          A
#> 6          A
#> 7          A
#> 8          B
#> 9          B
#> 10         B
#> 11         B
#> 12         B
#> 13         B
#> 14         B
#> 15         B
#> 16         B
#> 17         C
#> 18         C
#> 19         C
#> 20         C
#> 21         C
#> 22         D
#> 23         D
#> 24         D
#> 25         D
#> 26         D
#> 27         D
DF <- DF %>% group_by(Condition) %>% mutate(distance = seq(10, n() * 10, 10))
as.data.frame(DF)
#>    Condition distance
#> 1          A       10
#> 2          A       20
#> 3          A       30
#> 4          A       40
#> 5          A       50
#> 6          A       60
#> 7          A       70
#> 8          B       10
#> 9          B       20
#> 10         B       30
#> 11         B       40
#> 12         B       50
#> 13         B       60
#> 14         B       70
#> 15         B       80
#> 16         B       90
#> 17         C       10
#> 18         C       20
#> 19         C       30
#> 20         C       40
#> 21         C       50
#> 22         D       10
#> 23         D       20
#> 24         D       30
#> 25         D       40
#> 26         D       50
#> 27         D       60

Created on 2019-10-29 by the reprex package (v0.3.0.9000)

A similar, but slightly different solution:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(seed = 43457)

fake_data <- tibble(element = "Mg24",
                    values = runif(n = 50),
                    conditions = paste("CT",
                                       sort(x = sample.int(n = 3,
                                                           size = 50,
                                                           replace = TRUE)),
                                       sep = "_"))

fake_data %>%
    group_by(conditions) %>%
    mutate(distances = seq(from = 10,
                           by = 10,
                           length.out = n())) %>%
    ungroup() %>%
    as.data.frame()
#>    element      values conditions distances
#> 1     Mg24 0.545005643       CT_1        10
#> 2     Mg24 0.486639130       CT_1        20
#> 3     Mg24 0.097248280       CT_1        30
#> 4     Mg24 0.587748632       CT_1        40
#> 5     Mg24 0.686058647       CT_1        50
#> 6     Mg24 0.904403227       CT_1        60
#> 7     Mg24 0.390555050       CT_1        70
#> 8     Mg24 0.611235482       CT_1        80
#> 9     Mg24 0.228512946       CT_1        90
#> 10    Mg24 0.019515451       CT_1       100
#> 11    Mg24 0.233271330       CT_1       110
#> 12    Mg24 0.292703202       CT_1       120
#> 13    Mg24 0.774000830       CT_2        10
#> 14    Mg24 0.196801875       CT_2        20
#> 15    Mg24 0.701186024       CT_2        30
#> 16    Mg24 0.523900585       CT_2        40
#> 17    Mg24 0.685330339       CT_2        50
#> 18    Mg24 0.008923621       CT_2        60
#> 19    Mg24 0.514414684       CT_2        70
#> 20    Mg24 0.463353170       CT_2        80
#> 21    Mg24 0.778949688       CT_2        90
#> 22    Mg24 0.591954395       CT_2       100
#> 23    Mg24 0.120443930       CT_2       110
#> 24    Mg24 0.456395657       CT_2       120
#> 25    Mg24 0.486391511       CT_2       130
#> 26    Mg24 0.602857009       CT_2       140
#> 27    Mg24 0.039863593       CT_2       150
#> 28    Mg24 0.835323205       CT_3        10
#> 29    Mg24 0.034237358       CT_3        20
#> 30    Mg24 0.482824514       CT_3        30
#> 31    Mg24 0.245912920       CT_3        40
#> 32    Mg24 0.838486735       CT_3        50
#> 33    Mg24 0.906126826       CT_3        60
#> 34    Mg24 0.082546392       CT_3        70
#> 35    Mg24 0.100543018       CT_3        80
#> 36    Mg24 0.785519243       CT_3        90
#> 37    Mg24 0.731853745       CT_3       100
#> 38    Mg24 0.121736422       CT_3       110
#> 39    Mg24 0.066367348       CT_3       120
#> 40    Mg24 0.038762275       CT_3       130
#> 41    Mg24 0.384566731       CT_3       140
#> 42    Mg24 0.342233852       CT_3       150
#> 43    Mg24 0.052980442       CT_3       160
#> 44    Mg24 0.717982176       CT_3       170
#> 45    Mg24 0.872447475       CT_3       180
#> 46    Mg24 0.281007298       CT_3       190
#> 47    Mg24 0.199600941       CT_3       200
#> 48    Mg24 0.600144481       CT_3       210
#> 49    Mg24 0.857524932       CT_3       220
#> 50    Mg24 0.861189633       CT_3       230

Created on 2019-10-29 by the reprex package (v0.3.0)
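
For anyone who would rather not load dplyr, the same per-condition counter can probably be built in base R with ave(). This is only a minimal sketch on a hypothetical toy data frame (the column names mirror the extract in the question, the row counts are made up), not a drop-in replacement for either answer above.

```r
# Hypothetical toy data: two conditions with different numbers of rows.
df <- data.frame(
  elements   = "Mg24",
  conditions = c(rep("pl1_CT01", 4), rep("pl1_CT02", 3)),
  stringsAsFactors = FALSE
)

# ave() applies FUN separately within each value of the grouping vector
# and returns the results in the original row order.
# seq_along() numbers the rows of each group 1, 2, 3, ...;
# multiplying by 10 turns that counter into 10, 20, 30, ...
df$distance <- ave(seq_len(nrow(df)), df$conditions, FUN = seq_along) * 10

df$distance
#> [1] 10 20 30 40 10 20 30
```

Like group_by(), this counts per condition label rather than per consecutive run, so it assumes that all rows of a given condition belong to one series. If the full table holds several elements for each condition, passing both columns as grouping vectors, as in ave(x, df$elements, df$conditions, FUN = seq_along), should restart the counter for each element and condition pair.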

This is EXACTLY what I needed! Thank you both, Yarnabrina and FJCC. Both solutions worked in my script; you just saved me hours of research :wink: Thank you very much!

See you another time maybe, have a good day

Kind regards,
Alexis