To select randomly from a large dataset

Soldado · June 3, 2020, 9:10am

I need to randomly choose 50 participants from a dataset with 130 participants. The only way I know to get the data into R is to type it out as vectors, which is why my code has only 10 values instead of 130.

Could you please help me optimize the code? [I've only been learning R for a month]

data <- data.frame("code" = c("4194", "asd45fg", "sadg65", "adfg65", "ad4fg65", "agg87",
                              "fhfrh32", "hjhj3", "8989", "dhjik1"),
                   "comp" = c(20, 30, 60, 65, 80, 100, 40, 37, 89, 10))
data
set.seed(1:10)
data_s1 <- sample_n(data, 5)
data_s1

siddharthprabhu · June 3, 2020, 9:28am

I don't fully understand what you mean, but selecting a random sample of rows from a data set only requires sample_n(). In the example below, I'm randomly selecting 3 rows from your data set.

Also, set.seed() takes a single value for the seed parameter, not a vector.

library(dplyr, warn.conflicts = FALSE)

data <- data.frame ("code" = c("4194", "asd45fg", "sadg65", "adfg65", "ad4fg65", "agg87", "fhfrh32","hjhj3", "8989", "dhjik1"),
                    "comp" = c(20, 30, 60, 65, 80, 100, 40, 37, 89, 10))

set.seed(42)

data_s1 <- sample_n(data, size = 3)

print(data_s1)
#>      code comp
#> 1    4194   20
#> 2 ad4fg65   80
#> 3  dhjik1   10

Created on 2020-06-03 by the reprex package (v0.3.0)

Soldado · June 3, 2020, 9:37am

Thank you!
What I meant was that my columns "code" and "comp" actually consist of 130 values. Do I then need to write all of these values out in vectors?

siddharthprabhu · June 3, 2020, 9:47am

Where does your data set reside? If it's in a CSV or an Excel sheet, you could read it into R using appropriate functions. That would save you the trouble of creating the data frame by hand.
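As a rough sketch of what that could look like for a CSV file: the snippet below first writes a made-up file of 130 participants to a temporary location so that it runs on its own, then reads it back with read.csv() and draws 50 rows with sample_n(). The file contents, the "p001"-style codes, and the seed value are assumptions for illustration only; your real file and column names will differ.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the real file: build 130 made-up participants
# and write them to a temporary CSV so this example is self-contained.
csv_path <- tempfile(fileext = ".csv")
fake <- data.frame(
  code = sprintf("p%03d", 1:130),
  comp = rep(c(20, 30, 60, 65, 80, 100, 40, 37, 89, 10), times = 13)
)
write.csv(fake, csv_path, row.names = FALSE)

# Read the whole data set from disk instead of typing the vectors by hand.
data <- read.csv(csv_path, stringsAsFactors = FALSE)

# A single seed value makes the random draw reproducible.
set.seed(42)

# Draw 50 of the 130 rows at random, without replacement.
data_s1 <- sample_n(data, size = 50)

nrow(data)     # number of rows read from the file
nrow(data_s1)  # number of rows in the sample
```

With a real file you would skip the temporary-file part and pass its path directly, for example read.csv("participants.csv") (a hypothetical file name). For an Excel sheet, readxl::read_excel() is one commonly used option.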
