Convert “sample()” results into a data frame

I am creating random samples from 2 different data frames and would like to create a new data frame from the random samples. Is this possible? I have not been able to find an example that works for me.

Here's the code:

#Enter the data into R
#Read the data sets and check that the data imported correctly
Round1 <- read.csv('/cloud/project/Dice Results Round 1.csv')
View(Round1)

Round2 <- read.csv('/cloud/project/Dice Results Round 2.csv')
View(Round2)


#Perform some exploratory data analysis procedures
summary(Round1)
#>  Dice.Colour   Hand.Tiles     Hand.Carpet       Cup.Tiles    
#>  Black :1    Min.   :1.290   Min.   :0.5200   Min.   :2.630  
#>  Blue  :1    1st Qu.:1.677   1st Qu.:0.6150   1st Qu.:3.850  
#>  Green :1    Median :2.115   Median :0.8000   Median :4.635  
#>  Purple:1    Mean   :2.112   Mean   :0.7567   Mean   :5.162  
#>  Red   :1    3rd Qu.:2.507   3rd Qu.:0.8950   3rd Qu.:6.755  
#>  Yellow:1    Max.   :2.980   Max.   :0.9400   Max.   :8.020  
#>    Cup.Carpet   
#>  Min.   :1.830  
#>  1st Qu.:1.950  
#>  Median :2.075  
#>  Mean   :2.113  
#>  3rd Qu.:2.237  
#>  Max.   :2.500
summary(Round2)
#>  Dice.Colour   Hand.Tiles     Hand.Carpet       Cup.Tiles    
#>  Black :1    Min.   :1.670   Min.   :0.3000   Min.   :3.110  
#>  Blue  :1    1st Qu.:1.718   1st Qu.:0.5425   1st Qu.:3.965  
#>  Green :1    Median :2.115   Median :0.6150   Median :6.260  
#>  Purple:1    Mean   :2.278   Mean   :0.6533   Mean   :5.898  
#>  Red   :1    3rd Qu.:2.640   3rd Qu.:0.8525   3rd Qu.:7.513  
#>  Yellow:1    Max.   :3.370   Max.   :0.9400   Max.   :8.630  
#>    Cup.Carpet   
#>  Min.   :1.770  
#>  1st Qu.:1.990  
#>  Median :2.125  
#>  Mean   :2.212  
#>  3rd Qu.:2.312  
#>  Max.   :2.930

#Use “sample()” to randomly select 2 results from round 1 for all 6 dice
#Create a data frame to hold results
sample(Round1, size=3, replace =F)
#>   Hand.Carpet Cup.Carpet Dice.Colour
#> 1        0.78       1.83       Black
#> 2        0.56       2.01        Blue
#> 3        0.82       2.50       Green
#> 4        0.52       1.93      Purple
#> 5        0.94       2.27         Red
#> 6        0.92       2.14      Yellow


#Use “sample()” to randomly select 2 results from round 2 for all 6 dice
#Create a data frame to hold results 
sample(Round2, size=3, replace =F)
#>   Dice.Colour Cup.Carpet Cup.Tiles
#> 1       Black       2.23      3.11
#> 2        Blue       2.02      7.19
#> 3       Green       1.98      3.51
#> 4      Purple       2.34      5.33
#> 5         Red       1.77      7.62
#> 6      Yellow       2.93      8.63

This is not a reproducible example, as we don't have access to your data. Please share it using dput, datapasta, a GitHub Gist, or something similar. However, if you figure out the problem yourself or from my attempted explanation below, you don't need to share it, but keep this in mind for future posts.
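As a rough illustration of the dput route, the sketch below uses a small made-up data frame (the name dice and its values are hypothetical, not the poster's results). dput() prints R code that rebuilds the object, and pasting that text back into R recreates the same data frame.

```r
# Made-up stand-in for Round1: these values are illustrative, not the real results
dice <- data.frame(
  Dice.Colour = c("Black", "Blue", "Green"),
  Hand.Tiles  = c(1.29, 1.67, 2.11)
)

# dput() prints R code that recreates the object; this text is what you would paste into a post
dput(dice)

# Whoever reads the post can paste that text back into R and get the same data frame
shared_text <- paste(capture.output(dput(dice)), collapse = "\n")
rebuilt <- eval(parse(text = shared_text))
identical(rebuilt, dice)
```

In a real post you would run dput(Round1) (or dput(head(Round1)) for a large file) and paste the printed output.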

From the comments in your code, I understand that you want to randomly select 2 rows from each of the data sets. Is that correct? I'm not sure, since you used size = 3, and the call itself doesn't seem to do what your comments describe.

If so, you can't use sample this way. A data frame is a list of columns, so if DF is a data.frame with C columns and R rows, sample(DF, k) does not randomly choose k of the R rows; instead, it randomly chooses k of the C columns.
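A quick way to see this column-picking behaviour is to call sample() on a small data frame and check its dimensions. The sketch assumes a hypothetical 6-row data frame with Round1's column names; the numbers are random placeholders.

```r
# Hypothetical 6-row data frame with Round1's column names; the numbers are placeholders
set.seed(1)
dice <- data.frame(
  Dice.Colour = c("Black", "Blue", "Green", "Purple", "Red", "Yellow"),
  Hand.Tiles  = round(runif(6, 1, 3), 2),
  Hand.Carpet = round(runif(6, 0.5, 1), 2),
  Cup.Tiles   = round(runif(6, 2, 9), 2),
  Cup.Carpet  = round(runif(6, 1.5, 3), 2)
)

# A data frame is a list of columns, so sample() draws 3 of the 5 columns
picked <- sample(dice, size = 3)
names(picked)  # which 3 columns were drawn depends on the random seed
ncol(picked)   # 3
nrow(picked)   # 6: every row is kept
```

This matches the output in the question: each call returned all 6 dice but only 3 (randomly chosen) columns.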

The correct approach is to create a sequence from 1 to the number of rows in your data set, randomly choose 2 numbers from that sequence, and then select the corresponding rows of the data set. I'm not sharing the code, as it's fairly straightforward.
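For reference, those three steps could look like the following base R sketch. The dice data frame is a hypothetical stand-in for Round1 with placeholder values.

```r
set.seed(42)  # only to make the example reproducible
dice <- data.frame(
  Dice.Colour = c("Black", "Blue", "Green", "Purple", "Red", "Yellow"),
  Cup.Carpet  = round(runif(6, 1.5, 3), 2)
)

# Step 1: the sequence 1..number of rows
row_ids <- seq_len(nrow(dice))
# Step 2: randomly choose 2 of those row numbers, without replacement
picked_rows <- sample(row_ids, size = 2)
# Step 3: keep only those rows (and all columns)
two_rows <- dice[picked_rows, ]
two_rows
```

Using seq_len() makes the sequence explicit; sample(nrow(dice), 2) is a common shorthand that gives the same kind of result here.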

If you prefer to use dplyr, there is a function sample_n, which you can also explore (in dplyr 1.0.0 and later it is superseded by slice_sample).

Hope this helps.


Thanks. I am using size = 3 because I need to include Dice Colour plus 2 random results from each data set. I will give dplyr's sample_n a try.
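If the goal is really "always keep Dice.Colour, then pick 2 random result columns", a base R sketch could draw only from the non-colour columns and then join the two rounds into one data frame. The helper pick_results, the R1./R2. prefixes, and the placeholder values are all hypothetical choices for illustration.

```r
# Two hypothetical rounds with the poster's column layout (placeholder values)
make_round <- function() {
  data.frame(
    Dice.Colour = c("Black", "Blue", "Green", "Purple", "Red", "Yellow"),
    Hand.Tiles  = round(runif(6, 1, 3), 2),
    Hand.Carpet = round(runif(6, 0.3, 1), 2),
    Cup.Tiles   = round(runif(6, 2, 9), 2),
    Cup.Carpet  = round(runif(6, 1.5, 3), 2)
  )
}
set.seed(7)
Round1 <- make_round()
Round2 <- make_round()

# Hypothetical helper: always keep Dice.Colour, then draw k of the remaining result columns
pick_results <- function(round_df, k = 2) {
  result_cols <- setdiff(names(round_df), "Dice.Colour")
  round_df[, c("Dice.Colour", sample(result_cols, k))]
}

s1 <- pick_results(Round1)
s2 <- pick_results(Round2)

# Prefix the result columns with their round so names cannot clash when joined
names(s1)[-1] <- paste0("R1.", names(s1)[-1])
names(s2)[-1] <- paste0("R2.", names(s2)[-1])

# One new data frame: one row per dice colour, 2 sampled results from each round
combined <- merge(s1, s2, by = "Dice.Colour")
combined
```

Joining by Dice.Colour (rather than stacking with rbind) is used here because the two rounds may end up with different sampled columns.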

Hi @ppines,

Here is a tidy way of doing this:

library(dplyr)
library(readr)
library(purrr)

# Vector of data sets to be read
dfs <- c('/cloud/project/Dice Results Round 1.csv', 
         '/cloud/project/Dice Results Round 2.csv')

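# Read in each data set and stack them into one data frame
# .id = 'df' adds a column "df" holding each file's position in dfs ("1" or "2"),
# so every row can be traced back to its data set
data <- map_dfr(dfs, read_csv, .id = 'df')

# Group by the data set and sample 3 rows from each group
# (in dplyr 1.0.0 and later, slice_sample(n = 3) is the preferred replacement for sample_n(3))
sampled <- data %>%
  group_by(df) %>%
  sample_n(3) %>%
  ungroup()

sampled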

Hope this is helpful.
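For readers who would rather not install dplyr, readr and purrr, a rough base R analogue of the grouped sampling above is sketched below. It is not the code from the reply: it uses two small made-up data frames instead of reading the CSV files, and split()/lapply() in place of group_by()/sample_n().

```r
# Hypothetical stand-ins for the two CSV files (placeholder values, not the real data)
set.seed(3)
colours <- c("Black", "Blue", "Green", "Purple", "Red", "Yellow")
round1 <- data.frame(Dice.Colour = colours, Cup.Tiles = round(runif(6, 2, 9), 2))
round2 <- data.frame(Dice.Colour = colours, Cup.Tiles = round(runif(6, 2, 9), 2))

# Stack the rounds, adding a "df" column that plays the role of .id = 'df'
stacked <- rbind(data.frame(df = "1", round1),
                 data.frame(df = "2", round2))

# Split by data set, sample 3 rows from each piece, then bind the pieces back together
sampled <- do.call(rbind, lapply(split(stacked, stacked$df), function(g) {
  g[sample(nrow(g), 3), ]
}))
rownames(sampled) <- NULL  # drop the "1.4"-style row names created by rbind
sampled
```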


Perfect! Thanks, it worked!
