Database with fixed values

Dear colleagues,

I saw this data frame in a question that was answered today:

```r
# DR1 and the 365 DRM columns are random draws, so they change on every run
df1 <- data.frame( Id = rep(1:5, length.out = 900),
                   date1 = as.Date("2021-12-01"),
                   date2 = rep(seq(as.Date("2021-01-01"), length.out = 450, by = 1), each = 2),
                   Category = rep(c("ABC", "EFG"), length.out = 900),
                   Week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
                                "Saturday", "Sunday"), length.out = 900),
                   DR1 = sample(200:250, 900, replace = TRUE),
                   setNames(replicate(365, sample(0:900, 900), simplify = FALSE),
                            paste0("DRM0", formatC(1:365, width = 2, format = "d", flag = "0"))))
```

I have a question about it. As I understand it, these data do not have fixed values: they change every time the code is run. How can I create the same data frame, but with fixed values?

Wouldn't I have to use set.seed()?

Yes, set.seed() should work for you. Setting the same seed before building each data frame gives identical results:

```r
set.seed(123)  # fix the random-number state before building df1
df1 <- data.frame( Id = rep(1:5, length.out = 900),
                   date1 = as.Date("2021-12-01"),
                   date2 = rep(seq(as.Date("2021-01-01"), length.out = 450, by = 1), each = 2),
                   Category = rep(c("ABC", "EFG"), length.out = 900),
                   Week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
                                "Saturday", "Sunday"), length.out = 900),
                   DR1 = sample(200:250, 900, replace = TRUE),
                   setNames(replicate(365, sample(0:900, 900), simplify = FALSE),
                            paste0("DRM0", formatC(1:365, width = 2, format = "d", flag = "0"))))
set.seed(123)  # reset to the same state before building df2
df2 <- data.frame( Id = rep(1:5, length.out = 900),
                   date1 = as.Date("2021-12-01"),
                   date2 = rep(seq(as.Date("2021-01-01"), length.out = 450, by = 1), each = 2),
                   Category = rep(c("ABC", "EFG"), length.out = 900),
                   Week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
                                "Saturday", "Sunday"), length.out = 900),
                   DR1 = sample(200:250, 900, replace = TRUE),
                   setNames(replicate(365, sample(0:900, 900), simplify = FALSE),
                            paste0("DRM0", formatC(1:365, width = 2, format = "d", flag = "0"))))

identical(df1, df2)
#> [1] TRUE
```

Created on 2022-04-22 by the reprex package (v0.2.1)
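To avoid repeating the long data.frame() call, one option is to wrap it in a function that takes the seed as an argument. This is a minimal base-R sketch; the name `make_df()` and its `seed` parameter are hypothetical, introduced only for illustration, and the function builds the same columns as the code above.

```r
# Hypothetical helper (not from the original thread): builds the example
# data frame from a given seed, so the same seed always yields the same data.
make_df <- function(seed = 123) {
  set.seed(seed)  # fix the RNG state before any sample() call
  data.frame(
    Id = rep(1:5, length.out = 900),
    date1 = as.Date("2021-12-01"),
    date2 = rep(seq(as.Date("2021-01-01"), length.out = 450, by = 1), each = 2),
    Category = rep(c("ABC", "EFG"), length.out = 900),
    Week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
                 "Saturday", "Sunday"), length.out = 900),
    DR1 = sample(200:250, 900, replace = TRUE),
    setNames(replicate(365, sample(0:900, 900), simplify = FALSE),
             paste0("DRM0", formatC(1:365, width = 2, format = "d", flag = "0")))
  )
}

df1 <- make_df(123)
df2 <- make_df(123)
identical(df1, df2)   # TRUE: same seed, same data
df3 <- make_df(456)
identical(df1, df3)   # FALSE: a different seed gives different (but reproducible) values
```

Note that set.seed() inside the function also changes the global random-number state, so any random code run afterwards continues from that seed.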


Thanks for the reply. Could you explain in more detail what set.seed(123) does? Why 123? And why does the data frame have fixed values after using set.seed()?

Any random process in R, such as the one used by sample(), is actually pseudo-random. The generator starts from an initial state determined by a "seed", and every value it produces after that follows deterministically from that state. The values are distributed as if they were random, but the process itself is deterministic. The set.seed() function simply sets that starting state for all the random draws that follow. It accepts an integer, and passing the same integer always produces the same state. So once the seed is set, the "random" functions produce the same values every time the code is run in the same order (with the same version of R and the same RNG settings). The integer passed to set.seed() has no other meaning. I use 123 out of habit, but there is nothing special about it.
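A small base-R sketch of the points above: consecutive draws differ, resetting the same seed reproduces the same draws, a single set.seed() call fixes the whole stream of values that follows (not just the next one), and the particular integer only selects which reproducible stream you get. The variable names and the seeds 42 and 1 are arbitrary choices for illustration.

```r
# Without resetting the seed, consecutive calls to sample() give different values
a <- sample(0:900, 900)
b <- sample(0:900, 900)
identical(a, b)   # FALSE

# Resetting the same seed reproduces exactly the same draws
set.seed(123)
x <- sample(0:900, 900)
set.seed(123)
y <- sample(0:900, 900)
identical(x, y)   # TRUE

# One set.seed() call determines the whole stream that follows:
# two draws of 3 uniforms equal one draw of 6 from the same starting seed
set.seed(42)
first  <- runif(3)
second <- runif(3)
set.seed(42)
all_six <- runif(6)
identical(c(first, second), all_six)   # TRUE

# The integer has no special meaning; a different seed simply
# selects a different (equally reproducible) sequence
set.seed(1)
u1 <- runif(5)
set.seed(123)
u2 <- runif(5)
identical(u1, u2)   # FALSE
```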

Thanks for the excellent explanation!
