Dear tidyverse community,
I am trying to scramble real-world data to protect the data owners. For every numeric field, I want each scrambled value to fall within plus or minus one standard deviation of the original value.
E.g. the seps column is now 81, and after scrambling it should be within one sd of 81.
Any better way of scrambling is also welcome.
I hope you can help me achieve this.
Thank you very much.
Sample txt file attached.
Will this do the trick?
# Load libraries ----------------------------------------------------------
library("tidyverse")
# Define example data -----------------------------------------------------
set.seed(48448)
my_data <- tibble(
  id = sample(LETTERS, 20),
  v1 = rnorm(20),
  v2 = rnorm(20),
  v3 = rnorm(20)
)
# Wrangle data ------------------------------------------------------------
# Set width of scrambling (in standard deviations)
d <- 1
# Create long version of scrambled data
my_data_scrambled_long <- my_data %>%
  pivot_longer(cols = -id,
               names_to = "var",
               values_to = "value") %>%
  group_by(var) %>%
  mutate(value_scrambled = value + runif(n = n(),
                                         min = -d * sd(value),
                                         max = d * sd(value))) %>%
  ungroup()
# Create wide version of scrambled data
my_data_scrambled_wide <- my_data_scrambled_long %>%
  select(-value) %>%
  pivot_wider(id_cols = id, names_from = var, values_from = value_scrambled)
# View scrambling ---------------------------------------------------------
my_data
my_data_scrambled_wide
Yielding:
> my_data
# A tibble: 20 × 4
id v1 v2 v3
<chr> <dbl> <dbl> <dbl>
1 G 0.762 0.990 -1.28
2 M -1.17 -0.386 0.908
3 S -0.863 0.0378 0.648
4 B -0.413 -0.424 -0.573
5 V 1.09 0.155 -0.371
6 Y -1.27 -0.143 -0.0217
7 W 0.621 0.244 -0.408
8 N 0.427 -2.65 -0.532
9 C 0.413 -0.771 0.0209
10 T -0.730 0.642 0.288
11 Q 0.170 0.908 -0.625
12 J 1.38 0.948 0.288
13 Z -0.688 -0.332 0.141
14 A -0.556 -1.18 0.851
15 P 0.155 -1.28 0.897
16 F -0.245 0.438 0.403
17 R -0.632 -0.649 0.966
18 O -1.19 -0.682 -1.66
19 E -0.0599 0.780 0.935
20 L -1.51 0.834 0.871
> my_data_scrambled_wide
# A tibble: 20 × 4
id v1 v2 v3
<chr> <dbl> <dbl> <dbl>
1 G 1.27 1.52 -1.34
2 M -0.751 0.377 0.262
3 S -1.46 -0.225 0.131
4 B -1.20 -0.459 -0.584
5 V 0.710 -0.407 0.0121
6 Y -1.11 -0.744 0.542
7 W 1.14 -0.677 0.229
8 N 1.12 -2.23 -0.876
9 C 0.420 -0.209 0.0419
10 T 0.0289 0.132 0.0977
11 Q 0.470 1.77 -0.828
12 J 1.55 0.644 0.287
13 Z -1.26 -0.463 0.621
14 A -0.107 -0.326 0.860
15 P 0.365 -1.57 0.611
16 F 0.148 0.811 -0.153
17 R -1.40 -1.39 0.912
18 O -1.95 -1.28 -1.04
19 E 0.148 1.27 1.32
20 L -2.29 1.72 1.29
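If you would rather skip the pivoting, something along these lines might also work. This is only a sketch: `scramble()` is a hypothetical helper made up for illustration (it is not a tidyverse function), and it assumes every numeric column should be scrambled and that there are no missing values. It adds the same kind of uniform noise as above, then checks that each scrambled value really is within d standard deviations of its original.

```r
library("tidyverse")

# Same example data as above
set.seed(48448)
my_data <- tibble(
  id = sample(LETTERS, 20),
  v1 = rnorm(20),
  v2 = rnorm(20),
  v3 = rnorm(20)
)

# Hypothetical helper: add uniform noise drawn from
# [-d * sd(x), d * sd(x)] to every element of x
scramble <- function(x, d = 1) {
  x + runif(n = length(x), min = -d * sd(x), max = d * sd(x))
}

# Set width of scrambling (in standard deviations)
d <- 1

# Scramble every numeric column in place; id and row order are untouched
my_data_scrambled <- my_data %>%
  mutate(across(where(is.numeric), ~ scramble(.x, d = d)))

# Check, per numeric column, that every scrambled value is within
# d standard deviations (of the original column) of its original value
num_cols <- names(my_data)[vapply(my_data, is.numeric, logical(1))]
all_within <- vapply(num_cols, function(col) {
  all(abs(my_data_scrambled[[col]] - my_data[[col]]) <= d * sd(my_data[[col]]))
}, logical(1))
all_within
```

The noise is drawn in a different order than in the pivoted version, so the scrambled numbers will not match the output above even with the same seed.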
Excellent. Thank you so much. That solves my problem.