Iteration over variables with several time measurements. Purrr and map

I have a large database to deal with.
The str() output of my real database looks like this:

tibble [561 x 128] (S3: tbl_df/tbl/data.frame)
 $ paciente     : num [1:561] 6430 6494 6165 6278 6188 ...
  ..- attr(*, "format.spss")= chr "F5.0"
 $ sexo_s1      : Factor w/ 2 levels "Hombre","Mujer": 1 2 2 2 1 1 1 1 1 2 ...
  ..- attr(*, "label")= chr "Sexo"
 $ edad_s1      : num [1:561] 63 63 73 66 58 67 69 55 62 63 ...
  ..- attr(*, "label")= chr "Edad"
  ..- attr(*, "format.spss")= chr "F3.0"
 $ peso1_v00    : num [1:561] 115.4 76.2 93.8 71.3 107 ...
  ..- attr(*, "label")= chr "Peso: 1a determinación"
  ..- attr(*, "format.spss")= chr "F5.1"
 $ cintura1_v00 : num [1:561] 123 106 117 105 116 ...
  ..- attr(*, "label")= chr "Cintura: 1a determinación"
  ..- attr(*, "format.spss")= chr "F5.1"
 $ tasis2_e_v00 : num [1:561] 139 129 136 138 146 151 145 140 134 115 ...
  ..- attr(*, "label")= chr "TA: tensión arterial 2: sistólica"
  ..- attr(*, "format.spss")= chr "F4.0"
 $ tadias2_e_v00: num [1:561] 78 63 71 76 80 71 75 82 59 61 ...
  ..- attr(*, "label")= chr "TA: tensión arterial 2: diastólica"
  ..- attr(*, "format.spss")= chr "F4.0"

Let me summarise what I need to do using this small data frame:

paciente <- c(6430, 6494, 6165, 6278, 6188, 6447, 6207, 6463)
sexo_s1 <-  c("Hombre", "Mujer", "Mujer", "Mujer", "Hombre", "Hombre", "Mujer")
edad_s1 <- c(54, 68, 75, 85, 78, 80, 78, 90)
peso1_v00 <- c(115.2, 85, 98, 87, 85, 78, 84, 98)
cintura1_v00 <- c(115, 125, 110, 114, 120, 121 125, 110)
coltot_v00 <- c(215, 220, 210, 225, 215, 220, 230, 220)
peso1_v66 <- c(110.2, 80, 95, 87, 83, 78, 84, 98)
cintura1_v01 <- c(112, 125, 110, 110, 112, 121 120, 110)
coltot_v01 <- c(210, 210, 205, 215, 215, 210, 230,1 220)
peso1_v01 <- c(110.2, 80, 95, 87, 83, 78, 84, 98)
cintura1_v01 <- c(112, 125, 110, 110, 112, 121 120, 110)
coltot_v01 <- c(210, 210, 205, 215, 215, 210, 230,1 220)

I need to perform several statistical analyses:

  1. Run normality tests (shapiro.test and boxplot) across the numeric variables (125 out of 128 variables). I am trying to do it with purrr::map and similar functions (purrr::map_dfr):
iterative_example<-map_dfr(.x = quos(paciente, sexo_s1, edad_s1, peso1_v00, cintura1_v00, coltot_v00, peso1_v66, cintura1_v66, coltot_v66, peso1_v01, cintura1_v01, coltot_v01), .f = ~ shapiro.test, data = df_example)

<error/rlang_error>
Argument 1 must be a data frame or a named atomic vector.
Backtrace:

  1. purrr::map_dfr(...)
  2. dplyr::bind_rows(res, .id = .id)

If I replace map_dfr with map, I obtain a list which I cannot export or transform into a data.frame:

iterative_example<-map(.x = quos(paciente, sexo_s1, edad_s1, peso1_v00, cintura1_v00, coltot_v00, peso1_v66, cintura1_v66, coltot_v66, peso1_v01, cintura1_v01, coltot_v01), .f = ~ shapiro.test, data = df_example)

List of 9
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)

I cannot export or unnest the list to get the p-value and t results for the time being, but I'll sort it out. However I'd like to get a data.frame.

Similarly, I have to run t.test iteratively between the variables observed at different times, to check whether there are significant differences between the measurements. (I've tried the same map call, but I get exactly the same nested list as with shapiro.test.)
For example:

t.test(df_example$peso1_v00,  df_example$peso1_v66)
t.test(df_example$cintura1_v00,  df_example$cintura1_v66)

I need syntax that recognises the variable name, e.g. "i_variable1_v00" at the specific time "v00", and tests it against "i_variable1_v66". I've tried starts_with(), but without result.

I am not sure how to perform this and export output like the following:

Welch Two Sample t-test

data: df_example$cintura1_v00 and df_example$cintura1_v66
t = -0.051503, df = 10.399, p-value = 0.9599
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.504848 5.254848
sample estimates:
mean of x mean of y
117.500 117.625

  2. Create new columns iteratively from the values at 0, 6 and 12 months. So far I have created the variables by systematically repeating the same line for every variable in the database. Below is an example with a different variable from my database.

I am looking for something that creates the new columns iteratively from the variables taken at different time points:

d_peso1_v66: difference between 0 and 6 months
d_peso1_v01: difference between 0 and 12 months

Example with 2 variables without iteration:

df_example<-mutate(df_example, d_peso1_v66 = peso1_v66 - peso1_v00)
df_example<-mutate(df_example, d_coltot_v01 = coltot_v01 - coltot_v00)

d_variable1_v66 = i_variable1_v66 - i_variable1_v00
d_variable1_v01 = i_variable1_v01 - i_variable1_v00

d_variable2_v66 = i_variable2_v66 - i_variable2_v00
d_variable2_v01 = i_variable2_v01 - i_variable2_v00

What I have in mind is roughly this (pseudocode, not working R):

df_example <-mutate(across(where(is.numeric)),
varname <- paste("varname01", if variable contains "01" )
df_example <- mutate(df, varname = Petal.Width * n)

I am not sure whether it is possible to do this in one step, or whether it is necessary to create a function and then pass it through the database with a map function. Something like this, but computing a difference (difference_function):

meanofcol <- function(df, col) {
mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)

And then with a map function:

df_example2 <- map_dfr (.x = df_example, .f = ~ difference_function, data = df_example)

I have been struggling with different approaches that take much more time than I think they should if I knew how to write the syntax.

From your text I get the impression that you handle data and code in a pleasantly carefree way.
That is nice, but my advice is to look closely at the documentation of the functions you use.

That being said, over to your problem where I address only the shapiro.test part.
I would attack it in the following way.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

# added some commas where they were missing:

paciente <- c(6430, 6494, 6165, 6278, 6188, 6447, 6207, 6463)
sexo_s1 <-  c("Hombre", "Mujer", "Mujer", "Mujer", "Hombre", "Hombre", "Mujer")
edad_s1 <- c(54, 68, 75, 85, 78, 80, 78, 90)
peso1_v00 <- c(115.2, 85, 98, 87, 85, 78, 84, 98)
cintura1_v00 <- c(115, 125, 110, 114, 120, 121, 125, 110)
coltot_v00 <- c(215, 220, 210, 225, 215, 220, 230, 220)
peso1_v66 <- c(110.2, 80, 95, 87, 83, 78, 84, 98)
cintura1_v01 <- c(112, 125, 110, 110, 112, 121, 120, 110)
coltot_v01 <- c(210, 210, 205, 215, 215, 210, 230,1, 220)
peso1_v01 <- c(110.2, 80, 95, 87, 83, 78, 84, 98)
cintura1_v01 <- c(112, 125, 110, 110, 112, 121, 120, 110)
coltot_v01 <- c(210, 210, 205, 215, 215, 210, 230,1, 220)

# lengths shortened to the shortest one because these differed from 7 to 9:

paciente <- paciente[1:7]
sexo_s1 <-  sexo_s1[1:7]
edad_s1 <- edad_s1[1:7]
peso1_v00 <- peso1_v00[1:7]
cintura1_v00 <- cintura1_v00[1:7]
coltot_v00 <- coltot_v00[1:7]
peso1_v66 <- peso1_v66[1:7]
cintura1_v01 <- cintura1_v01[1:7]
coltot_v01 <- coltot_v01[1:7]
peso1_v01 <- peso1_v01[1:7]
cintura1_v01 <- cintura1_v01[1:7]
coltot_v01 <- coltot_v01[1:7]

df_example <- data.frame(
  paciente, sexo_s1, edad_s1, peso1_v00, cintura1_v00, coltot_v00, peso1_v66,
   cintura1_v01, coltot_v01, peso1_v01, cintura1_v01, coltot_v01
  )

numeric_vars <- c("paciente", "edad_s1", "peso1_v00", 
                                 "cintura1_v00", "coltot_v00", "peso1_v66", "cintura1_v01",
                                 "coltot_v01", "peso1_v01", "cintura1_v01", "coltot_v01")
shapiro_results <- map(.x = numeric_vars, 
                       .f = ~shapiro.test(df_example[,.]))

names(shapiro_results) <- numeric_vars
str(head(shapiro_results,2)) # show shapiro.test results for first two numeric variables
#> List of 2
#>  $ paciente:List of 4
#>   ..$ statistic: Named num 0.867
#>   .. ..- attr(*, "names")= chr "W"
#>   ..$ p.value  : num 0.176
#>   ..$ method   : chr "Shapiro-Wilk normality test"
#>   ..$ data.name: chr "df_example[, .]"
#>   ..- attr(*, "class")= chr "htest"
#>  $ edad_s1 :List of 4
#>   ..$ statistic: Named num 0.873
#>   .. ..- attr(*, "names")= chr "W"
#>   ..$ p.value  : num 0.197
#>   ..$ method   : chr "Shapiro-Wilk normality test"
#>   ..$ data.name: chr "df_example[, .]"
#>   ..- attr(*, "class")= chr "htest"

data.frame(
  var = names(shapiro_results),
  p_value = purrr::map_dbl(shapiro_results,~pluck(.,"p.value")),
  W_stat  = purrr::map_dbl(shapiro_results,~pluck(.,"statistic"))
)
#>             var    p_value    W_stat
#> 1      paciente 0.17644582 0.8674893
#> 2       edad_s1 0.19674163 0.8728976
#> 3     peso1_v00 0.06255426 0.8188172
#> 4  cintura1_v00 0.51265149 0.9254195
#> 5    coltot_v00 0.87326921 0.9666420
#> 6     peso1_v66 0.11692041 0.8476370
#> 7  cintura1_v01 0.11263196 0.8458741
#> 8    coltot_v01 0.06607612 0.8212884
#> 9     peso1_v01 0.11692041 0.8476370
#> 10 cintura1_v01 0.11263196 0.8458741
#> 11   coltot_v01 0.06607612 0.8212884
Created on 2021-09-16 by the reprex package (v2.0.0)
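
As a side note, the same kind of table can probably be built in one pass by letting map_dfr iterate over the columns of the data frame. This is only a minimal sketch: df_toy and its values are made up for illustration, and I drop the patient id because a normality test on an identifier is not meaningful.

```r
library(dplyr)
library(purrr)

# Toy data frame (illustrative values only, not the real study data)
df_toy <- tibble(
  paciente  = c(6430, 6494, 6165, 6278, 6188, 6447, 6207),
  sexo_s1   = c("Hombre", "Mujer", "Mujer", "Mujer", "Hombre", "Hombre", "Mujer"),
  peso1_v00 = c(115.2, 85, 98, 87, 85, 78, 84),
  peso1_v66 = c(110.2, 80, 95, 87, 83, 78, 84)
)

shapiro_tbl <- df_toy %>%
  select(where(is.numeric), -paciente) %>%  # numeric columns only, without the id
  map_dfr(
    ~ {
      res <- shapiro.test(.x)               # run the test on one column
      tibble(W_stat  = unname(res$statistic),
             p_value = res$p.value)         # one-row summary per column
    },
    .id = "var"                             # column name is stored in `var`
  )

shapiro_tbl
```

The point is that the function passed to map_dfr has to be called on each column and return a one-row data frame; passing the bare function (as in `~ shapiro.test`) just returns the function itself for every element.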

Thanks for the help.
I tried to reproduce an example of what I face, but obviously it is much smaller in complexity. On the other hand, it is hard to copy 128 variables with "," separators between all the values.

What you have done is great, but it is just a tiny part of the whole. The hard part is to run tests between the variables at the 3 time points (v00, v66, v01). Is this possible with a map function?
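
For what it is worth, one way this could be done with map on the wide data is to loop over the base names of the variables and build the column names for each visit with paste0. This is a sketch under assumptions: df_toy is a made-up data frame, the columns are assumed to follow the "name_v00" / "name_v66" pattern, and it uses the unpaired Welch test as in the t.test calls in the question.

```r
library(dplyr)
library(purrr)

# Toy data frame (illustrative values only)
df_toy <- tibble(
  peso1_v00    = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  peso1_v66    = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  cintura1_v00 = c(115, 125, 110, 114, 120, 121, 125, 110),
  cintura1_v66 = c(112, 125, 110, 110, 112, 121, 120, 110)
)

# Base names of everything measured at baseline, e.g. "peso1", "cintura1"
base_names <- sub("_v00$", "", grep("_v00$", names(df_toy), value = TRUE))

ttest_tbl <- map_dfr(base_names, function(v) {
  # Compare the baseline column with the 6-month column of the same variable
  res <- t.test(df_toy[[paste0(v, "_v00")]],
                df_toy[[paste0(v, "_v66")]])
  tibble(
    variable = v,
    t_stat   = unname(res$statistic),
    df       = unname(res$parameter),
    p_value  = res$p.value
  )
})

ttest_tbl
```

Since the same patients are measured at every visit, `paired = TRUE` may well be the more appropriate option in t.test; that is a statistical choice and not something the code decides for you.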

I think it's implied that the Vxx variables have a time component in their name, 'xx' being a particular month?
In that case your data is not tidy, and to take full advantage of the tidyverse and all the rest, it's best to be tidy. Perhaps you could pivot your data longer to make it tidy.
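
A rough sketch of what that pivot could look like, assuming every repeated measurement is named "variable_vXX" with XX in 00, 66 or 01. The data frame df_toy and its values are invented for illustration; names_pattern splits each column name into the variable and the visit, after which the change from baseline is a single grouped mutate.

```r
library(dplyr)
library(tidyr)

# Toy data frame (illustrative values only)
df_toy <- tibble(
  paciente   = c(6430, 6494, 6165),
  peso1_v00  = c(115.2, 85, 98),
  peso1_v66  = c(110.2, 80, 95),
  peso1_v01  = c(108.0, 81, 94),
  coltot_v00 = c(215, 220, 210),
  coltot_v66 = c(210, 210, 205),
  coltot_v01 = c(205, 212, 200)
)

# One row per patient, variable and visit
df_long <- df_toy %>%
  pivot_longer(
    cols          = matches("_v(00|66|01)$"),
    names_to      = c("variable", "visit"),
    names_pattern = "^(.*)_(v\\d+)$",
    values_to     = "value"
  )

# Difference of every visit from the baseline visit (v00).
# Assumes each patient/variable group has exactly one v00 row.
df_diff <- df_long %>%
  group_by(paciente, variable) %>%
  mutate(diff_from_v00 = value - value[visit == "v00"]) %>%
  ungroup()

df_diff
```

From the long format, per-variable tests or summaries are then a group_by(variable) away, and pivot_wider can bring the differences back to one column per variable and visit if the wide layout is needed for export.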

Of course the Vxx variables have a time component. How do you suggest tidying them? With pivot_longer, and then sorting?
That is the other approach I have in mind.
