I have a large database to work with. The str() output of my real database looks like this:
tibble [561 x 128] (S3: tbl_df/tbl/data.frame)
 $ paciente      : num [1:561] 6430 6494 6165 6278 6188 ...
  ..- attr(*, "format.spss")= chr "F5.0"
 $ sexo_s1       : Factor w/ 2 levels "Hombre","Mujer": 1 2 2 2 1 1 1 1 1 2 ...
  ..- attr(*, "label")= chr "Sexo"
 $ edad_s1       : num [1:561] 63 63 73 66 58 67 69 55 62 63 ...
  ..- attr(*, "label")= chr "Edad"
  ..- attr(*, "format.spss")= chr "F3.0"
 $ peso1_v00     : num [1:561] 115.4 76.2 93.8 71.3 107 ...
  ..- attr(*, "label")= chr "Peso: 1a determinación"
  ..- attr(*, "format.spss")= chr "F5.1"
 $ cintura1_v00  : num [1:561] 123 106 117 105 116 ...
  ..- attr(*, "label")= chr "Cintura: 1a determinación"
  ..- attr(*, "format.spss")= chr "F5.1"
 $ tasis2_e_v00  : num [1:561] 139 129 136 138 146 151 145 140 134 115 ...
  ..- attr(*, "label")= chr "TA: tensión arterial 2: sistólica"
  ..- attr(*, "format.spss")= chr "F4.0"
 $ tadias2_e_v00 : num [1:561] 78 63 71 76 80 71 75 82 59 61 ...
  ..- attr(*, "label")= chr "TA: tensión arterial 2: diastólica"
  ..- attr(*, "format.spss")= chr "F4.0"
Here is a small example data frame that summarises what I need to do:
library(tidyverse)  # dplyr, purrr, tibble, ...

paciente <- c(6430, 6494, 6165, 6278, 6188, 6447, 6207, 6463)
sexo_s1 <- c("Hombre", "Mujer", "Mujer", "Mujer", "Hombre", "Hombre", "Mujer", NA)  # 8th value missing in the original example
edad_s1 <- c(54, 68, 75, 85, 78, 80, 78, 90)
peso1_v00 <- c(115.2, 85, 98, 87, 85, 78, 84, 98)
cintura1_v00 <- c(115, 125, 110, 114, 120, 121, 125, 110)
coltot_v00 <- c(215, 220, 210, 225, 215, 220, 230, 220)
peso1_v66 <- c(110.2, 80, 95, 87, 83, 78, 84, 98)
cintura1_v66 <- c(112, 125, 110, 110, 112, 121, 120, 110)
coltot_v66 <- c(210, 210, 205, 215, 215, 210, 230, 220)
peso1_v01 <- c(110.2, 80, 95, 87, 83, 78, 84, 98)
cintura1_v01 <- c(112, 125, 110, 110, 112, 121, 120, 110)
coltot_v01 <- c(210, 210, 205, 215, 215, 210, 230, 220)

# Combine the vectors into the data frame used below
df_example <- data.frame(paciente, sexo_s1, edad_s1,
                         peso1_v00, cintura1_v00, coltot_v00,
                         peso1_v66, cintura1_v66, coltot_v66,
                         peso1_v01, cintura1_v01, coltot_v01)
I need to perform several statistical analyses:
1 - Run normality tests (shapiro.test and boxplots) across the numeric variables (125 of the 128 variables). I am trying to do it with purrr::map and related functions (purrr::map_dfr):
iterative_example<-map_dfr(.x = quos(paciente, sexo_s1, edad_s1, peso1_v00, cintura1_v00, coltot_v00, peso1_v66, cintura1_v66, coltot_v66, peso1_v01, cintura1_v01, coltot_v01), .f = ~ shapiro.test, data = df_example)
<error/rlang_error>
Argument 1 must be a data frame or a named atomic vector.
Backtrace:
- purrr::map_dfr(...)
- dplyr::bind_rows(res, .id = .id)
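A minimal sketch of the normality step, assuming dplyr, purrr and tibble are installed. Two things seem to go wrong in the attempt above: ~ shapiro.test is a lambda that returns the shapiro.test function itself instead of calling it, and map_dfr() can only row-bind data frames (or named vectors), not functions or htest objects. Mapping over the numeric columns and returning a one-row tibble per test avoids both. The result column names W and p.value are my own choice.

```r
library(dplyr)
library(purrr)
library(tibble)

# A few columns from the example in the question
df_example <- tibble(
  paciente     = c(6430, 6494, 6165, 6278, 6188, 6447, 6207, 6463),
  sexo_s1      = c("Hombre", "Mujer", "Mujer", "Mujer", "Hombre", "Hombre", "Mujer", NA),
  edad_s1      = c(54, 68, 75, 85, 78, 80, 78, 90),
  peso1_v00    = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  cintura1_v00 = c(115, 125, 110, 114, 120, 121, 125, 110)
)

# Keep only numeric columns (sexo_s1 is dropped), then call shapiro.test on each.
# Each call returns a one-row tibble; map_dfr() stacks them and .id = "variable"
# stores the name of the column each row came from.
normality <- df_example %>%
  select(where(is.numeric)) %>%
  map_dfr(function(x) {
    res <- shapiro.test(x)
    tibble(W = unname(res$statistic), p.value = res$p.value)
  }, .id = "variable")

normality
```

In purrr 1.0.0 and later, map_dfr() is superseded; map() followed by list_rbind(names_to = "variable") is the suggested replacement and should give the same table.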
If I replace map_dfr with map, I get a list that I cannot export or convert into a data.frame:
iterative_example<-map(.x = quos(paciente, sexo_s1, edad_s1, peso1_v00, cintura1_v00, coltot_v00, peso1_v66, cintura1_v66, coltot_v66, peso1_v01, cintura1_v01, coltot_v01), .f = ~ shapiro.test, data = df_example)
List of 9
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
 $ :function (x)
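Once .f actually calls the test, map() returns a named list of htest objects, and the relevant pieces can be pulled out into a data frame afterwards. A sketch assuming purrr and tibble; broom::tidy() from the broom package (not used here) is another common way to turn htest objects into tibbles.

```r
library(purrr)
library(tibble)

df_example <- data.frame(
  edad_s1      = c(54, 68, 75, 85, 78, 80, 78, 90),
  peso1_v00    = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  cintura1_v00 = c(115, 125, 110, 114, 120, 121, 125, 110)
)

# map() over a data frame iterates over its columns: a named list of "htest" objects
tests <- map(df_example, shapiro.test)

# Extract the pieces of each htest object into one row per variable
tests_df <- tibble(
  variable = names(tests),
  W        = map_dbl(tests, ~ unname(.x$statistic)),
  p.value  = map_dbl(tests, "p.value")  # a string extracts that element by name
)

tests_df
```

To export, something like write.csv(tests_df, "shapiro_results.csv", row.names = FALSE) should work; the file name is only an example.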
For the time being I cannot export or unnest the list to get the p-values and test statistics, but I'll sort that out. Ideally, though, I'd like to get a data.frame directly.
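For the boxplot part of the normality check, one option (an assumption on my side, using tidyr and ggplot2) is to reshape the numeric columns to long format and draw one boxplot per variable with facet_wrap(). The ID column paciente is left out here, since a boxplot of patient IDs is not informative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df_example <- tibble(
  paciente     = c(6430, 6494, 6165, 6278, 6188, 6447, 6207, 6463),
  edad_s1      = c(54, 68, 75, 85, 78, 80, 78, 90),
  peso1_v00    = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  cintura1_v00 = c(115, 125, 110, 114, 120, 121, 125, 110)
)

# One row per (variable, value); the ID column is excluded
long <- df_example %>%
  select(where(is.numeric), -paciente) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value")

# One panel per variable, each with its own y scale
p <- ggplot(long, aes(x = variable, y = value)) +
  geom_boxplot() +
  facet_wrap(~ variable, scales = "free")

print(p)
```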
Similarly, I have to run t.test iteratively between the variables observed at different times, to check whether there are significant differences between the measurements taken (I've tried the same map approach, but I get exactly the same nested list as with shapiro.test).
For example:
t.test(df_example$peso1_v00, df_example$peso1_v66)
t.test(df_example$cintura1_v00, df_example$cintura1_v66)
Naming pattern to recognise each variable: "i_variable1_v00" is measured at time "v00" and should be tested against "i_variable1_v66". I've tried starts_with(), but with no result.
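starts_with() matches the beginning of a name, but here the time point is the suffix, so ends_with("_v00") (or a regular expression) is a better hook. A base-R sketch that builds the table of baseline/follow-up pairs from the column names; the follow-up codes v66 and v01 are taken from the example.

```r
# Column names as in the example data frame
vars <- c("paciente", "sexo_s1", "edad_s1",
          "peso1_v00", "cintura1_v00", "coltot_v00",
          "peso1_v66", "cintura1_v66", "coltot_v66",
          "peso1_v01", "cintura1_v01", "coltot_v01")

# Baseline columns end in "_v00"; removing the suffix gives the stem ("peso1", ...)
stems <- sub("_v00$", "", grep("_v00$", vars, value = TRUE))

# Every stem x follow-up combination
pairs <- expand.grid(stem = stems, visit = c("v66", "v01"), stringsAsFactors = FALSE)
pairs$baseline <- paste0(pairs$stem, "_v00")
pairs$followup <- paste0(pairs$stem, "_", pairs$visit)

# Drop combinations whose follow-up column does not exist in the data
pairs <- pairs[pairs$followup %in% vars, ]
pairs
```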
I am not sure how to do this and export the output. For a single pair, the output looks like this:
Welch Two Sample t-test
data: df_example$cintura1_v00 and df_example$cintura1_v66
t = -0.051503, df = 10.399, p-value = 0.9599
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.504848 5.254848
sample estimates:
mean of x mean of y
117.500 117.625
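A sketch of the iterative t-test, assuming purrr and tibble, and assuming every stem has columns stem_v00, stem_v66 and stem_v01. It keeps the Welch two-sample test used above and stores t, df, the p-value and both means for each comparison. Since the same patients are measured at each visit, t.test(x, y, paired = TRUE) may be the more appropriate test; in that case estimate holds a single mean difference, so the mean_x/mean_y columns would need to change.

```r
library(purrr)
library(tibble)

df_example <- data.frame(
  peso1_v00    = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  cintura1_v00 = c(115, 125, 110, 114, 120, 121, 125, 110),
  peso1_v66    = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  cintura1_v66 = c(112, 125, 110, 110, 112, 121, 120, 110),
  peso1_v01    = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  cintura1_v01 = c(112, 125, 110, 110, 112, 121, 120, 110)
)

stems  <- sub("_v00$", "", grep("_v00$", names(df_example), value = TRUE))
visits <- c("v66", "v01")

# For each stem and each follow-up visit, compare baseline vs follow-up
# and keep the main numbers of the htest object as one row.
t_results <- map_dfr(stems, function(stem) {
  map_dfr(visits, function(visit) {
    x <- df_example[[paste0(stem, "_v00")]]
    y <- df_example[[paste0(stem, "_", visit)]]
    res <- t.test(x, y)  # Welch two-sample test, as in the question
    tibble(
      variable   = stem,
      comparison = paste("v00 vs", visit),
      t          = unname(res$statistic),
      df         = unname(res$parameter),
      p.value    = res$p.value,
      mean_x     = unname(res$estimate[1]),
      mean_y     = unname(res$estimate[2])
    )
  })
})

t_results
```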
2 - Iteratively create new columns from the values at 0, 6 and 12 months. I have created these variables, but only by systematically repeating one line per variable in the database. Here is an example with a different variable from my database.
I am looking for something to iteratively create new columns from the differences between the variables measured at different time points:
d_peso1_v66: difference 0 to +6 months
d_peso1_v01: difference 0 to +12 months
Example with 2 variables without iteration:
df_example<-mutate(df_example, d_peso1_v66 = peso1_v66 - peso1_v00)
df_example<-mutate(df_example, d_coltot_v01 = coltot_v01 - coltot_v00)
d_variable1_v66 = i_variable1_v66 - i_variable1_v00
d_variable1_v01 = i_variable1_v01 - i_variable1_v00
d_variable2_v66 = i_variable2_v66 - i_variable2_v00
d_variable2_v01 = i_variable2_v01 - i_variable2_v00
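One way to do this in a single mutate() call, assuming dplyr 1.0 or later (for across() and cur_column()): select every follow-up column by its suffix and, for each one, subtract the baseline column whose name is obtained by swapping the suffix for _v00. The .names argument adds the d_ prefix. This sketch looks the baseline up in df_example directly, which assumes the data frame is not grouped or filtered inside the same pipeline.

```r
library(dplyr)

df_example <- tibble(
  peso1_v00  = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  coltot_v00 = c(215, 220, 210, 225, 215, 220, 230, 220),
  peso1_v66  = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  coltot_v66 = c(210, 210, 205, 215, 215, 210, 230, 220),
  peso1_v01  = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  coltot_v01 = c(210, 210, 205, 215, 215, 210, 230, 220)
)

# For every follow-up column (suffix _v66 or _v01, excluding existing d_ columns),
# subtract the matching _v00 column and name the result d_<column>.
df_example <- df_example %>%
  mutate(across(
    matches("_v(66|01)$") & !starts_with("d_"),
    ~ .x - df_example[[sub("_v(66|01)$", "_v00", cur_column())]],
    .names = "d_{.col}"
  ))

names(df_example)
```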
# Pseudocode of the idea (not runnable as written):
# df_example <- mutate(df_example, across(where(is.numeric), ...))
# varname <- paste("varname01", ...)   # only if the variable name contains "01"
# df_example <- mutate(df, varname = Petal.Width * n)
I'm not sure whether it is possible to do this in one step, or whether I need to create a function and then pass it through the database with a map function. Something like this, but computing a difference (difference_function):
library(dplyr)

# Adds a column named "Mean of <col>" holding the mean of <col>
meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
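Following the same pattern as meanofcol(), a hypothetical difference_function() could take the follow-up and baseline columns as bare names and use {{ }} both in the expression and in the new column name. A sketch, assuming dplyr 1.0 or later (needed for the "d_{{followup}}" := naming).

```r
library(dplyr)

# Hypothetical helper: adds d_<followup> = followup - baseline
difference_function <- function(df, followup, baseline) {
  mutate(df, "d_{{followup}}" := {{ followup }} - {{ baseline }})
}

df_example <- tibble(
  peso1_v00 = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  peso1_v66 = c(110.2, 80, 95, 87, 83, 78, 84, 98)
)

df_example <- difference_function(df_example, peso1_v66, peso1_v00)
df_example$d_peso1_v66
```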
And then with a map function:
df_example2 <- map_dfr (.x = df_example, .f = ~ difference_function, data = df_example)
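map_dfr() over a data frame iterates over its columns and stacks the results as rows, so it is not a natural fit for adding columns. Since every step should add one column to the same data frame, purrr::reduce() may fit better: it feeds the updated data frame into the next call. The sketch below uses a hypothetical string-based variant, difference_function_chr(), because the column names come from a character vector rather than being typed as bare names.

```r
library(dplyr)
library(purrr)

# Hypothetical string-based helper: followup is a column name such as "peso1_v66"
difference_function_chr <- function(df, followup) {
  baseline <- sub("_v(66|01)$", "_v00", followup)
  mutate(df, "d_{followup}" := .data[[followup]] - .data[[baseline]])
}

df_example <- tibble(
  peso1_v00    = c(115.2, 85, 98, 87, 85, 78, 84, 98),
  cintura1_v00 = c(115, 125, 110, 114, 120, 121, 125, 110),
  peso1_v66    = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  cintura1_v66 = c(112, 125, 110, 110, 112, 121, 120, 110),
  peso1_v01    = c(110.2, 80, 95, 87, 83, 78, 84, 98),
  cintura1_v01 = c(112, 125, 110, 110, 112, 121, 120, 110)
)

# All follow-up columns, e.g. "peso1_v66", "cintura1_v01", ...
followups <- grep("_v(66|01)$", names(df_example), value = TRUE)

# reduce() calls difference_function_chr(df, followup) for each name in turn,
# passing the data frame returned by one call into the next.
df_example2 <- reduce(followups, difference_function_chr, .init = df_example)
names(df_example2)
```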
I have been struggling with different approaches that take much more time than I think they should, if only I knew how to write the syntax.