Recupe only the last result in a new column

laplanca · April 19, 2022, 11:43am

I have data with few column in fonction of the number of tests done by a patient. All the patients haven't done the same number of test. I want to recup in a new column only the last test done by a patient.

My data looks like something like that :
id / name / resultatD1/resul_d1/diff_d1/ resultatD2/resul_d2/diff_d2/resultatD3/resul_d3/diff_d3
06/james/P/0/12/ / / / / / /
14/franck/N/0/22/I/0/4/I/3/7/
24/smith/N/0/15/P/0/40/ / / /

and i want something like that (lt for last test):
id / name / resultatD1/resul_d1/diff_d1/ resultatD2/resul_d2/diff_d2/resultatD3/resul_d3/diff_d3/ resultat_lt/resul_lt/diff_lt
06/james/P/0/12/ / / / / / /P/0/12
14/franck/N/0/22/I/0/4/I/3/7/I/3/7
24/smith/N/0/15/P/0/40/ / / /P/0/40

I really hope you can help me, thank you for your time!

pieterjanvc · April 19, 2022, 1:15pm

Hi there,

Here is way of doing this with some data frame manipulations:

#The data
myData = data.frame(
  stringsAsFactors = FALSE,
        id = c("06", "14", "24"),
        name = c("james", "franck","smith"),
        resultatD1 = c("P", "N", "N"),
          resul_d1 = c(0L, 0L, 0L),
           diff_d1 = c(12L, 22L, 15L),
        resultatD2 = c(NA, "I", "P"),
          resul_d2 = c(NA, 0L, 0L),
           diff_d2 = c(NA, 4L, 40L),
        resultatD3 = c(NA, "I", NA),
          resul_d3 = c(NA, 3L, NA),
           diff_d3 = c(NA, 7L, NA)
)

#Extract the last 3 values from each row
newCols = apply(myData, 1, function(x){
  
  #Get the last non-NA value in the row
  last = which(!is.na(x)) 
  #If the last value is not NA, use whole row
  last = ifelse(length(last) == 0, length(x), last[length(last)])
  #Cut vector up to last non-NA
  x = x[1:last]
  #get last 3 values
  x[(length(x)-2):length(x)]
  
})

#Create a new data frame from them
newCols = as.data.frame(t(newCols))

#Set the new column names and clas
colnames(newCols) = c("resultat", "resul", "diff")
newCols$resul = as.integer(newCols$resul )
newCols$diff = as.integer(newCols$diff)
newCols
#>   resultat resul diff
#> 1        P     0   12
#> 2        I     3    7
#> 3        P     0   40

#Bind to the original data
myData = cbind(myData, newCols)
myData
#>   id   name resultatD1 resul_d1 diff_d1 resultatD2 resul_d2 diff_d2 resultatD3
#> 1 06  james          P        0      12       <NA>       NA      NA       <NA>
#> 2 14 franck          N        0      22          I        0       4          I
#> 3 24  smith          N        0      15          P        0      40       <NA>
#>   resul_d3 diff_d3 resultat resul diff
#> 1       NA      NA        P     0   12
#> 2        3       7        I     3    7
#> 3       NA      NA        P     0   40

^{Created on 2022-04-19 by the reprex package (v2.0.1)}

Hope this helps,
PJ

laplanca · April 19, 2022, 1:35pm

I tried your code, it seems great but I don't know why I have not the good results in the end :

pieterjanvc · April 19, 2022, 2:29pm

Hi,

Did you copy-paste the code I provided in the example? If I run this, I get the expected output.

laplanca · April 20, 2022, 7:05am

pieterjanvc:

#The data
myData = data.frame(
  stringsAsFactors = FALSE,
        id = c("06", "14", "24"),
        name = c("james", "franck","smith"),
        resultatD1 = c("P", "N", "N"),
          resul_d1 = c(0L, 0L, 0L),
           diff_d1 = c(12L, 22L, 15L),
        resultatD2 = c(NA, "I", "P"),
          resul_d2 = c(NA, 0L, 0L),
           diff_d2 = c(NA, 4L, 40L),
        resultatD3 = c(NA, "I", NA),
          resul_d3 = c(NA, 3L, NA),
           diff_d3 = c(NA, 7L, NA)
)

#Extract the last 3 values from each row
newCols = apply(myData, 1, function(x){
  
  #Get the last non-NA value in the row
  last = which(!is.na(x)) 
  #If the last value is not NA, use whole row
  last = ifelse(length(last) == 0, length(x), last[length(last)])
  #Cut vector up to last non-NA
  x = x[1:last]
  #get last 3 values
  x[(length(x)-2):length(x)]
  
})

#Create a new data frame from them
newCols = as.data.frame(t(newCols))

#Set the new column names and clas
colnames(newCols) = c("resultat", "resul", "diff")
newCols$resul = as.integer(newCols$resul )
newCols$diff = as.integer(newCols$diff)
newCols
#>   resultat resul diff
#> 1        P     0   12
#> 2        I     3    7
#> 3        P     0   40

#Bind to the original data
myData = cbind(myData, newCols)
myData

yes I paste your code. That's why it's really weird

laplanca · April 20, 2022, 9:10am

Finally, I resolved this with a loop :

for (i in 1:maxTest){
  Var1 <- paste0("date_testD", i)
  Var2 <- paste0("resultatD",i)
  for (j in 1:nrow(commun)){
    if(isTRUE(!is.na(commun[[Var1]][j])))
      commun$resultatfinal[j]=commun[[Var2]][j]
  }
}

system · April 27, 2022, 9:10am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.