Calculating slopes of data conditional on parameters within the table (dates and figures)

Dear R Studio community,

I have a dataset that looks like this but is much larger

Drug Start  HN  Blood_date  Results
10/10/2015  1234    01/10/2010  90
10/10/2015  1234    01/10/2011  80
10/10/2015  1234    01/10/2012  70
10/10/2015  1234    01/10/2013  60
10/10/2015  1234    01/10/2014  50
10/10/2015  1234    01/10/2015  40
10/10/2015  1234    01/10/2016  30
10/10/2015  1234    01/10/2017  20
10/10/2015  1234    01/10/2018  19
10/10/2015  1234    01/10/2019  19
10/10/2015  1234    01/10/2020  18
10/10/2015  1234    01/10/2021  17
NA  4567    01/10/2010  90
NA  4567    01/10/2011  80
NA  4567    01/10/2012  70
NA  4567    01/10/2013  60
NA  4567    01/10/2014  50
NA  4567    01/10/2015  40
NA  4567    01/10/2016  30
NA  4567    01/10/2017  20
NA  4567    01/10/2018  19
NA  4567    01/10/2019  19
NA  4567    01/10/2020  18

I want to calculate the slope of the Results against time (Blood_date) but only for those dates that precede the "Drug Start" AND/OR prior to the Results being less than 20.In the first patient I would want to take the slope of GFRs prior to 10/10/2015 and in the second patient prior to 10/10/2017 when the Result becomes less than 20. The number of Results are not uniform per patient, not all patients have been started on the drug and not every patient's Results become <20. There are thousands of entries.

I can do simple line/slope calculations in R but the data munging part where it is conditional is challenging.

Your help would be greatly welcomed.

this would be a lot easier if there were patient id's in the data, or is that what 'hn' is ?

well, seems like you made alterations.
GFR ?

Error in 1:mw20 : result would be too long a vector
In addition: Warning message:
In min(which(d$Result <= 20)) :
Error in 1:mw20 : result would be too long a vector

Apologies!

add

print(nrow(d))

as a line before the min(which(
to see in the R console what sort of 'large' numbers you get

[[237]]

A tibble: 1 x 3

DrugStart Blood_date Results

1 NA 2018-09-20 90

There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: In min(which(d$Results <= 20)) : no non-missing arguments to min; returning Inf
2: In min(which(d$Results <= 20)) : no non-missing arguments to min; returning Inf

There are lots more warnings that are exactly the same, the df table has 3752 rows..

it seems that for some patients, you dont have any non-NA results.
are NA results ever useful to you ?
change

df<-table

to

df <-filter(table,
            !is.na(Results))

If the patient is not on the drug then their DrugStart column will be a series of NAs for every row as each row represents a blood test. For those patients I would then want to just take their Results that are >20 with the view to calculating their slope of Results over time.

So those patients with NA are necessary to the outcome but I can change the term NA to something easier. If NA became today's date would that be helpful as they then would always be after "Blood_date"?

Again thanks for taking the time to help me!

It seems you were discussing NA's in drugstart, but I was asking about NA's in Result.

There are no NAs in Result

I ended up filtering the data even further and then applying:

table$date_of_bloods <- as.Date(table$Blood_date, format="%d-%m-%Y")
table$drug_start <- as.Date(table$drug_start, format="%d-%m-%Y")

subset(table, result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))

I had formatted my dates incorrectly also.

Thanks for your help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

tf <-tempfile()

writeLines("DrugStart,HN,Blood_date,Results
10/10/2015,1234,01/10/2010,90
10/10/2015,1234,01/10/2011,80
10/10/2015,1234,01/10/2012,70
10/10/2015,1234,01/10/2013,60
10/10/2015,1234,01/10/2014,50
10/10/2015,1234,01/10/2015,40
10/10/2015,1234,01/10/2016,30
10/10/2015,1234,01/10/2017,20
10/10/2015,1234,01/10/2018,19
10/10/2015,1234,01/10/2019,19
10/10/2015,1234,01/10/2020,18
10/10/2015,1234,01/10/2021,17
NA,4567,01/10/2010,90
NA,4567,01/10/2011,80
NA,4567,01/10/2012,70
NA,4567,01/10/2013,60
NA,4567,01/10/2014,50
NA,4567,01/10/2015,40
NA,4567,01/10/2016,30
NA,4567,01/10/2017,20
NA,4567,01/10/2018,19
NA,4567,01/10/2019,19
NA,4567,01/10/2020,18", tf)
library(tidyverse)
library(lubridate)
df<- readr::read_csv(tf) %>%mutate_if(is.character,lubridate::dmy)

myprocess<-function(d,...){
  mds<-min(d$DrugStart,na.rm=FALSE)
  if(!is.na(mds)){
    return(filter(d,
                  Blood_date <= mds))
  } else {
    mw20 <- min(which(d$Results<=20))
    return(slice(d,
                 1:mw20))
  }
}
df %>% group_by(HN) %>% group_map(myprocess)

Dear @nirgrahamuk,

Many thanks for taking the time to offer a solution. My data is already imported into R as a data frame from Excel and I have coerced the data using the following code:

table <- readxl::read_excel("data.xlsx", sheet = "GFRs")
table <- as.data.frame(table)
table$Result <- as.numeric(table$Result)
table<- subset(table, Result >20)
table$Blood_date <- as.Date(table$Blood_date, format="%d/%m/%Y")
table$DrugStart <- as.Date(table$DrugStart, format="%d/%m/%Y")

How would I feed this table into your solution? If I enter your code I get:

df<- readr::read_table(table) %>%mutate_if(is.character,lubridate::dmy)
Error: file must be a string, raw vector or a connection.

And

Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "function"
when running the final part.

Apologies for what I am sure are really basic error on my part.

Dear @nirgrahamuk, apologies, that wasn't very clear. Yes HN is the patient ID.

Many thanks

i use df, as a name rather than table so do

table <- readxl::read_excel("data.xlsx", sheet = "GFRs")
table <- as.data.frame(table)
table$Result <- as.numeric(table$Result)
table$Blood_date <- as.Date(table$Blood_date, format="%d/%m/%Y")
table$DrugStart <- as.Date(table$DrugStart, format="%d/%m/%Y")

df<-table

and then my code from myprocess down

Thank you. On running it I get:

Error in 1:mw20 : result would be too long a vector
In addition: Warning message:
In min(which(d$GFR <= 20)) :
Error in 1:mw20 : result would be too long a vector

Apologies again!