Hello, I'm very new to R and I'm having some trouble with the following:
I currently have a plot of disease against year, and after the year 1980 the trend changes. I'm trying to split the data into two data sets, one before the change (1950-1980) and one after the change (1981-2000), so that I can plot the data before this year and fit a linear model to it.
How would I write the code for this? I'm struggling mainly because I don't know how to split my data between these 2 dates.
Depending on how you want to train your model on the data, there are a couple of routes you can take. First, though, let's make a test data frame:
library(tibble)
# using tribble (from the tibble package) to lay out a manual copy of your data row by row
disease_df <- tibble::tribble(
~Disease, ~year, ~cases,
"A", 1950, 29,
"A", 1970, 32,
"A", 1981, 222,
"A", 1990, 2993,
"A", 2000, 3929
)
disease_df
# R returns a vector of TRUE/FALSE values based on your logical condition
# (<= rather than < so that 1980 itself lands in the "before" group, matching 1950-1980)
disease_df$year <= 1980
# split the data into a list of two data frames, named "FALSE" and "TRUE", based on that condition
split_df <- split(disease_df, disease_df$year <= 1980)
split_df
# if you want to train the model on the entire dataset with a dummy variable for pre/post 1980
disease_df$pre_change <- disease_df$year <= 1980
disease_df
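To get from the split to the plot and linear model you asked about, here is a minimal sketch of both routes. It rebuilds the same five example rows with base R so it runs on its own. The names `before_df`, `after_df`, `fit_before`, `fit_after`, `fit_both` and `pre_change` are just illustrative, and `cases ~ year` is an assumed model formula, so swap in your own columns.

```r
# Same five rows as the example data above, rebuilt with base R so this runs on its own
disease_df <- data.frame(
  Disease = "A",
  year    = c(1950, 1970, 1981, 1990, 2000),
  cases   = c(29, 32, 222, 2993, 3929)
)

# Route 1: one data frame and one linear model per period
before_df <- subset(disease_df, year <= 1980)  # 1950-1980
after_df  <- subset(disease_df, year > 1980)   # 1981-2000

fit_before <- lm(cases ~ year, data = before_df)
fit_after  <- lm(cases ~ year, data = after_df)

# Plot only the pre-change data and overlay its fitted line
plot(cases ~ year, data = before_df, main = "Cases up to 1980")
abline(fit_before)

# Route 2: one model on all rows, letting intercept and slope differ by period
# (pre_change is an illustrative column name for the dummy variable)
disease_df$pre_change <- disease_df$year <= 1980
fit_both <- lm(cases ~ year * pre_change, data = disease_df)

# Inspect the fitted coefficients of each model
coef(fit_before)
coef(fit_after)
coef(fit_both)
```

With only two example rows before 1981, the pre-change line passes exactly through both points, so this toy data only shows the mechanics. Your real data should have enough years in each period for the fit to mean something.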
In the future I would recommend making a reprex (reproducible example). It makes it easier to answer the specific question you are asking.
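One way to include your data in a reproducible example is base R's `dput()`, which prints R code that recreates an object. The sketch below uses the example data frame from this answer as a stand-in for your real data. The reprex package can automate the rest of the process, but nothing here depends on it.

```r
# Stand-in for your real data (same rows as the example in this answer)
disease_df <- data.frame(
  Disease = "A",
  year    = c(1950, 1970, 1981, 1990, 2000),
  cases   = c(29, 32, 222, 2993, 3929)
)

# dput() prints R code that rebuilds the object; paste that output into your question
dput(disease_df)

# Round-trip check: evaluating the printed code gives back an equivalent data frame
dput_text <- paste(capture.output(dput(disease_df)), collapse = "\n")
rebuilt   <- eval(parse(text = dput_text))
all.equal(rebuilt, disease_df)
```

If your data set is large, `dput(head(your_data, 20))` keeps the pasted output short while still showing the column names and types.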