How do you split data sets in R?

Hello, I'm very new to R and I'm having some trouble with the following:

I currently have a plot of disease against year, and after the year 1980 the trend changes. I'm trying to split the data into two data sets, one before the change (1950-1980) and one after the change (1981-2000), so that I can plot the data before this year and run a linear model on it.

How would I write the code for this? I'm struggling mainly because I don't know how to split my data between these 2 dates.

Thank you very much for your time in advance.

Disease year cases
A 1950 29
A 1970 32
A 1981 222
A 1990 2993
A 2000 3929

For example, my data has a layout like the one above. If I wanted to split it between 1950-1980 and 1981-2000, how would I do this? Thank you!

Depending on how you want to fit your model to the data, there are a couple of routes you can take. First, though, let's make your test data frame:

library(tidyr)

# using tribble to clearly make a manual copy of your data
disease_df <- tidyr::tribble(
    ~Disease, ~year, ~cases,
    "A", 1950, 29,
    "A", 1970, 32,
    "A", 1981, 222,
    "A", 1990, 2993,
    "A", 2000, 3929
)

disease_df

# R returns a vector of TRUE/FALSE values based on your logical condition
# (<= so that 1980 itself falls in the 1950-1980 group)
disease_df$year <= 1980

# split the data into a list of two data frames, named "FALSE" and "TRUE",
# based on that condition
split_df <- split(disease_df, disease_df$year <= 1980)

split_df
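
To get from the split to the plot and linear model asked about in the question, each piece can be pulled out of the list by name. This is a minimal sketch in base R only. The names `pre_df`, `post_df`, and `pre_fit` are illustrative, and the cutoff is written as `<= 1980` on the assumption that 1980 belongs with the earlier period:

```r
# same example data, built with base R so the snippet runs on its own
disease_df <- data.frame(
  Disease = "A",
  year    = c(1950, 1970, 1981, 1990, 2000),
  cases   = c(29, 32, 222, 2993, 3929)
)

# split() names the list elements after the values of the grouping vector,
# so a logical condition gives elements called "FALSE" and "TRUE"
split_df <- split(disease_df, disease_df$year <= 1980)

pre_df  <- split_df[["TRUE"]]   # rows with year <= 1980
post_df <- split_df[["FALSE"]]  # rows with year >  1980

# plot the early period and fit a straight line to it
plot(cases ~ year, data = pre_df)
pre_fit <- lm(cases ~ year, data = pre_df)
abline(pre_fit)  # add the fitted line to the plot

coef(pre_fit)
```

The sample data has only two rows before 1981, so the line passes through both points exactly. With the full data set, `summary(pre_fit)` should give meaningful standard errors.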

# if you want to fit the model on the entire dataset, add a dummy variable
# marking the 1950-1980 period (TRUE) versus 1981-2000 (FALSE)
disease_df$pre_1981 <- disease_df$year <= 1980

disease_df
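
One way the dummy variable could then be used is to let a single model estimate a separate intercept and slope for each period. This is a sketch, not the only option. The column name `pre_1981`, the object name `full_fit`, and the choice of an interaction model are assumptions made for illustration:

```r
# same example data, built with base R so the snippet runs on its own
disease_df <- data.frame(
  Disease = "A",
  year    = c(1950, 1970, 1981, 1990, 2000),
  cases   = c(29, 32, 222, 2993, 3929)
)

# TRUE for 1950-1980, FALSE for 1981-2000
disease_df$pre_1981 <- disease_df$year <= 1980

# year * pre_1981 expands to year + pre_1981 + year:pre_1981,
# i.e. a different intercept and a different slope in each period
full_fit <- lm(cases ~ year * pre_1981, data = disease_df)

coef(full_fit)
```

In this parameterisation the `year` coefficient is the slope for the later period, and adding the `year:pre_1981TRUE` coefficient to it gives the slope for the earlier period.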

In the future, I would recommend making a reprex (reproducible example). It makes it easier to answer the specific question you are asking.
