New in R, help with an exercise

Hi, and welcome. This question straddles the line between those that need a reproducible example (called a reprex) before anyone can help and those where some general advice is enough.

The answer to your question does not require all the variables; you need only TEAM, WIN, ERRORS and DATE. You have to construct DATE from the DAY, MONTH and YEAR variables to get a date object. See the zoo and lubridate packages.

If the data lives in a database, write your query to select just those pieces and cobble together DATE. If you are reading from a csv file, just bring it all into a data frame or tibble. We'll call this raw_data.

Using the dplyr package:

softball <- raw_data %>% select(TEAM, ERRORS, WIN, DAY, MONTH, YEAR) %>% filter(WIN == TRUE) %>% arrange(desc(ERRORS))

If you've coded WIN as 1/0, YES/NO, or some other scheme, adjust the filter condition accordingly.
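To make that concrete, here is a minimal sketch of how the filter condition might change with the coding. The two small data frames are made up for illustration; your WIN column may be coded differently again.

```r
library(dplyr)

# Hypothetical data: the same two games with WIN coded two different ways.
numeric_coded <- data.frame(TEAM = c("A", "B"), WIN = c(1, 0))
text_coded    <- data.frame(TEAM = c("A", "B"), WIN = c("YES", "NO"))

# Compare against whatever value marks a win in your data.
wins_numeric <- numeric_coded %>% filter(WIN == 1)
wins_text    <- text_coded %>% filter(WIN == "YES")
```

In both cases only team A's row survives the filter.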

This is a classic divide-and-conquer problem. You don't care about runs or any of the other possible baseball statistics in your dataset, so put them to one side. You don't care about games that a team lost or tied, so extract only the wins. Now you want to find the highest number of errors in what's left and the corresponding date. This is essentially what analysis is all about: taking complicated problems and dividing them into bite-size pieces.
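Those steps could be sketched as follows on a toy data frame. The team names and numbers are invented, DATE is assumed to exist already, and slice_max (available in dplyr 1.0.0 and later) is just one way to pick out the top row; looking at the first row after arrange(desc(ERRORS)) would work too.

```r
library(dplyr)

# Hypothetical toy data standing in for raw_data, with DATE already built.
raw_data <- data.frame(
  TEAM   = c("Hawks", "Hawks", "Owls", "Owls"),
  WIN    = c(TRUE, FALSE, TRUE, TRUE),
  ERRORS = c(2, 7, 5, 1),
  RUNS   = c(4, 1, 6, 3),
  DATE   = as.Date(c("2019-05-03", "2019-05-10", "2019-05-17", "2019-06-01"))
)

worst_win <- raw_data %>%
  select(TEAM, WIN, ERRORS, DATE) %>%  # put the other statistics to one side
  filter(WIN) %>%                      # keep only the wins
  slice_max(ERRORS, n = 1)             # row(s) with the most errors; ties are kept
```

Note that the 7-error game drops out because it was a loss, so the result is the Owls' 5-error win. If the exercise wants the answer per team rather than overall, adding group_by(TEAM) before slice_max would be one way to get it.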

The hardest part will actually be constructing DATE out of DAY, MONTH and YEAR. Hint: use dplyr::mutate.
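If you get stuck, here is one possible sketch. It assumes DAY, MONTH and YEAR are stored as numbers (the values below are made up) and uses base R's as.Date on a pasted string; lubridate::make_date(YEAR, MONTH, DAY) is an alternative if you have lubridate installed.

```r
library(dplyr)

# Hypothetical toy data; your real columns may be stored as text or factors.
raw_data <- data.frame(
  TEAM  = c("A", "B", "A"),
  DAY   = c(3, 17, 28),
  MONTH = c(5, 5, 6),
  YEAR  = c(2019, 2019, 2019)
)

# Paste the parts into "YYYY-M-D" text, then convert it to a Date.
with_date <- raw_data %>%
  mutate(DATE = as.Date(paste(YEAR, MONTH, DAY, sep = "-"), format = "%Y-%m-%d"))
```

Once DATE is a real Date object rather than text, it sorts and compares the way you would expect.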

Good luck!