New to R, help with an exercise

I'm trying to do an exercise with a database of a softball team's games over X years. It has variables such as the runs scored by the team and by the opposing team in each game, errors per game, the difference between the runs scored by the team and by its opponents, win, loss or tie, the day, month and year of the match, etc.

I have to determine the largest number of errors the team ever committed in a game it won (and when that game occurred). What code would you recommend?

Hi, and welcome. This question straddles the line between those where we need a reproducible example (called a reprex) to be of help and those where we can offer some helpful general advice.

The answer to your question does not require all the variables; you need only TEAM, WIN, ERRORS and DATE. You have to construct DATE from the DAY, MONTH and YEAR variables to get a date object; see the zoo and lubridate packages.

Use your database query to select those pieces and cobble together DATE. If you are reading from a CSV file, just read it all into a data frame or tibble. We'll call this raw_data.
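A minimal sketch of the CSV route in base R, assuming a file whose columns use the names from this thread (the file contents, the OPP_RUNS column and the file name softball.csv are all made up for illustration; with a real file you would pass its path to read.csv instead of text =):

```r
# Hypothetical CSV contents; column names follow this thread, values are invented
csv_text <- "TEAM,RUNS,OPP_RUNS,ERRORS,WIN,DAY,MONTH,YEAR
Tigers,5,3,2,TRUE,12,4,2016
Tigers,2,6,4,FALSE,19,4,2016
Tigers,7,6,3,TRUE,3,5,2017"

# With a real (hypothetical) file: raw_data <- read.csv("softball.csv")
raw_data <- read.csv(text = csv_text)

# TRUE/FALSE text becomes a logical column, counts become integers
str(raw_data)
```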

Using the dplyr package:

library(dplyr)
# keep the needed columns, keep only wins, put the most errors first
softball <- raw_data %>%
  select(TEAM, ERRORS, WIN, DAY, MONTH, YEAR) %>%
  filter(WIN == TRUE) %>%
  arrange(desc(ERRORS))

If you've coded WIN as 1/0, YES/NO or something else, make the appropriate adjustment to the filter argument, e.g. filter(WIN == 1) or filter(WIN == "YES").
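If the result is stored as one column with values such as "W", "L" and "T" (the column name RESULT and these codes are assumptions, not something from your data), one option is to derive a logical WIN column first, so the rest of the code stays the same:

```r
# Hypothetical data: one RESULT column coded "W" (win), "L" (loss), "T" (tie)
games <- data.frame(ERRORS = c(2, 4, 3, 1),
                    RESULT = c("W", "L", "W", "T"))

# Turn the code into TRUE/FALSE so that filter(WIN == TRUE)
# or games[games$WIN, ] works unchanged
games$WIN <- games$RESULT == "W"

# The same idea for other (hypothetical) encodings:
#   1/0 coding:    games$WIN <- games$WIN01 == 1
#   YES/NO coding: games$WIN <- games$WIN_YN == "YES"
games
```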

This is a classic divide-and-conquer problem. You don't care about runs or any of the other game statistics in your dataset, so put them to one side. You don't care about games the team lost or tied, so extract only the wins. Now you want to find the highest number of errors in what's left and the corresponding date. This is essentially what analysis is all about: taking complicated problems and dividing them into bite-size pieces.
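The same three steps can be sketched in base R without any packages. This assumes WIN is logical and DAY, MONTH and YEAR are numeric; the data frame below is invented for illustration.

```r
# Invented example data with the columns discussed above
raw_data <- data.frame(
  TEAM   = "Tigers",
  RUNS   = c(5, 2, 7, 9),
  ERRORS = c(2, 4, 3, 5),
  WIN    = c(TRUE, FALSE, TRUE, FALSE),
  DAY    = c(12, 19, 3, 10),
  MONTH  = c(4, 4, 5, 5),
  YEAR   = c(2016, 2016, 2017, 2017)
)

# 1. Put the irrelevant columns aside
games <- raw_data[, c("ERRORS", "WIN", "DAY", "MONTH", "YEAR")]

# 2. Keep only the games the team won
wins <- games[games$WIN, ]

# 3. Largest error count among the wins, and the game(s) where it occurred
max_errors <- max(wins$ERRORS)
worst_wins <- wins[wins$ERRORS == max_errors, ]
worst_wins
```

Comparing with == against max() rather than using which.max() keeps every game if several wins tie for the maximum.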

The hardest part of this will actually be constructing DATE out of DAY, MONTH and YEAR. Hint: use dplyr::mutate.
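If packages are off the table, base R's transform() plays a role similar to dplyr::mutate. A minimal sketch that uses ISOdate() to combine the pieces (the example values are invented; ISOdate() returns NA for impossible dates such as 31 April):

```r
games <- data.frame(DAY = c(12, 3), MONTH = c(4, 5), YEAR = c(2016, 2017))

# transform() adds the new column; ISOdate() builds a date-time (noon GMT)
# from year, month and day, and as.Date() drops the time part
games <- transform(games, DATE = as.Date(ISOdate(YEAR, MONTH, DAY)))
games$DATE
```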

Good luck!

I can't use any packages; I have to do it with the base program of RStudio.

I think you mean base R, not base RStudio; they are not the same thing. If you were actually limited to tools produced by RStudio, you would be fine using the dplyr package, since it is developed by RStudio people.

Anyway, if you need specific help with this, you would have to provide a minimal REPRoducible EXample (reprex) illustrating your issue. A reprex makes it much easier for others to understand your issue and figure out how to help.
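In base R, a common way to share a small piece of your data in a reprex is dput(), which prints R code that recreates the object. A sketch, with an invented data frame standing in for yours:

```r
# Invented stand-in for your own data frame
raw_data <- data.frame(ERRORS = c(2, 4, 3),
                       WIN    = c(TRUE, FALSE, TRUE),
                       DAY    = c(12, 19, 3),
                       MONTH  = c(4, 4, 5),
                       YEAR   = c(2016, 2016, 2017))

# Print code that rebuilds the first few rows; paste that output into your post
dput(head(raw_data, 3))
```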

If you've never heard of a reprex before, you might want to start by reading this FAQ:


It can be done in base R, but not as easily.

You still want to get your data into a data frame. Now, instead of

select(TEAM, ERRORS, WIN, DAY, MONTH, YEAR)

you are going to have to immerse yourself in the mysteries of subset. See

?subset

It will also do the job of filter.
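For instance, a single subset() call can cover both select() and filter() from the dplyr version: the subset argument keeps rows, the select argument keeps columns. The data below are invented:

```r
raw_data <- data.frame(TEAM   = "Tigers",
                       RUNS   = c(5, 2, 7),
                       ERRORS = c(2, 4, 3),
                       WIN    = c(TRUE, FALSE, TRUE),
                       DAY    = c(12, 19, 3),
                       MONTH  = c(4, 4, 5),
                       YEAR   = c(2016, 2016, 2017))

# Rows: only wins (like filter); columns: only what we need (like select)
wins <- subset(raw_data,
               subset = WIN == TRUE,
               select = c(TEAM, ERRORS, WIN, DAY, MONTH, YEAR))
wins
```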

Instead of arrange

?sort
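One caveat: sort() sorts a single vector. To reorder the rows of a data frame the way arrange(desc(ERRORS)) does, the usual base R tool is order(), documented alongside sort, used as a row index. A sketch with invented data:

```r
wins <- data.frame(ERRORS = c(2, 5, 3),
                   DAY    = c(12, 10, 3),
                   MONTH  = c(4, 5, 5),
                   YEAR   = c(2016, 2017, 2017))

# order() returns row positions in sorted order; decreasing = TRUE
# puts the largest ERRORS first, like arrange(desc(ERRORS))
by_errors <- wins[order(wins$ERRORS, decreasing = TRUE), ]

# sort() on the column alone returns only the sorted numbers, not the rows
sort(wins$ERRORS, decreasing = TRUE)

by_errors[1, ]   # the win with the most errors, plus its date parts
```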

And to construct the date out of the pieces, you'll need

?paste
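A sketch of that last step: paste() glues the pieces into text such as "2017-5-3", and as.Date() (also base R) then turns that text into a real date. The values are invented, and this assumes MONTH is stored as a number rather than a month name.

```r
wins <- data.frame(ERRORS = c(5, 3),
                   DAY    = c(10, 3),
                   MONTH  = c(5, 5),
                   YEAR   = c(2017, 2017))

# paste() builds "YYYY-M-D" strings; as.Date() parses them with its default
# "%Y-%m-%d" format, which also accepts single-digit months and days
wins$DATE <- as.Date(paste(wins$YEAR, wins$MONTH, wins$DAY, sep = "-"))
wins$DATE
```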
