Selecting data in a range

dplyr
rstudio

#1

I have a table of counties, states and their minimum and maximum temperatures. I need to select only the counties that have temperatures in a range such as -15 degrees to 40 degrees. What function would I use?


#2

Hi, could you turn your question into a reprex (a minimal reproducible example)?

That would make it much easier for everyone here to help you.

One possible approach is to group by county and summarize each group with the minimum and maximum of its temperatures. You can then use that summary to keep only the counties that fall within the range, and join the result back to your original table by county name.


#3

@mishabalyasin
    climate.minmax <-
      climate.data %>%
      group_by(County, State) %>%
      summarise(temp_min = min(temp_min),
                temp_max = max(temp_max))

I have summarized it, and that is as far as I have got. Now I need to drop the counties that do not fit in my desired range. How do I do that?


#4

You can use temp_min and temp_max in your summarized dataset to create a new logical variable with mutate, something like mutate(include = temp_min >= -15 & temp_max <= 40).

Then filter to keep only the rows where include is TRUE, and use dplyr::semi_join to keep the rows of your original data whose County and State match.
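
Put together, the steps could look roughly like the sketch below. The toy climate.data and the names in_range and climate.in.range are made up for illustration; only the column names come from your post, so adjust as needed.

```r
library(dplyr)

# Toy data standing in for the real table; all values are invented.
climate.data <- data.frame(
  County   = c("A", "A", "B", "B", "C", "C"),
  State    = c("X", "X", "X", "X", "Y", "Y"),
  temp_min = c(-10, -5, -20, -12, 0, 3),
  temp_max = c(30, 35, 25, 38, 41, 39)
)

# One row per county with its overall minimum and maximum.
climate.minmax <- climate.data %>%
  group_by(County, State) %>%
  summarise(temp_min = min(temp_min),
            temp_max = max(temp_max),
            .groups = "drop")

# Flag the counties whose whole temperature span lies inside [-15, 40],
# then keep only the flagged ones.
in_range <- climate.minmax %>%
  mutate(include = temp_min >= -15 & temp_max <= 40) %>%
  filter(include)

# Keep the original rows that belong to an in-range county.
climate.in.range <- climate.data %>%
  semi_join(in_range, by = c("County", "State"))
```

With this toy data only county A survives: B drops below -15 and C goes above 40. If you only need the list of counties and not the original rows, you can stop at in_range and skip the semi_join.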


#5

If you are comfortable with SQL, you could also try the data.table package, which is generally very fast for grouped operations like this.

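    library(data.table)

    # Convert the data frame to a data.table by reference
    setDT(climate.data)

    # Summarize per county, then keep counties strictly between -15 and 40
    # (use >= and <= instead if the endpoints should be included)
    climate.data[, .(temp_min = min(temp_min),
                     temp_max = max(temp_max)),
                 by = .(County, State)][
      temp_min > -15 & temp_max < 40]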

FYI

data.table's syntax resembles SQL. The general form DT[i, j, by] reads roughly as:

    FROM[WHERE, SELECT, GROUP BY]
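
To make that mapping concrete, here is a small sketch on an invented table. The names dt, res and hottest are placeholders and not from the original data.

```r
library(data.table)

# Toy table; all values are invented.
dt <- data.table(
  County   = c("A", "A", "B", "B"),
  State    = c("X", "X", "X", "X"),
  temp_min = c(-10, -5, -20, -12),
  temp_max = c(30, 35, 25, 38)
)

# Roughly equivalent SQL:
#   SELECT County, State, MAX(temp_max) AS hottest
#   FROM dt
#   WHERE temp_min >= -15
#   GROUP BY County, State
res <- dt[temp_min >= -15,              # i:  WHERE (row filter, applied first)
          .(hottest = max(temp_max)),   # j:  SELECT (computed per group)
          by = .(County, State)]        # by: GROUP BY
```

Note that i filters rows before grouping. Filtering on an aggregated value, as in the county example above, is done by chaining a second [ ] after the first, which plays a role similar to HAVING in SQL.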