My sqldf() query is only returning 7 rows where my R code returns 16. What am I doing wrong?

Here's what I have so far:

library("nycflights13")
library("sqldf")
library(tidyverse)
dffl= flights
dffl = dffl%>% drop_na()
dfa=airlines
dfall <- merge(dfa, dffl, by ="carrier")

sqldf("
      select name, AVG(flights.arr_delay) as 'average delay' from flights 
      inner join airlines on flights.carrier = airlines.carrier
      where flights.dest = 'ATL'
      group by name
      order by name, 'average delay' desc
      ")

I get 7 rows
but with this I get 16:

dfall %>%
  group_by(name, dest= 'ATL') %>%
  summarise(mean_delay= mean(arr_delay))%>%
  arrange(mean_delay)

In the R code producing 16 rows, the code is grouping by name and setting all dest = 'ATL'. I believe you want to first filter to dest == 'ATL' and then group by name only.

dfall %>%
  filter(dest == 'ATL') %>%
  group_by(name) %>%
  summarise(mean_delay= mean(arr_delay))%>%
  arrange(mean_delay)
#> # A tibble: 7 × 2
#>   name                        mean_delay
#>   <chr>                            <dbl>
#> 1 Endeavor Air Inc.                0.857
#> 2 Southwest Airlines Co.           6.90 
#> 3 Delta Air Lines Inc.             7.42 
#> 4 United Air Lines Inc.           10.5  
#> 5 Envoy Air                       14.0  
#> 6 ExpressJet Airlines Inc.        19.6  
#> 7 AirTran Airways Corporation     20.7

Created on 2022-10-13 with reprex v2.0.2.9000

1 Like

Oh, that makes sense. Thank you so much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.