Are you familiar with spreadsheet sorting interfaces like this one?
https://goo.gl/images/99k1Qw
arrange is doing the same thing (except that it can sort by more than 3 variables!): it sorts the entire data frame (that is, it arranges the rows) by the variables you choose. It does not sort the columns independently of each other.
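Here is a small sketch of the difference between arranging rows and sorting columns independently. The data frame toy and its values are made up for illustration; they are not from the flights data.

```r
library(dplyr)

# Hypothetical toy data, not taken from the flights data set
toy <- tibble(
  Col1 = c(2, 1, 2, 1),
  Col2 = c("d", "c", "b", "a")
)

# arrange() moves whole rows, so every Col1 value keeps its original Col2 partner
arranged <- arrange(toy, Col1, Col2)

# Sorting each column on its own breaks the rows apart and pairs up
# values that never appeared together in toy (for example Col1 = 1 with "b")
independent <- tibble(Col1 = sort(toy$Col1), Col2 = sort(toy$Col2))

arranged
independent
```

In arranged, every row is still one of the original four rows of toy. In independent, that is no longer true.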
People often describe this as “sorting first by Col1, then by Col2”, or as “sorting by Col2 within Col1”, but another way to think about it is in terms of breaking ties. If Col1 has repeated values, then all the rows with a given value in Col1 are “tied” for the same position. How do you decide which row to put first? You look at the values in Col2, and if those are tied as well, at the next variable, and so on.
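The tie-breaking idea can be tried out on a tiny example. This is only a sketch: toy, Col1, Col2 and Col3 are hypothetical names and values chosen so that there are deliberate ties.

```r
library(dplyr)

# Hypothetical toy data with deliberate ties in Col1 and in Col2
toy <- tibble(
  Col1 = c(1, 2, 1, 1, 2),
  Col2 = c(5, 3, 5, 4, 3),
  Col3 = c(9, 8, 7, 6, 2)
)

# Only Col1 is used here, so no other column is consulted to resolve its ties
by_one <- arrange(toy, Col1)

# Col2 breaks the ties in Col1
by_two <- arrange(toy, Col1, Col2)

# Col3 breaks the ties that remain after Col1 and Col2
result <- arrange(toy, Col1, Col2, Col3)
result
```

Note that Col3 in result is not in increasing order overall. It is only in order among rows that share the same Col1 and Col2 values.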
It might be easier to see what’s going on if we reorder the columns so that the arranged-by variables come first, and look at more rows of data:
> arrange(f, month, dep_delay, day) %>%
    select(month, dep_delay, day, everything()) %>%
    head(20)
# A tibble: 20 x 19
   month dep_delay   day  year dep_time sched_dep_time arr_time sched_arr_time
   <int>     <dbl> <int> <int>    <int>          <int>    <int>          <int>
 1     1       -30    11  2013     1900           1930     2233           2243
 2     1       -27    29  2013     1703           1730     1947           1957
 3     1       -22    12  2013     1354           1416     1606           1650
 4     1       -22    21  2013     2137           2159     2232           2316
 5     1       -21    20  2013      704            725     1025           1035
 6     1       -20    12  2013     2050           2110     2310           2355
 7     1       -20    12  2013     2134           2154        4             50
 8     1       -20    14  2013     2050           2110     2329           2355
 9     1       -19     4  2013     2140           2159     2241           2316
10     1       -18    11  2013     1947           2005     2209           2230
11     1       -18    19  2013     1912           1930     2026           2050
12     1       -18    23  2013     1142           1200     1239           1304
13     1       -18    27  2013      617            635      852            934
14     1       -17     4  2013     1243           1300     1432           1450
15     1       -17     7  2013     2013           2030     2150           2206
16     1       -17     9  2013     1143           1200     1242           1304
17     1       -17    10  2013      810            827      955           1031
18     1       -17    14  2013     1558           1615     1826           1831
19     1       -17    15  2013      543            600      710            715
20     1       -17    25  2013     1143           1200     1242           1304
# ... with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
# hour <dbl>, minute <dbl>, time_hour <dttm>
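Can you see how, within repeated values of month, the rows are arranged in order of dep_delay, and then within repeated values of dep_delay, the rows are arranged in order of day? For example, rows 6 to 8 all have a dep_delay of -20, so they appear in order of day: 12, 12, 14.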