This Code Runs Quickly within RStudio Notebook, Takes Forever for Knit


#1

Here is a section of an R Markdown notebook that I've been running. The R code itself runs fine within the notebook (which seems to use rsession.exe for a couple of seconds), but when I try to Knit an HTML document, the first arrange() call causes an rterm.exe process to run for a long time. I have been stopping it after 5 minutes or so.

library(tidyverse)
library(knitr)
library(formatR)
# invalidate cache when the tufte version changes
knitr::opts_chunk$set(tidy = FALSE, htmltools.dir.version = FALSE)
options(scipen=1, digits=4)

library(nycflights13)

arrange(flights,desc(is.na(dep_time)))
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1     1       NA           1630        NA       NA
#>  2  2013     1     1       NA           1935        NA       NA
#>  3  2013     1     1       NA           1500        NA       NA
#>  4  2013     1     1       NA            600        NA       NA
#>  5  2013     1     2       NA           1540        NA       NA
#>  6  2013     1     2       NA           1620        NA       NA
#>  7  2013     1     2       NA           1355        NA       NA
#>  8  2013     1     2       NA           1420        NA       NA
#>  9  2013     1     2       NA           1321        NA       NA
#> 10  2013     1     2       NA           1545        NA       NA
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>

arrange(flights,desc(arr_delay))
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1     9      641            900      1301     1242
#>  2  2013     6    15     1432           1935      1137     1607
#>  3  2013     1    10     1121           1635      1126     1239
#>  4  2013     9    20     1139           1845      1014     1457
#>  5  2013     7    22      845           1600      1005     1044
#>  6  2013     4    10     1100           1900       960     1342
#>  7  2013     3    17     2321            810       911      135
#>  8  2013     7    22     2257            759       898      121
#>  9  2013    12     5      756           1700       896     1058
#> 10  2013     5     3     1133           2055       878     1250
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>
arrange(flights,dep_time)
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1    13        1           2249        72      108
#>  2  2013     1    31        1           2100       181      124
#>  3  2013    11    13        1           2359         2      442
#>  4  2013    12    16        1           2359         2      447
#>  5  2013    12    20        1           2359         2      430
#>  6  2013    12    26        1           2359         2      437
#>  7  2013    12    30        1           2359         2      441
#>  8  2013     2    11        1           2100       181      111
#>  9  2013     2    24        1           2245        76      121
#> 10  2013     3     8        1           2355         6      431
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>

arrange(flights,air_time)
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1    16     1355           1315        40     1442
#>  2  2013     4    13      537            527        10      622
#>  3  2013    12     6      922            851        31     1021
#>  4  2013     2     3     2153           2129        24     2247
#>  5  2013     2     5     1303           1315       -12     1342
#>  6  2013     2    12     2123           2130        -7     2211
#>  7  2013     3     2     1450           1500       -10     1547
#>  8  2013     3     8     2026           1935        51     2131
#>  9  2013     3    18     1456           1329        87     1533
#> 10  2013     3    19     2226           2145        41     2305
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>

#longest
arrange(flights,desc(distance))
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1     1      857            900        -3     1516
#>  2  2013     1     2      909            900         9     1525
#>  3  2013     1     3      914            900        14     1504
#>  4  2013     1     4      900            900         0     1516
#>  5  2013     1     5      858            900        -2     1519
#>  6  2013     1     6     1019            900        79     1558
#>  7  2013     1     7     1042            900       102     1620
#>  8  2013     1     8      901            900         1     1504
#>  9  2013     1     9      641            900      1301     1242
#> 10  2013     1    10      859            900        -1     1449
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>

#shortest
arrange(flights,distance)
#> # A tibble: 336,776 x 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     7    27       NA            106        NA       NA
#>  2  2013     1     3     2127           2129        -2     2222
#>  3  2013     1     4     1240           1200        40     1333
#>  4  2013     1     4     1829           1615       134     1937
#>  5  2013     1     4     2128           2129        -1     2218
#>  6  2013     1     5     1155           1200        -5     1241
#>  7  2013     1     6     2125           2129        -4     2224
#>  8  2013     1     7     2124           2129        -5     2212
#>  9  2013     1     8     2127           2130        -3     2304
#> 10  2013     1     9     2126           2129        -3     2217
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>

Created on 2019-01-02 by the reprex package (v0.2.1)


#2

Oh my, this is embarrassing.

Somehow as soon as I looked at my reprex posted here, outside the massive notebook of which it is a small part, the answer was obvious.

Unlike within a notebook in RStudio, the Knit process is trying to put all those thousands of flight records into the Knit'ed HTML document. It would create a massive .html file even if I let it run long enough to finish.
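
To give a rough feel for how printed output grows with row count, here is a minimal sketch in base R. The data frame `toy` and its columns are made up for illustration (they are not from the notebook), and it assumes the default `max.print` setting. It only counts the lines that `print()` would emit for a plain data frame, so it is an illustration of the size effect rather than a measurement of what Knit does.

```r
# Hypothetical toy data frame, used only to count printed lines.
toy <- data.frame(id = seq_len(5000), value = seq_len(5000) * 2)

# capture.output() collects what print() would write, one element per line.
full_lines <- capture.output(print(toy))
preview_lines <- capture.output(print(head(toy)))

# Compare the amount of output for the full table and for a six-row preview.
length(full_lines)
length(preview_lines)
```

The full print produces a line per row, while the preview stays at a handful of lines no matter how large the table is.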

I should be assigning the result of each arrange() call to a variable so it won't flood the output with the entire contents of the dataset. Or maybe wrap the call in head().

arrange1 <- arrange(flights,desc(distance))

or

head(arrange(flights,desc(distance)))
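
Putting the two ideas together, a chunk could look something like the sketch below. It assumes dplyr and nycflights13 are installed (dplyr is loaded directly here instead of the whole tidyverse), and the names `by_distance` and `preview` are just illustrative.

```r
library(dplyr)
library(nycflights13)

# Assign the sorted result so the chunk does not print the whole table.
by_distance <- arrange(flights, desc(distance))

# Keep only the first 10 rows for display in the knitted document.
preview <- head(by_distance, n = 10)
print(preview)

# The full sorted table is still available for later chunks.
nrow(by_distance)
```

This way the knitted HTML only contains the ten preview rows, while `by_distance` can still be used in later chunks.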

Sure wish I could delete this question altogether. How stupid of me...


#3

No big deal! I'll mark your reply where you figured it out as the solution, that way it can help others in the future! :rocket:

