Import specific sheets from excel workbook

johnmackintosh · May 18, 2020, 2:56pm

The readxl workflows page shows great examples of reading in all sheets from a workbook.
But what if I only want a selection of sheets?
In real life I have over 50. I already have a vector of sheet names I want to import.
I tried to adapt the example, as below, and failed miserably.
( I have tried a few things, this is one of the least worst)
I can hack my way round this, but was wondering how it could be done in more seamlessly in a piped manner, and more specifically, what to pass to the pipe / map part.
In real life, I will be extracting a specific range from each of the desired sheets, so the call to map will be more along the lines of

 map_df(~ read_excel(path = path, sheet = .x, range = "A5:F15"), .id = "sheet")

The erroneous code I tried:

library(readxl)
library(tidyverse)

sheets_to_keep <- c("iris","mtcars")

path <- readxl_example("datasets.xlsx")

all_sheets <- path %>%
  excel_sheets() 

path %>%
  excel_sheets() %>%
  set_names() %>%
  filter(.x %in% sheets_to_keep) %>% 
  map(read_excel)
#> Error in UseMethod("filter_"): no applicable method for 'filter_' applied to an object of class "character"

^{Created on 2020-05-18 by the reprex package (v0.3.0)}

mfherman · May 18, 2020, 4:35pm

Is this what you're looking for? You were definitely on the right track, but instead of extracting the sheet from the file, if you already know the names of the sheets you want to keep, you can simply iterate over that vector (sheets_to_keep) using map().

library(readxl)
library(purrr)

sheets_to_keep <- c("iris", "mtcars")
path <- readxl_example("datasets.xlsx")

map(sheets_to_keep, ~ read_excel(path, sheet = .x))
#> [[1]]
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 140 more rows
#> 
#> [[2]]
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows

^{Created on 2020-05-18 by the reprex package (v0.3.0)}

If you have different ranges for each sheet, you could use map2() and iterate over the sheet names and ranges with something like:

map2(sheets_to_keep, ranges, ~ read_excel(path, sheet = .x, range = .y))

ADDING:
If you want to use the .id parameter of map_df() to create a new column with your sheet names, you could do something like this:

library(readxl)
library(purrr)

sheets_to_keep <- set_names(c("iris", "mtcars"))
path <- readxl_example("datasets.xlsx")

map_dfr(sheets_to_keep, ~ read_excel(path, sheet = .x), .id = "sheet")
#> # A tibble: 182 x 17
#>    sheet Sepal.Length Sepal.Width Petal.Length Petal.Width Species   mpg   cyl
#>    <chr>        <dbl>       <dbl>        <dbl>       <dbl> <chr>   <dbl> <dbl>
#>  1 iris           5.1         3.5          1.4         0.2 setosa     NA    NA
#>  2 iris           4.9         3            1.4         0.2 setosa     NA    NA
#>  3 iris           4.7         3.2          1.3         0.2 setosa     NA    NA
#>  4 iris           4.6         3.1          1.5         0.2 setosa     NA    NA
#>  5 iris           5           3.6          1.4         0.2 setosa     NA    NA
#>  6 iris           5.4         3.9          1.7         0.4 setosa     NA    NA
#>  7 iris           4.6         3.4          1.4         0.3 setosa     NA    NA
#>  8 iris           5           3.4          1.5         0.2 setosa     NA    NA
#>  9 iris           4.4         2.9          1.4         0.2 setosa     NA    NA
#> 10 iris           4.9         3.1          1.5         0.1 setosa     NA    NA
#> # ... with 172 more rows, and 9 more variables: disp <dbl>, hp <dbl>,
#> #   drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>,
#> #   carb <dbl>

^{Created on 2020-05-18 by the reprex package (v0.3.0)}

johnmackintosh · May 18, 2020, 5:49pm

Brilliant thanks. Your addition was just what I was looking for.
I was too focused on trying to do everything in one pipe.
set_names was also something I'd overlooked in my earlier attempts.

all_sheets <- path %>%
  excel_sheets() %>%
  set_names()

model_sheets <- all_sheets[which(all_sheets %like% '* Specific Word*')]

model_df <- map_dfr(model_sheets,
                    ~ read_excel(path, sheet = .x, range = "A1:L50"),
                    .id = "sheet")

Thanks so much for your help

system · May 25, 2020, 5:49pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.