How to classify datasets in package documentation

I have several R packages containing many datasets. I'd like to classify them according to the methods that apply to each, and make this information searchable in the documentation and, ideally, on the associated pkgdown site.

Here is a sample, classified with tags in an Excel file

From the R-exts manual, the only thing I can see is \concept{}, e.g.,

\keyword{datasets}
\concept{loglinear}
\concept{logit}
\concept{2x2}

Is there anything else I can use for this purpose?

Also, in RStudio, I can't find any way to search the documentation for one of these
keywords.

Concepts are good, I think. Or you can include that table in the documentation if you like.

?? searches for concepts as well, e.g. if you write

??loglinear

you'll get

...
stats::glm              Fitting Generalized Linear Models
  Concepts: log-linear, loglinear
...

You can also search only for concepts:

help.search("loglinear", fields = "concept")
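If you want to work with the hits rather than just read them, the search result can be stored. This is a minimal sketch: it assumes the search is restricted to the stats package (to keep it fast), and that the `matches` element carries the `Package`, `Topic` and `Entry` columns as in current R versions.

```r
# Search only the concept field, restricted to one package
hits <- help.search("loglinear", fields = "concept", package = "stats")

# The result is an object of class "hsearch"; its `matches` element
# holds one row per hit
class(hits)
hits$matches[, c("Package", "Topic", "Entry")]
```

Dropping the `package` argument searches all installed packages, which is what ?? does.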

Thx; that's helpful

In the documentation, or perhaps just on the pkgdown site, what would be useful is the "inverse table" of tags, with links to the associated datasets, something like:

tag                datasets
ca                  AirCrash, Burt, Bartlett, ...
glm                 Accident, Cormorants, ...
loglinear           Abortion, Accident, Alligator, Bartlett, ...
...

I need to figure out how to do that, and incorporate it into the pkgdown site.

That looks pretty good I think! Some bits that might help if you use roxygen2.

You can create a markdown table: see "(R)Markdown support" in the roxygen2 documentation.

You can generate the table from metadata if you don't want to hardcode it:
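As a rough sketch of that idea, a small helper could turn a data frame of tags into markdown table text. `md_table()` is a hypothetical name, not part of roxygen2 or pkgdown, and the two rows are copied from the tag table shown later in this thread.

```r
# Hypothetical helper: render a data frame as a markdown pipe table
md_table <- function(df) {
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  rule   <- paste0("|", paste(rep(" --- ", ncol(df)), collapse = "|"), "|")
  # one markdown row per data frame row
  rows <- apply(df, 1, function(r) paste0("| ", paste(r, collapse = " | "), " |"))
  paste(c(header, rule, rows), collapse = "\n")
}

# Metadata kept as data rather than hardcoded in the documentation text
tag_dset <- data.frame(
  tag      = c("agree", "binomial"),
  datasets = c("Mammograms", "Geissler"),
  stringsAsFactors = FALSE
)

cat(md_table(tag_dset), "\n")
```

knitr::kable(tag_dset, format = "pipe") produces the same kind of table if you prefer not to write the helper yourself.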

For others who may be interested: dplyr and tidyr came to the rescue. Here's what I wanted:

> tag_dset
# A tibble: 20 × 2
   tag         datasets                                                                             
   <chr>       <chr>                                                                                
 1 2x2         Abortion; Bartlett; Heart                                                            
 2 agree       Mammograms                                                                           
 3 binomial    Geissler                                                                             
 4 ca          AirCrash; Burt; Draft1970table; Gilby; HospVisits; HouseTasks; Mental                
 5 glm         Accident; Cormorants; DaytonSurvey; Donner; Draft1970table; GSS; ICU; PhdPubs  
 ...

Here's how I got that:

library(readxl)
dsets_tagged <- read_excel("extra/vcdExtra-datasets.xlsx", 
                           sheet="vcdExtra-datasets")

dsets_tagged <- dsets_tagged |>
  dplyr::select(-Title, -dim) |>
  dplyr::rename(dataset = Item)

# to invert the table, need to split tags into separate observations
dset_split <- dsets_tagged |>
  tidyr::separate_longer_delim(tags, delim = ";") |>
  dplyr::mutate(tag = stringr::str_trim(tags)) |>
  dplyr::select(-tags)

# collapse the rows for the same tag
tag_dset <- dset_split |>
  dplyr::arrange(tag) |>
  dplyr::group_by(tag) |>
  dplyr::summarise(datasets = paste(dataset, collapse = "; ")) |>
  dplyr::ungroup()
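For anyone who wants to try the inversion without the Excel file, here is a base R sketch of the same split-and-collapse idea on a toy table. The four datasets and their tags are a partial subset taken from the tables in this thread, not the full spreadsheet, and this is an alternative to the dplyr/tidyr code, not a copy of it.

```r
# Toy subset: one row per dataset, tags separated by ";"
dsets_tagged <- data.frame(
  dataset = c("Abortion", "Accident", "Bartlett", "Mammograms"),
  tags    = c("2x2; loglinear", "glm; loglinear", "2x2; loglinear", "agree"),
  stringsAsFactors = FALSE
)

# Split each tag string into a vector of trimmed tags
tag_list <- lapply(strsplit(dsets_tagged$tags, ";", fixed = TRUE), trimws)

# One row per (dataset, tag) pair
dset_split <- data.frame(
  dataset = rep(dsets_tagged$dataset, lengths(tag_list)),
  tag     = unlist(tag_list),
  stringsAsFactors = FALSE
)

# Collapse the datasets that share a tag into one string per tag
collapsed <- tapply(dset_split$dataset, dset_split$tag, paste, collapse = "; ")
tag_dset <- data.frame(
  tag      = names(collapsed),
  datasets = as.vector(collapsed),
  stringsAsFactors = FALSE
)
tag_dset
```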

The final step is to turn the list of datasets under each tag into links in a vignette or other documentation for the pkgdown site. I can do this now as follows, by turning dset into [dset](help(dset)) in the table.

#' Add links to the names of datasets
#'
#' Designed to work with the `pkgdown` site: turns each dataset name
#' into a link to `help(dataset)`.
add_links <- function(dsets, 
                      sep = "; ") {
  names <- stringr::str_split_1(dsets, sep)
  names <- glue::glue("[{names}](help({names}))")
  glue::glue_collapse(names, sep = sep)
}

# add_links("Bartlett; Fungicide")

tag_dset |>
  dplyr::select(-tag) |>
  dplyr::mutate(datasets = purrr::map_chr(datasets, add_links)) |>
  knitr::kable()

But AFAICS, this won't work in an ordinary vignette; I think CRAN will object loudly.
It should work when pkgdown turns it into an article. Where can I put this in my package?
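One option that may fit is a pkgdown-only article: files under vignettes/articles/ are built by pkgdown but can be kept out of the CRAN build via .Rbuildignore (usethis::use_article() sets this up). In that setting the links could also point straight at the reference pages. The sketch below uses a hypothetical `add_ref_links()` and assumes each dataset has a reference page named after it, and that the article is rendered one level below the site root.

```r
# Hypothetical variant of add_links(): link each dataset name to its
# pkgdown reference page via a relative URL.
# Assumes the pages live at reference/<dataset>.html.
add_ref_links <- function(dsets, sep = "; ", base = "../reference/") {
  names <- trimws(strsplit(dsets, sep, fixed = TRUE)[[1]])
  links <- sprintf("[%s](%s%s.html)", names, base, names)
  paste(links, collapse = sep)
}

add_ref_links("Bartlett; Fungicide")
# "[Bartlett](../reference/Bartlett.html); [Fungicide](../reference/Fungicide.html)"
```

Because the URLs are relative, they only resolve on the built site, so this belongs in an article rather than in a vignette shipped to CRAN.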

For anyone interested, here is the source of my current Rmd file:
