I am working with the R programming language. I am trying to download the smallest file from this directory listing, i.e. https://files.pushshift.io/reddit/comments/RC_2005-12.zst . My goal is to import this file into R and then query it to find comments containing certain terms. For example, I want to find every comment that contains the word "tacos".
I have downloaded this file onto my computer, and now I would like to import it into R. I have never heard of or worked with this file format (.zst) before, so I searched online for a way to read it into R.
I found the following package: thekvs/zstdr on GitHub (R bindings to the zstandard compression library). However, I am not able to install it:
> install.packages('zstdr')
Installing package into ‘C:/Users/me/OneDrive/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘zstdr’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Does anyone know how I can import this zst file into R and then query it for specific search terms (e.g. "basketball")?
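To make the goal concrete, here is a minimal sketch of the query I would like to run once the data is readable. It assumes the decompressed file is newline-delimited JSON with the comment text in a `body` field; the helper name `find_comments` and the sample lines are made up for illustration, and the decompression step itself is the part I am missing.

```r
library(jsonlite)

# Hypothetical helper: read newline-delimited JSON from a connection and
# keep the rows whose `body` field mentions `term` (case-insensitive).
find_comments <- function(con, term) {
  comments <- stream_in(con, verbose = FALSE)
  comments[grepl(term, comments$body, ignore.case = TRUE), , drop = FALSE]
}

# Made-up sample lines standing in for the decompressed file
# (assumption: one JSON object per line, with a `body` field).
sample_lines <- c(
  '{"author":"user_a","body":"I could eat tacos every day"}',
  '{"author":"user_b","body":"Anyone watching basketball tonight?"}',
  '{"author":"user_c","body":"Tacos and basketball, perfect evening"}'
)

con <- textConnection(sample_lines)
hits <- find_comments(con, "tacos")
close(con)

print(hits)
```

With the real data, the `textConnection()` would be replaced by a connection to the decompressed file, e.g. `file("RC_2005-12", "r")` if the archive were first decompressed to a plain text file with that name.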
Note 1: This is the error message I get when I try to install the same package from GitHub:
> devtools::install_github("thekvs/zstdr")
Downloading GitHub repo thekvs/zstdr@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All
2: CRAN packages only
3: None
4: Rcpp (126.96.36.199 -> 1.0.9) [CRAN]

Enter one or more numbers, or an empty line to skip updates:
v  checking for file 'C:\Users\me\AppData\Local\Temp\RtmpqumrUb\remotes710158629f3\thekvs-zstdr-f992e66/DESCRIPTION' (533ms)
-  preparing 'zstdr': (3.9s)
v  checking DESCRIPTION meta-information ...
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories (629ms)
   Omitted 'LazyData' from DESCRIPTION
-  building 'zstdr_0.1.1.tar.gz'
   Warning: file 'zstdr/cleanup' did not have execute permissions: corrected
   Warning: file 'zstdr/configure' did not have execute permissions: corrected

Installing package into ‘C:/Users/me/OneDrive/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
ERROR: Unix-only package
* removing 'C:/Users/me/OneDrive/Documents/R/win-library/4.1/zstdr'
Warning message:
In i.p(...) :
  installation of package ‘C:/Users/me/AppData/Local/Temp/RtmpqumrUb/file710783f1c08/zstdr_0.1.1.tar.gz’ had non-zero exit status
Note 2: And this is my session info:
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
htm2txt_2.2.2 dplyr_1.0.9 RedditExtractoR_2.1.5

loaded via a namespace (and not attached):
tinytex_0.40 tidyselect_1.1.2 xfun_0.30 remotes_2.4.2 purrr_0.3.4 vctrs_0.4.1 generics_0.1.3 testthat_3.1.4 usethis_2.1.6
htmltools_0.5.2 yaml_2.3.5 utf8_1.2.2 rlang_1.0.2 pkgbuild_1.3.1 pillar_1.7.0 glue_1.6.2 withr_2.5.0 DBI_1.1.3
sessioninfo_1.2.2 lifecycle_1.0.1 visNetwork_2.1.0 devtools_2.4.3 htmlwidgets_1.5.4 memoise_2.0.1 evaluate_0.15 knitr_1.39 callr_3.7.0
fastmap_1.1.0 ps_1.6.0 curl_4.3.2 fansi_1.0.3 cachem_1.0.6 desc_1.4.1 pkgload_1.2.4 jsonlite_1.8.0 fs_1.5.2
brio_1.1.3 digest_0.6.29 processx_3.5.3 RJSONIO_1.3-1.6 rprojroot_2.0.3 cli_3.3.0 tools_4.1.3 magrittr_2.0.2 tibble_3.1.6
crayon_1.5.1 pkgconfig_2.0.3 ellipsis_0.3.2 prettyunits_1.1.1 assertthat_0.2.1 rmarkdown_2.14 rstudioapi_0.13 R6_2.5.1 igraph_1.2.11
compiler_4.1.3