How to exclude certain file names in a folder in R

I have a question about reading only certain files in a folder. For example, the folder "wind_grids" contains files named wind_lat1_lon1, wind_lat1_lon2, wind_lat1_lon3, ..., wind_lat2_lon1, wind_lat2_lon2, wind_lat2_lon3, .... I also have an Excel file with two columns: longitude and latitude. I want to skip the files in the wind_grids folder whose coordinates appear in the Excel file, because those grids are not needed. How can I do this? Thanks for your help.

wd.files = list.files('wind_grids')
for(i in seq_along(wd.files)){
  # read one single-column file per iteration
  wd.file.a = read.table(paste('directory/wind_grids', wd.files[i], sep='/'), header=FALSE, col.names='wind')
}

Could anyone help me with this? Thanks in advance.

I think this can be done by applying string manipulation to the list of files, but we would need some sample data to provide a practical example. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

EDIT: The solution I have in mind would be similar to this

library(stringr)
set.seed(1)
list <- paste0(sample(letters, 10), sample(1:100, 10))
list
#>  [1] "g21" "j18" "n68" "u38" "e74" "s48" "w98" "m93" "l35" "b71"
exclude <- sample(list, 3)
exclude
#> [1] "b71" "j18" "s48"
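# Collapse the names into one "a|b|c" pattern so every element is checked against all of them
list[str_detect(list, paste(exclude, collapse = "|"), negate = TRUE)]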
#> [1] "g21" "n68" "u38" "e74" "w98" "m93" "l35"

Created on 2019-04-15 by the reprex package (v0.2.1.9000)


For example, in the 'wind_grids' directory, there are files named wind_39.9375_-105.3125, wind_39.9375_-105.4375, wind_40.0625_-105.3125, wind_40.0625_-105.4375, etc.

In the Excel file (either df.xls or df.csv), there are two columns:
lon lat
-105.3125 39.9375
-105.3125 40.0625
...

How can I exclude the wind_lat_lon files listed in the Excel file when reading the 'wind_grids' folder? Thanks for your help.

Same way as suggested by @andresrcs (not tested):

library(stringr)

wd.files = list.files('wind_grids')
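# header = TRUE so the columns can be accessed as df1$lon and df1$lat
df1 <- read.table("myExcludeLonLat.txt", header = TRUE)

# Assuming we have two columns, lon and lat. File names are wind_lat_lon,
# so lat goes first; collapse = "|" builds a single regex of alternatives.
exclude <- paste("wind", df1$lat, df1$lon, sep = "_", collapse = "|")

wd.files[str_detect(wd.files, exclude, negate = TRUE)]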
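
Since the names to drop are known exactly, a regular expression is not strictly needed. Here is a minimal base R sketch, assuming the file names follow the wind_lat_lon pattern described above. The `wd.files` vector and the `df1` data frame are made-up stand-ins for the `list.files()` output and the exclusion table:

```r
# Hypothetical stand-ins for list.files('wind_grids') and the exclusion table
wd.files <- c("wind_39.9375_-105.3125", "wind_39.9375_-105.4375",
              "wind_40.0625_-105.3125", "wind_40.0625_-105.4375")
df1 <- data.frame(lon = c(-105.3125, -105.3125),
                  lat = c(39.9375, 40.0625))

# Build the exact file names to drop (latitude first, then longitude)
exclude <- paste("wind", df1$lat, df1$lon, sep = "_")

# Keep every file whose name is not in the exclusion set
keep <- wd.files[!wd.files %in% exclude]
keep
```

Because `%in%` compares whole strings, the dots in the names are not treated as regex wildcards.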

I tried it, but got an error message:
Error in str_detect(wd.files, exclude, negate = TRUE) :
unused argument (negate = TRUE)

So how can I read only the selected wind_lat_lon files from the wind_grids folder in the following loop? Thanks.

for(i in seq_along(wd.files)){
  # read one single-column file per iteration
  wd.file.a = read.table(paste('directory/wind_grids', wd.files[i], sep='/'), header=FALSE, col.names='wind')
}

Here is an example for filtering your file list

wd.files <- c("wind_39.9375_-105.3125", "wind_39.9375_-105.4375",
              "wind_40.0625_-105.3125", "wind_40.0625_-105.4375")

excel_df <- data.frame(lon = c(-105.3125, -105.3125),
                       lat = c(39.9375, 40.0625))
library(tidyverse)
library(stringr)
# Build one "lat_lon|lat_lon" pattern and keep the names that do not match it
selected.files <- wd.files[str_detect(
  wd.files,
  paste(excel_df$lat, excel_df$lon, sep = "_", collapse = "|"),
  negate = TRUE
)]
selected.files
#> [1] "wind_39.9375_-105.4375" "wind_40.0625_-105.4375"

And then you can read the data with something like this

# Name each element after its file so .id can record where each row came from
df <- selected.files %>%
  setNames(nm = .) %>%
  map_df(read.table, header = FALSE, col.names = 'wind', .id = "file_name")
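
Note that `list.files()` returns bare file names by default, so the directory has to be prepended before reading. Here is a base R sketch of the same step. The temporary directory and the two small files written here are only there to make the example self-contained, and the names `grid.dir` and `wind.list` are made up for illustration:

```r
# Create a throwaway folder with two tiny one-column files (illustration only)
grid.dir <- file.path(tempdir(), "wind_grids_demo")
dir.create(grid.dir, showWarnings = FALSE)
writeLines(c("1.2", "3.4"), file.path(grid.dir, "wind_39.9375_-105.4375"))
writeLines(c("5.6", "7.8"), file.path(grid.dir, "wind_40.0625_-105.4375"))

selected.files <- list.files(grid.dir)

# Read each file into one element of a named list instead of overwriting
# the same variable on every pass of a loop
wind.list <- lapply(file.path(grid.dir, selected.files), read.table,
                    header = FALSE, col.names = "wind")
names(wind.list) <- selected.files

str(wind.list)
```

Alternatively, `list.files(grid.dir, full.names = TRUE)` returns the full paths directly.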

Thanks, but there is something wrong with it. I got the error below:

Error in str_detect(wd.files, paste("wind", excel_df$lat, excel_df$lon, :
unused argument (negate = TRUE)

Where could the problem be? Thanks.
I checked, and my str_detect(string, pattern) has no argument such as "negate = TRUE", so maybe I have a different version than yours?

Try updating stringr; you probably have an older version. The negate argument was added in stringr 1.4.0.

install.packages("stringr")
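
To confirm which version is installed, `packageVersion("stringr")` can be checked before and after updating. If updating is not possible, negating the result with `!` should give the same filter without the `negate` argument. Here is a small sketch with made-up file names, using base R `grepl()` so it runs without any package:

```r
# Made-up file names and one name fragment to drop
wd.files <- c("wind_39.9375_-105.3125", "wind_39.9375_-105.4375")
pattern  <- "39.9375_-105.3125"

# `!` flips the logical vector, so this keeps the non-matching names;
# fixed = TRUE treats the pattern as literal text rather than a regex.
# With stringr loaded, !str_detect(wd.files, fixed(pattern)) is the analogue.
kept <- wd.files[!grepl(pattern, wd.files, fixed = TRUE)]
kept
```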

Thanks, the sample data works. However, there are a large number of files in the "wind_grids" folder, and some have names like "wind_40.1250_-105.5000", "wind_41.0500_-106.5000", etc. In the "excel_df" data frame, the coordinates are 40.125, -105.5, etc., so is there a way to keep four decimal places when reading the excel_df? That way, the str_detect function can match all the corresponding file names. Thanks again.

You can read them as strings using

library(readxl)
read_xlsx("path_to_file.xlsx", col_types="text")
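
If the coordinates have already been read as numbers, another option is to format them to four decimal places when building the names. Here is a sketch with `sprintf()`, assuming every file name uses exactly four decimals. The `excel_df` here is a made-up stand-in for the real data:

```r
# Made-up coordinates that have lost their trailing zeros
excel_df <- data.frame(lon = c(-105.5, -106.5),
                       lat = c(40.125, 41.05))

# "%.4f" pads each number to four decimal places, e.g. 40.125 -> "40.1250"
exclude <- sprintf("wind_%.4f_%.4f", excel_df$lat, excel_df$lon)
exclude
```

The resulting names can then be collapsed with `paste(exclude, collapse = "|")` for `str_detect()`, or compared directly against the file list.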
