I have a situation where I need to process different files. There is an "ID" field in all the files, but sometimes it is named differently depending on the file. For example, it can be called "ID1" or "ID_1", etc.
I want the user to be able to specify the name of the "ID" column and then use that in my R script.
Here is what I have so far, which doesn't work.
id_col <- "ID_1"
bad_length <- file %>%
filter(str_length(.$id_col) != 10)
How do I reference id_col in str_length()? I tried using .$(!!id_col) as the argument to str_length(), but that didn't work.
Thanks!
Using `dplyr` with your files in data frames or tibbles:
new_df <- old_df %>% mutate(ID = ID_1)
Then you can inner_join on ID, use select(-duplicate_columns) and get where I think you're trying to go.
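To make the rename-then-join idea concrete when the column name arrives as a string, here is a minimal sketch. The data frames file_a and file_b and the helper standardise_id() are made up for illustration, and it assumes dplyr 1.0 or later so that all_of() can be used inside rename().

```r
library(dplyr)

# Hypothetical stand-ins for two input files whose ID column is named differently
file_a <- tibble(ID1 = c("a", "b", "c"), x = 1:3)
file_b <- tibble(ID_1 = c("b", "c", "d"), y = 4:6)

# Hypothetical helper: rename the user-specified column (given as a string) to "ID"
standardise_id <- function(data, id_col) {
  rename(data, ID = all_of(id_col))
}

# Once both tables share the same ID column name, they can be joined on it
joined <- inner_join(
  standardise_id(file_a, "ID1"),
  standardise_id(file_b, "ID_1"),
  by = "ID"
)
```

Renaming up front means the rest of the script can refer to ID directly and never needs to know the original column name.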
joels
December 6, 2018, 5:41pm
You could write a function using quosures to flexibly specify the column name. For example:
library(tidyverse)
fnc = function(data, id_col, filter.length=10) {
  id_col = enquo(id_col)
  data %>%
    filter(str_length(!!id_col) != filter.length)
}
fnc(iris, Species)
fnc(diamonds, cut, 9)
If you also want to always rename the id column to "ID", you could do this:
fnc = function(data, id_col, filter.length=10) {
  id_col = enquo(id_col)
  data %>%
    rename(ID=!!id_col) %>%
    filter(str_length(ID) != filter.length)
}
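For what it's worth, the enquo() and !! pair used in these functions can also be written with the embrace operator {{ }}, which does both steps at once. This is only a sketch, assuming rlang 0.4.0 or later (bundled with current dplyr); the function name filter_bad_length() and the sample data are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical function: keep rows whose ID does not have the expected length.
# {{ id_col }} captures the unquoted column name and injects it into filter().
filter_bad_length <- function(data, id_col, expected_length = 10) {
  data %>%
    filter(str_length({{ id_col }}) != expected_length)
}

# Made-up example data: only the second ID is not 10 characters long
df <- tibble(ID_1 = c("0123456789", "123", "abcdefghij"), value = 1:3)
bad <- filter_bad_length(df, ID_1)
```

As with the quosure version, the column is passed unquoted (ID_1, not "ID_1"), so this suits interactive use more than a name read from user input as a string.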
Thanks for your reply! Either of these solutions works as well:
bad_length <- file %>%
filter(str_length(get(id_col)) != 10)
bad_length <- file %>%
filter(str_length(!!sym(id_col)) != 10)
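The original attempt failed because $ does not evaluate its right-hand side: .$id_col looks for a column literally named id_col. When the column name is held in a string, another option is the .data pronoun with [[ ]]. A minimal sketch, with a made-up data frame df standing in for the file:

```r
library(dplyr)
library(stringr)

# Column name supplied by the user as a string
id_col <- "ID_1"

# Made-up example data: the second ID is too short
df <- tibble(ID_1 = c("0123456789", "123"), value = c("ok", "bad"))

# .data[[id_col]] looks up the column named by the string, inside the data frame only
bad_length <- df %>%
  filter(str_length(.data[[id_col]]) != 10)
```

Unlike get(), which can also find a variable of that name outside the data frame, .data[[id_col]] only looks at columns and raises an error if the column is missing.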
r, dplyr