Referencing a column in a function using a passed variable.

AnnOminous · July 21, 2019, 11:28pm

I hope I'm able to explain this clearly. I'm trying to write a function that adds rows to a data frame. The data frame consists of four variables. The name of the first variable is dependent on what I'm trying to analyze, and I want to be able to pass the name to be used for the first column as a variable in the function call, so that I can use the function for multiple comparisons. The other three column names in the data frame are always the same, so I'm not having any problem with them.

Here's the function code:

create_df <- function(dest_frame, x_var){
    dest_frame = tibble(x_var = double(),
                  count = integer(),
                  mean_price = double(),
                  median_price = double())
    for(i in seq_along(df_read)) {
    dest_frame <- add_row(dest_frame,
                        x_var <- df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4])
        }
    dest_frame <- dest_frame %>% filter(mean_price > 3)
    }

The problem I'm having is that however I try to pass the desired name of the first column, I get the error message:

New rows in `add_row()` must use columns that already exist:
* Can't find column `x_var <- df_read[[i]][1]` in `.data`.
Traceback:

I've tried calling the function as create_df(data_frame_name, col_name) and create_df(data_frame_name, "col_name"), but either way I get the same message. I've also tried to reference the passed value multiple ways in the function itself, but it's always the same result.
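A small base-R sketch may help explain why the message quotes the whole expression. Inside a call, `name = value` supplies a named argument, while `name <- value` is just an unnamed expression that also assigns as a side effect. The helper `show_arg_names()` is hypothetical and only for illustration; the assumption here is that `add_row()` labels an unnamed argument with its own code, which would match the error above.

```r
# Hypothetical helper: report the names of the arguments it receives.
show_arg_names <- function(...) names(list(...))

# `=` inside a call creates a named argument.
named <- show_arg_names(x_var = 1, count = 2)
print(named)    # "x_var" "count"

# `<-` inside a call is an ordinary assignment; the argument itself has no name.
unnamed <- show_arg_names(x_var <- 1, count = 2)
print(unnamed)  # ""      "count"
```

Note also that in `tibble(x_var = double())` the name on the left of `=` is taken literally, so the column ends up being called `x_var` regardless of what was passed to the function.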

Is there a way that I can do this?

pieterjanvc · July 22, 2019, 12:44am

Hi,

It seems you're making things very complicated, but then again it might be that I'm missing something. You seem to have a data frame in your function (df_read) that's never been declared. Could you provide us with a before and after example of the data frame? That is, a few rows of the input data frame (dest_frame) and a few of the final result.

Here is some code that might help meanwhile:

library(tidyverse)

myData  = tibble(x = 1:10, y = 1:10, z = 1:10)
x_var = "newColName"
colnames(myData)[1] = x_var
head(myData)

# A tibble: 6 x 3
  newColName     y     z
       <int> <int> <int>
1          1     1     1
2          2     2     2
3          3     3     3
4          4     4     4
5          5     5     5
6          6     6     6

Or if you want to add a column:

library(tidyverse)
library(dplyr)

myData  = tibble(x = 1:10, y = 1:10, z = 1:10)
x_var = "newColName"
myData  = myData %>% add_column(!!x_var := 1:10, .before = 1)
myData

# A tibble: 10 x 4
   newColName     x     y     z
        <int> <int> <int> <int>
 1          1     1     1     1
 2          2     2     2     2
 3          3     3     3     3
 4          4     4     4     4
 5          5     5     5     5
 6          6     6     6     6
 7          7     7     7     7
 8          8     8     8     8
 9          9     9     9     9
10         10    10    10    10
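The same `!!x_var :=` pattern should also work when creating the empty tibble and when adding rows, which is closer to the original function. This is only a minimal sketch: the column name "depth" and the numbers in the added row are made up for illustration.

```r
library(tibble)

x_var <- "depth"  # the desired name of the first column, held as a string

# Create the empty frame; `!!x_var :=` uses the value of x_var as the column name.
dest_frame <- tibble(!!x_var := double(),
                     count = integer(),
                     mean_price = double(),
                     median_price = double())

# Add a row using the same dynamic name (illustrative values only).
dest_frame <- add_row(dest_frame,
                      !!x_var := 61.5,
                      count = 10L,
                      mean_price = 3000,
                      median_price = 2800)

names(dest_frame)
```

Inside a function, `x_var` would simply be the argument, and the caller would pass the name as a quoted string.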

Grtz,
PJ

AnnOminous · July 22, 2019, 1:26am

Pieter, thanks very much. You're not missing anything - I'm very new to R, and I'm taking an online course that really doesn't tell you much at all. Understanding how to reference components of a data frame is definitely eluding me, and the course is no help in that regard, so it's pretty much a guarantee that I'm overcomplicating the process.

That being said, here's all of my currently working code:

carats <- pull(diamonds %>% distinct(carat) %>% arrange(carat))
depth <- pull(diamonds %>% distinct(depth) %>% arrange(depth))

df_depth = tibble(depth = double(),
                  count = integer(),
                  mean_price = double(),
                  median_price = double(),
                  mode_price = double())

df_carat = tibble(carat = double(),
                  count = integer(),
                  mean_price = double(),
                  median_price = double(),
                  mode_price = double())

get_price_by_category <- function(dataset, col_name, x_var, y_var) {
    pricemean <- dataset %>% filter(col_name == x_var) %>% select(y_var)
    category_count <- length(str_c(pricemean[[1]], sep = ", "))
    price_mean <- mean(pricemean[[1]], sep = ", ")
    price_median <- median(pricemean[[1]], sep = ", ")
    price_mode <- max(mfv(pricemean[[1]], sep = ", "))
    results_vector <- c(x_var, category_count, price_mean, price_median, price_mode)
    return(str_c(results_vector, sep = ","))
}

depth %>% map(get_price_by_category, dataset = diamonds, col_name = diamonds[5], y_var = "price") %>%
write.csv("MeanPriceByDepth.csv", quote = FALSE, eol = "\n")
df_read <- read.csv("MeanPriceByDepth.csv", header = T)
names(df_read) <- substring(names(df_read), 4,7)

for(i in seq_along(df_read)) {
    df_depth <- add_row(df_depth,
                        depth = df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4],
                        mode_price = df_read[[i]][5])
}
df_depth <- df_depth %>% filter(mean_price > 3)


carats %>% map(get_price_by_category, dataset = diamonds, col_name = diamonds[1], y_var = "price" )%>% 
write.csv("MeanPriceByCarat.csv", quote = FALSE, eol = "\n")
df_read <- read.csv("MeanPriceByCarat.csv", header = T)
names(df_read) <- substring(names(df_read), 4,7)

for(i in seq_along(df_read)) {
    df_carat <- add_row(df_carat,
                        carat = df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4],
                        mode_price = df_read[[i]][5])
}
df_carat <- df_carat %>% filter(mean_price > 3)

plot_central_values <- function(the_dataframe, x_label, scale_factor) {
    the_dataframe %>% ggplot(aes(the_dataframe[[1]])) +
        geom_line(aes(y = mean_price), color = "dark green") +
        geom_line(aes(y = median_price), color = "dark blue") +
        geom_line(aes(y = mode_price), color = "yellow") +
        geom_line(aes(y = count * scale_factor), color = "dark red") +
        scale_y_continuous(sec.axis = sec_axis(~./scale_factor, name = "Count")) +
        labs(x = x_label)
}

You'll notice that I am using the creation of the tibbles/data frames, the get_price_by_category function calls, and other code multiple times. I'd like to be able to clean that up, and create "dynamic" functions, which is why I'm trying to figure out how to do what I'm trying to do in my question.

So, df_carat and df_depth are initially empty frames. They get populated in four steps: I pipe the depth and carats vectors into the get_price_by_category function, write the results of that call to a file with write.csv, read the contents of that file back in with read.csv, and then use the for loops to transpose the data that was read back in, so that each of its columns becomes a row of df_carat or df_depth. And if you're thinking that this seems like an extremely convoluted way to just swap the rows and columns from one data frame to another, I'm sure it is, but I couldn't find any other way to do it.
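For what it's worth, the CSV round trip and the transposing loop could probably be replaced by a grouped summary. The following is a sketch under assumptions, not the course's intended solution: `summarise_by()` is a hypothetical helper, the column names are passed as strings, the toy data stands in for `diamonds`, and the mode column is left out to avoid depending on the package that provides `mfv()`.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: one row per distinct value of `group_col`,
# with the count, mean and median of `value_col`.
summarise_by <- function(dataset, group_col, value_col) {
  dataset %>%
    group_by(!!rlang::sym(group_col)) %>%   # turn the string into a column reference
    summarise(
      count = n(),
      mean_price = mean(.data[[value_col]]),
      median_price = median(.data[[value_col]])
    ) %>%
    ungroup()
}

# Toy data standing in for diamonds (made-up values).
toy <- tibble(carat = c(0.3, 0.3, 0.5, 0.5, 0.5),
              price = c(400, 500, 1000, 1200, 1700))

df_carat <- summarise_by(toy, "carat", "price")
df_carat
```

With the real data, the call would presumably look like `summarise_by(diamonds, "depth", "price")`, and the first column of the result is already named after the grouping column.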

What I'm trying to do is to turn this code into a function, with a passed variable for the first column in the list:

for(i in seq_along(df_read)) {
    df_carat <- add_row(df_carat,
                        carat = df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4],
                        mode_price = df_read[[i]][5])
}
df_carat <- df_carat %>% filter(mean_price > 3)

To work like this:

my_func <- function(new_frame, col_name){
for(i in seq_along(df_read)) {
    new_frame <- add_row(new_frame,
                        col_name = df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4],
                        mode_price = df_read[[i]][5])
        }
    new_frame <- frame %>% filter(mean_price > 3)
}

Anyway, hopefully this will clarify what I'm trying to do, and why I'm getting the error message I'm getting.

Thanks.

AnnOminous · July 23, 2019, 2:09am

Okay, I'm getting closer. Turns out part of what's been eluding me definitely should not have been. I was using the actual code I pasted into my first response:

df_create <- function(new_frame, col_name){
for(i in seq_along(df_read)) {
    new_frame <- add_row(new_frame,
                        col_name = df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4],
                        mode_price = df_read[[i]][5])
        }
    new_frame <- frame %>% filter(mean_price > 3)
}

And that was giving me the errors I was seeing. I made two changes to the function code:

1. Moved the code to "declare" the tibble/dataframe into the function
2. Added a return() statement (and this is the part that should not have eluded me)

create_df <- function(new_frame, first_col){
    new_frame = tibble(first_col = double(),
                  count = integer(),
                  mean_price = double(),
                  median_price = double(),
                  mode_price = double())
    for(i in seq_along(df_read)) {
    new_frame <- add_row(new_frame,
                        first_col = df_read[[i]][1],
                        count = df_read[[i]][2],
                        mean_price = df_read[[i]][3],
                        median_price = df_read[[i]][4],
                        mode_price = df_read[[i]][5])
        }
    new_frame <- new_frame %>% filter(mean_price > 3)
    return(new_frame)
}

Calling this function like this:

create_df(df_depth, depth)
df_depth

Gives the following results:

So I am at least getting the tibble/dataframe I want, but I still can't figure out how to name that first column by passing the desired name as a variable. If anyone has any ideas, that would be greatly appreciated. Thanks!
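One possible way to finish this off, sketched under assumptions: since the function already returns the right data with a first column literally named `first_col`, that column can be renamed from a string just before returning. `name_first_col()` is a hypothetical helper and the values in `result` are made up.

```r
library(tibble)

# Hypothetical helper: rename the first column to the string held in first_col.
name_first_col <- function(new_frame, first_col) {
  names(new_frame)[1] <- first_col
  new_frame
}

# Stand-in for what create_df() currently returns (made-up values).
result <- tibble(first_col = c(55.1, 60.2),
                 count = c(3L, 7L),
                 mean_price = c(3100, 4200))

# Pass the name as a quoted string and assign the result back.
df_depth <- name_first_col(result, "depth")
names(df_depth)
```

Two details seem to matter here. The name has to be passed as a quoted string, because a bare `depth` refers to the vector created earlier with `pull()`. And the return value has to be assigned, because R functions do not modify their arguments in place.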

system · August 13, 2019, 2:13am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.