Pieter, thanks very much. You're not missing anything - I'm very new to R, and I'm taking an online course that really doesn't tell you much at all. Understanding how to reference components of a data frame is definitely eluding me, and the course is no help in that regard, so it's pretty much a guarantee that I'm overcomplicating the process.
That being said, here's all of my currently working code:
library(tidyverse)
library(modeest)  # assumption: mfv() used below comes from the modeest package
carats <- pull(diamonds %>% distinct(carat) %>% arrange(carat))
depth <- pull(diamonds %>% distinct(depth) %>% arrange(depth))
df_depth = tibble(depth = double(),
                  count = integer(),
                  mean_price = double(),
                  median_price = double(),
                  mode_price = double())
df_carat = tibble(carat = double(),
                  count = integer(),
                  mean_price = double(),
                  median_price = double(),
                  mode_price = double())
get_price_by_category <- function(dataset, col_name, x_var, y_var) {
  pricemean <- dataset %>% filter(col_name == x_var) %>% select(y_var)
  category_count <- length(str_c(pricemean[[1]], sep = ", "))
  price_mean <- mean(pricemean[[1]], sep = ", ")
  price_median <- median(pricemean[[1]], sep = ", ")
  price_mode <- max(mfv(pricemean[[1]], sep = ", "))
  results_vector <- c(x_var, category_count, price_mean, price_median, price_mode)
  return(str_c(results_vector, sep = ","))
}
depth %>% map(get_price_by_category, dataset = diamonds, col_name = diamonds[5], y_var = "price") %>%
  write.csv("MeanPriceByDepth.csv", quote = FALSE, eol = "\n")
df_read <- read.csv("MeanPriceByDepth.csv", header = T)
names(df_read) <- substring(names(df_read), 4,7)
for(i in seq_along(df_read)) {
  df_depth <- add_row(df_depth,
                      depth = df_read[[i]][1],
                      count = df_read[[i]][2],
                      mean_price = df_read[[i]][3],
                      median_price = df_read[[i]][4],
                      mode_price = df_read[[i]][5])
}
df_depth <- df_depth %>% filter(mean_price > 3)
carats %>% map(get_price_by_category, dataset = diamonds, col_name = diamonds[1], y_var = "price") %>%
  write.csv("MeanPriceByCarat.csv", quote = FALSE, eol = "\n")
df_read <- read.csv("MeanPriceByCarat.csv", header = T)
names(df_read) <- substring(names(df_read), 4,7)
for(i in seq_along(df_read)) {
  df_carat <- add_row(df_carat,
                      carat = df_read[[i]][1],
                      count = df_read[[i]][2],
                      mean_price = df_read[[i]][3],
                      median_price = df_read[[i]][4],
                      mode_price = df_read[[i]][5])
}
df_carat <- df_carat %>% filter(mean_price > 3)
plot_central_values <- function(the_dataframe, x_label, scale_factor) {
  the_dataframe %>% ggplot(aes(the_dataframe[[1]])) +
    geom_line(aes(y = mean_price), color = "dark green") +
    geom_line(aes(y = median_price), color = "dark blue") +
    geom_line(aes(y = mode_price), color = "yellow") +
    geom_line(aes(y = count * scale_factor), color = "dark red") +
    scale_y_continuous(sec.axis = sec_axis(~./scale_factor, name = "Count")) +
    labs(x = x_label)
}
You'll notice that I repeat the creation of the tibbles/data frames, the get_price_by_category function calls, and other code. I'd like to clean that up by writing "dynamic" functions, which is what my question is about.
So, df_carat and df_depth are initially empty frames. They get populated when I pipe the depth and carats vectors into the get_price_by_category function, write the results of that call to a file with write.csv, read the contents of that file back in with read.csv, and then use the for loops to transpose the data that was read back in, so that each of its columns becomes a row of df_carat or df_depth. And if you're thinking that that seems like an extremely convoluted way to just swap the rows and columns from one data frame to another, I'm sure it is, but I couldn't find any other way to do it.
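For comparison, here is a minimal sketch of how the same per-category summary might be built directly, without the CSV round trip or the transposing loop. It assumes dplyr 1.0 or later; summarise_price_by and most_frequent_max are hypothetical helper names, most_frequent_max is only meant to mimic max(mfv(x)), and a small toy tibble stands in for diamonds.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: the most frequent value of x, taking the largest
# one when several values tie (meant to mimic max(mfv(x))).
most_frequent_max <- function(x) {
  counts <- table(x)
  max(as.numeric(names(counts)[counts == max(counts)]))
}

# Hypothetical helper: one summary row per distinct value of `col_name`.
# `col_name` is a string such as "carat" or "depth"; the price column
# is assumed to be called `price`, as in diamonds.
summarise_price_by <- function(data, col_name) {
  data %>%
    group_by(across(all_of(col_name))) %>%
    summarise(
      count        = n(),
      mean_price   = mean(price),
      median_price = median(price),
      mode_price   = most_frequent_max(price),
      .groups      = "drop"
    )
}

# Toy data standing in for diamonds
toy <- tibble(
  carat = c(0.2, 0.2, 0.2, 0.3, 0.3),
  price = c(300L, 300L, 360L, 400L, 500L)
)

by_carat <- summarise_price_by(toy, "carat")
print(by_carat)
```

With the real data, the call would presumably look like summarise_price_by(diamonds, "carat") or summarise_price_by(diamonds, "depth"), so the same function serves both columns.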
What I'm trying to do is to turn this code into a function, with the name of the first column passed in as an argument:
for(i in seq_along(df_read)) {
  df_carat <- add_row(df_carat,
                      carat = df_read[[i]][1],
                      count = df_read[[i]][2],
                      mean_price = df_read[[i]][3],
                      median_price = df_read[[i]][4],
                      mode_price = df_read[[i]][5])
}
df_carat <- df_carat %>% filter(mean_price > 3)
To work like this:
my_func <- function(new_frame, col_name){
  for(i in seq_along(df_read)) {
    new_frame <- add_row(new_frame,
                         col_name = df_read[[i]][1],
                         count = df_read[[i]][2],
                         mean_price = df_read[[i]][3],
                         median_price = df_read[[i]][4],
                         mode_price = df_read[[i]][5])
  }
  new_frame <- new_frame %>% filter(mean_price > 3)
}
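As far as I can tell, the sticking point is that a name written on the left of = in add_row is taken literally, so col_name = ... refers to a column actually called col_name. One way this might be handled is the tidyverse !!name := value syntax, which lets a string supply the column name. The following is only a sketch under that assumption: rows_from_columns is a hypothetical name, the input frame is passed in as an argument instead of being read from the global df_read, and the demo data is made up.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: turn each *column* of `df_in` into one *row* of a
# new tibble whose first column is named by the string `col_name`.
# Each column of `df_in` is assumed to hold, in order:
# key value, count, mean price, median price, mode price.
rows_from_columns <- function(df_in, col_name) {
  # Empty frame; `!!col_name :=` uses the string as the column name
  out <- tibble(
    !!col_name := double(),
    count        = integer(),
    mean_price   = double(),
    median_price = double(),
    mode_price   = double()
  )
  for (i in seq_along(df_in)) {
    out <- add_row(out,
      !!col_name := df_in[[i]][1],
      count        = as.integer(df_in[[i]][2]),
      mean_price   = df_in[[i]][3],
      median_price = df_in[[i]][4],
      mode_price   = df_in[[i]][5]
    )
  }
  # Same filter as the original code; the result is returned visibly
  out %>% filter(mean_price > 3)
}

# Made-up data shaped like df_read: one record per column.
# The third column has mean_price == 3, so the filter drops it.
demo <- data.frame(
  X1 = c(0.2, 12, 365, 367, 345),
  X2 = c(0.3, 8, 700, 690, 680),
  X3 = c(1, 2, 3, 4, 5)
)

df_demo <- rows_from_columns(demo, "carat")
print(df_demo)
```

Returning the filtered frame as the last expression, instead of assigning it inside the function, means the caller can write df_carat <- rows_from_columns(df_read, "carat") and df_depth <- rows_from_columns(df_read, "depth") without creating the empty frames first.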
Anyway, hopefully this will clarify what I'm trying to do, and why I'm getting the error message I'm getting.
Thanks.