How do I use lapply to sum one column per data frame and add results to an existing .csv?

jovenden · May 5, 2022, 5:07am

library(tidyverse)
library(dplyr)

# Input test data

Component2 <- data.frame(
        stringsAsFactors = FALSE,
  Component_Product_Code = c("Component6",
                             "Component7","Component8","Component9","Component22"),
                Quantity = c(1L, 2L, 4L, 28L, 90L),
        Wastage_Quantity = c(0L, 0L, 0L, 0L, 0L),
               Unit_Cost = c(127.2, 112.02, 23.33, 0.0342, 0.0453),
              Total_Cost = c(127.2, 224.04, 93.32, 0.9576, 4.077),
       Can_Auto_Assemble = c("Yes", "Yes", "Yes", "Yes", "Yes"),
    Can_Auto_Disassemble = c("No", "No", "No", "No", "No"),
             IsObsoleted = c("No", "No", "No", "No", "No"),
         Expense_Account = c(NA, NA, NA, 51120L, 51120L),
                Comments = c(NA, NA, NA, NA, NA)
)

Component9 <- data.frame(
        stringsAsFactors = FALSE,
  Component_Product_Code = c("Component11",
                             "Component12","Component13","Component14",
                             "Component15"),
                Quantity = c(5L, 5L, 2L, 1L, 1L),
        Wastage_Quantity = c(0L, 0L, 0L, 0L, 0L),
               Unit_Cost = c(107, 53, 49.3629, 55.3854, 134.186),
              Total_Cost = c(535, 265, 98.7258, 55.3854, 134.186),
       Can_Auto_Assemble = c("Yes", "Yes", "Yes", "Yes", "Yes"),
    Can_Auto_Disassemble = c("No", "No", "No", "No", "No"),
             IsObsoleted = c("No", "No", "No", "No", "No"),
         Expense_Account = c(51250L, 51244L, NA, NA, NA),
                Comments = c(NA, NA, NA, NA, NA)
)

Component14 <- data.frame(
        stringsAsFactors = FALSE,
  Component_Product_Code = c("Component16",
                             "Component17","Component18","Component19",
                             "Component20","Component21"),
                Quantity = c(68L, 4L, 2L, 34L, 2L, 34L),
        Wastage_Quantity = c(0L, 0L, 0L, 0L, 0L, 0L),
               Unit_Cost = c(0.036, 0.2768, 0.0255, 0.0645, 0.6502, 0.16),
              Total_Cost = c(2.448, 1.1072, 0.051, 2.193, 1.3004, 5.44),
       Can_Auto_Assemble = c("No", "No", "No", "No", "No", "No"),
    Can_Auto_Disassemble = c("No", "No", "No", "No", "No", "No"),
             IsObsoleted = c("No", "No", "No", "No", "No", "No"),
         Expense_Account = c(51120L, NA, 51120L, 51120L, NA, 51120L),
                Comments = c(NA, NA, NA, NA, NA, NA)
)

Updated_price_list <- data.frame(
        stringsAsFactors = FALSE,
  Assembled_Product_Code = c("Component1", "Component3", "Component10"),
              Total_Cost = c(13.5682, 12.5396, 300.4532)
)
# I have three data.frames. For each one, I need to get the sum of column Total_Cost and add to an existing csv.
# As an example, the .csv file is provided here as a data.frame.
# Many thanks in advance for your assistance.

# Placing the data frames into a list.
List <- list(Component14, Component2, Component9)

# Making a function to sum the Total_Cost column.
TotalCost <- function(x){
  sum(Total_Cost)
}

# Using lapply to make a list of sums for each data frame.
# However, I get the error message "Error in FUN(X[[i]], ...) : object 'Total_Cost' not found"
Cost_sum <- (lapply(List, TotalCost))

# Adding the sums to the existing .csv file
write.table(Cost_sum, file = "Updated_price_list.csv", sep = ",",
            append = TRUE, quote = FALSE,
            col.names = FALSE, row.names = FALSE)

DavoWW · May 5, 2022, 5:25am

Hi @jovenden,
Is this happening because you mix-up "TotalCost" and "Total_Cost"?
HTH

yifanliu · May 5, 2022, 5:32am

Did you notice that the param x didn't appear in the function body?

modify the function TotalCost into this and it may works:

TotalCost <- function(x){
  sum(x$Total_Cost)
}

> lapply(List, TotalCost)
[[1]]
[1] 12.5396

[[2]]
[1] 449.5946

[[3]]
[1] 1088.297

the output of lapply is a list which is far from the data structure inside Updated_price_list, you shall transform it later to make it able to be written.

jovenden · May 6, 2022, 12:48am

Many thanks for the help with the function, yifanlu.

Do you have some advice to transform the list into a form that can be added to my 'Updated_price-list' table?

Best, Jenny

andresrcs · May 6, 2022, 1:24am

This is a different approach you can take to go from the input csv files to the output csv file directly

library(tidyverse)

Components <- read_csv(list.files(path = "path\to\your\files",
                                  pattern = "\\.csv$",
                                  full.names = TRUE),
                       id = "Assembled_Product_Code")

Components %>% 
    mutate(Assembled_Product_Code = str_extract(Assembled_Product_Code,
                                                "Component\\d{1,2}")) %>% 
    group_by(Assembled_Product_Code) %>% 
    summarise(Total_Cost = sum(Total_Cost)) %>%
    write.table(file = "Updated_price_list.csv",
                sep = ",",
                append = TRUE,
                quote = FALSE,
                col.names = FALSE,
                row.names = FALSE)

jovenden · May 6, 2022, 1:55am

Very kind of you to give me an alternative solution. Many thanks. Best, Jenny

yifanliu · May 6, 2022, 5:04am

andresrcs's solution is a highly efficient one based on tidy-R empowerment. Here I’m going to provide a solution to continue your previous base-R grammar:

Cost_sum <- data.frame(
    Assembled_Product_Code = c('Component14', 'Component2', 'Component9'),
    Total_Cost = do.call(rbind,lapply(List, TotalCost))
)

Updated_price_list <- rbind(
  Updated_price_list,
  Cost_sum
)
Updated_price_list
  Assembled_Product_Code Total_Cost
1             Component1    13.5682
2             Component3    12.5396
3            Component10   300.4532
4            Component14    12.5396
5             Component2   449.5946
6             Component9  1088.2972

By the way, I was a little confused by names of the three data.frames(Component2, Component9 and Component14), are they equal to those elements in the Component_Product_Code columns?

jovenden · May 6, 2022, 5:30am

Hi Yifanliu - I understand your confusion over the variable names in the columns and the names of the data.frames. You have interpreted the data correctly. Unfortunately, the problem arises from the type of data that I am dealing with. Thanks for the new code. Its very useful. Best wishes, Jenny

system · May 27, 2022, 5:31am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.