How do I use lapply to sum one column per data frame and add results to an existing .csv?

library(tidyverse)
library(dplyr)

# Input test data

Component2 <- data.frame(
        stringsAsFactors = FALSE,
  Component_Product_Code = c("Component6",
                             "Component7","Component8","Component9","Component22"),
                Quantity = c(1L, 2L, 4L, 28L, 90L),
        Wastage_Quantity = c(0L, 0L, 0L, 0L, 0L),
               Unit_Cost = c(127.2, 112.02, 23.33, 0.0342, 0.0453),
              Total_Cost = c(127.2, 224.04, 93.32, 0.9576, 4.077),
       Can_Auto_Assemble = c("Yes", "Yes", "Yes", "Yes", "Yes"),
    Can_Auto_Disassemble = c("No", "No", "No", "No", "No"),
             IsObsoleted = c("No", "No", "No", "No", "No"),
         Expense_Account = c(NA, NA, NA, 51120L, 51120L),
                Comments = c(NA, NA, NA, NA, NA)
)

Component9 <- data.frame(
        stringsAsFactors = FALSE,
  Component_Product_Code = c("Component11",
                             "Component12","Component13","Component14",
                             "Component15"),
                Quantity = c(5L, 5L, 2L, 1L, 1L),
        Wastage_Quantity = c(0L, 0L, 0L, 0L, 0L),
               Unit_Cost = c(107, 53, 49.3629, 55.3854, 134.186),
              Total_Cost = c(535, 265, 98.7258, 55.3854, 134.186),
       Can_Auto_Assemble = c("Yes", "Yes", "Yes", "Yes", "Yes"),
    Can_Auto_Disassemble = c("No", "No", "No", "No", "No"),
             IsObsoleted = c("No", "No", "No", "No", "No"),
         Expense_Account = c(51250L, 51244L, NA, NA, NA),
                Comments = c(NA, NA, NA, NA, NA)
)

Component14 <- data.frame(
        stringsAsFactors = FALSE,
  Component_Product_Code = c("Component16",
                             "Component17","Component18","Component19",
                             "Component20","Component21"),
                Quantity = c(68L, 4L, 2L, 34L, 2L, 34L),
        Wastage_Quantity = c(0L, 0L, 0L, 0L, 0L, 0L),
               Unit_Cost = c(0.036, 0.2768, 0.0255, 0.0645, 0.6502, 0.16),
              Total_Cost = c(2.448, 1.1072, 0.051, 2.193, 1.3004, 5.44),
       Can_Auto_Assemble = c("No", "No", "No", "No", "No", "No"),
    Can_Auto_Disassemble = c("No", "No", "No", "No", "No", "No"),
             IsObsoleted = c("No", "No", "No", "No", "No", "No"),
         Expense_Account = c(51120L, NA, 51120L, 51120L, NA, 51120L),
                Comments = c(NA, NA, NA, NA, NA, NA)
)

Updated_price_list <- data.frame(
        stringsAsFactors = FALSE,
  Assembled_Product_Code = c("Component1", "Component3", "Component10"),
              Total_Cost = c(13.5682, 12.5396, 300.4532)
)
# I have three data.frames. For each one, I need to get the sum of column Total_Cost and add to an existing csv.
# As an example, the .csv file is provided here as a data.frame.
# Many thanks in advance for your assistance.

# Placing the data frames into a list.
List <- list(Component14, Component2, Component9)

# Making a function to sum the Total_Cost column.
TotalCost <- function(x){
  sum(Total_Cost)
}

# Using lapply to make a list of sums for each data frame.
# However, I get the error message "Error in FUN(X[[i]], ...) : object 'Total_Cost' not found"
Cost_sum <- (lapply(List, TotalCost))

# Adding the sums to the existing .csv file
write.table(Cost_sum, file = "Updated_price_list.csv", sep = ",",
            append = TRUE, quote = FALSE,
            col.names = FALSE, row.names = FALSE)

Hi @jovenden,
Is this happening because you mix-up "TotalCost" and "Total_Cost"?
HTH

1 Like

Did you notice that the param x didn't appear in the function body?

modify the function TotalCost into this and it may works:

TotalCost <- function(x){
  sum(x$Total_Cost)
}

> lapply(List, TotalCost)
[[1]]
[1] 12.5396

[[2]]
[1] 449.5946

[[3]]
[1] 1088.297

the output of lapply is a list which is far from the data structure inside Updated_price_list, you shall transform it later to make it able to be written.

1 Like

Many thanks for the help with the function, yifanlu.

Do you have some advice to transform the list into a form that can be added to my 'Updated_price-list' table?

Best, Jenny

This is a different approach you can take to go from the input csv files to the output csv file directly

library(tidyverse)

Components <- read_csv(list.files(path = "path\to\your\files",
                                  pattern = "\\.csv$",
                                  full.names = TRUE),
                       id = "Assembled_Product_Code")

Components %>% 
    mutate(Assembled_Product_Code = str_extract(Assembled_Product_Code,
                                                "Component\\d{1,2}")) %>% 
    group_by(Assembled_Product_Code) %>% 
    summarise(Total_Cost = sum(Total_Cost)) %>%
    write.table(file = "Updated_price_list.csv",
                sep = ",",
                append = TRUE,
                quote = FALSE,
                col.names = FALSE,
                row.names = FALSE)
1 Like

Very kind of you to give me an alternative solution. Many thanks. Best, Jenny

andresrcs's solution is a highly efficient one based on tidy-R empowerment. Here Iā€™m going to provide a solution to continue your previous base-R grammar:

Cost_sum <- data.frame(
    Assembled_Product_Code = c('Component14', 'Component2', 'Component9'),
    Total_Cost = do.call(rbind,lapply(List, TotalCost))
)

Updated_price_list <- rbind(
  Updated_price_list,
  Cost_sum
)
Updated_price_list
  Assembled_Product_Code Total_Cost
1             Component1    13.5682
2             Component3    12.5396
3            Component10   300.4532
4            Component14    12.5396
5             Component2   449.5946
6             Component9  1088.2972

By the way, I was a little confused by names of the three data.frames(Component2, Component9 and Component14), are they equal to those elements in the Component_Product_Code columns?

1 Like

Hi Yifanliu - I understand your confusion over the variable names in the columns and the names of the data.frames. You have interpreted the data correctly. Unfortunately, the problem arises from the type of data that I am dealing with. Thanks for the new code. Its very useful. Best wishes, Jenny

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.