Function with Loop

I have the following Code:

Total.Grains <- psd.all.data %>% 
    filter(Commodity_Code %in% list_all[[3]][[1]]) %>% 
    mutate(Commodity_Code = "Total Grains") %>% 
    group_by(Commodity_Code,Country_Code,Country_Name,Market_Year,Calendar_Year,Month,Attribute_ID,Attribute_Description,Unit_ID,Unit_Description) %>%
    summarise(Value = sum(Value)) %>% 
    bind_rows(psd.all.data,.)
    #Here we will change the NA values to Total.Grains. 
    Total.Grains$Commodity_Description[is.na(Total.Grains$Commodity_Description)] <- "Total Grains"

The above code creates an aggregate for Total Grains based on specific commodity codes given in the list_all

I can run the second code to create another aggregate for Total Oilseeds

Total.Oilseeds <- psd.all.data %>% 
    filter(Commodity_Code %in% list_all[[4]][[1]]) %>% 
    mutate(Commodity_Code = "Total Oilseeds") %>% 
    group_by(Commodity_Code,Country_Code,Country_Name,Market_Year,Calendar_Year,Month,Attribute_ID,Attribute_Description,Unit_ID,Unit_Description) %>%
    summarise(Value = sum(Value)) %>% 
    bind_rows(psd.all.data,.)
    #Here we will change the NA values to Total.Oilseeds. 
    Total.Oilseeds$Commodity_Description[is.na(Total.Oilseeds$Commodity_Description)] <- "Total Oilseeds"

I tried creating the following function so that the aggregates can be done with various commodities without having to copy paste it. I have the following function & forloop.

Run.Aggregation<- function(Aggregate, Table)
{for (i in seq(3, 10, by=1)) {
 Table <- psd.all.data %>% 
    filter(Commodity_Code %in% list_all[[i]][[1]]) %>% 
    mutate(Commodity_Code = Aggregate) %>% 
    group_by(Commodity_Code,Country_Code,Country_Name,Market_Year,Calendar_Year,Month,Attribute_ID,Attribute_Description,Unit_ID,Unit_Description) %>%
    summarise(Value = sum(Value)) %>% 
    bind_rows(psd.all.data,.)
    Table$Commodity_Description[is.na(Table$Commodity_Description)] <- Aggregate
}}

Run.Aggregation(Aggregate = "Total Grains", Table = Total.Grains)
Run.Aggregation(Aggregate = "Total Oilseeds" Table = Total.Oilseeds)

But I'm getting the following error. I feel like i'm either writing the function and the for loop incorrectly or trying to resolve it incorrectly.

Error: $ operator is invalid for atomic vectors

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? without a subset of the data as example, we can't easily reproduce and show examples.

Please have a look at this guide, to see how to create one:

I don't think this requires a data set. I'm merely trying to parameterize a couple of arguments.

I'm guessing you need to use some of the tidy evaluation operators in your function. You can read about programming with dplyr here and some additional explanation/background on tidy evaluation with rlang in this article.

Providing the requested example dataset will help folks who want to help you demonstrate with code.

Have in mind that there are other approaches that could be better than the one you are using right now, but to give you a concrete solution we would need sample data, and also is a polite thing to do when asking other people to take a look into your problems, because it makes things easier for people trying to help.

1 Like

Hi,

While I do agree with @cderv and @andresrcs that a reprex (with data) is the best way of gettng help and that refusing to do so is not polite when asking other people to use their time on your problem I will make a few suggestions.

In your Run.Aggregation function you run the loop 8 times, each time constructing a new Table object. This will get overwritten for each iteration, so only the last one will survive after the loop. Further, you don't return a value from the function. I don't understand your logic here (and data would definitely have been helpful), the two examples above suggest that "Total Grains" correspond to index 3 of the first dimension of list_all while "Total Oilseeds" correspond to index 4. You call your function once for "Total Grains" and once for "Total Oilseeds", why would you need to loop? Further, it looks as you want to pass the name of the dataframe receiving the aggregated data in the Table parameter, but R doesn't work this way - a copy of eg Total.Grains will be passed the function (a bit simplified) and then this copy will be overwritten in each iteration of the loop. You cannot pass parameters by reference in R (AFAIK).

A suggestion:

Run.Aggregation<- function(Aggregate, Tableidx)
{
  Table <- psd.all.data %>% 
    filter(Commodity_Code %in% list_all[[Tableidx]][[1]]) %>% 
    mutate(Commodity_Code = Aggregate) %>% 
    group_by(Commodity_Code,Country_Code,Country_Name,Market_Year,Calendar_Year,Month,Attribute_ID,Attribute_Description,Unit_ID,Unit_Description) %>%
    summarise(Value = sum(Value)) %>% 
    bind_rows(psd.all.data,.)
  Table$Commodity_Description[is.na(Table$Commodity_Description)] <- Aggregate
  Table
}

Total.Grains = Run.Aggregation(Aggregate="Total Grains", Tableidx = 3)
Total.Oilseeds = Run.Aggregation(Aggregate = "Total Oilseeds", Tableidx = 4)

: Added Table as return value and LHS of the data wrangling statement. My first go did exactly what I warned you about: It didn't return a value :slight_smile:

Cheers
Steen

1 Like

@andresrcs
@cderv
@andresrcs...I did not mean to be impolite, and that was certainly not my intention. I genuienly didn't think that a table with data might be helpful when I initially asked the question and am happy to include it, but I just didn't have time until now (12:02 a.m. :slightly_smiling_face:

~Commodity_Code,   ~Commodity_Description, ~Country_Code, ~Country_Name, ~Market_Year, ~Calendar_Year, ~Month, ~Attribute_ID, ~Attribute_Description, ~Unit_ID, ~Unit_Description, ~Value,
      "0577400", "Almonds, Shelled Basis",          "AF", "Afghanistan",         2010,           2018,   "10",         "020",     "Beginning Stocks",       21,            "(MT)",      0,
      "0577400", "Almonds, Shelled Basis",          "AF", "Afghanistan",         2010,           2018,   "10",         "125", "Domestic Consumption",       21,            "(MT)",      0,
      "0577400", "Almonds, Shelled Basis",          "AF", "Afghanistan",         2010,           2018,   "10",         "176",        "Ending Stocks",       21,            "(MT)",      0,
      "0577400", "Almonds, Shelled Basis",          "AF", "Afghanistan",         2010,           2018,   "10",         "088",              "Exports",       21,            "(MT)",      0,
      "0577400", "Almonds, Shelled Basis",          "AF", "Afghanistan",         2010,           2018,   "10",         "057",              "Imports",       21,            "(MT)",      0,
      "0577400", "Almonds, Shelled Basis",          "AF", "Afghanistan",         2010,           2018,   "10",         "028",           "Production",       21,            "(MT)",      0
)

And here's what an aggregation for the Total.Grains will look like (Total.Grains aggregation is done by aggregation various Grain commodities (e.g. Sorghum, Corn, Barley) which and similar aggregation is done for other "Totals".

~Commodity_Code, ~Commodity_Description, ~Country_Code, ~Country_Name, ~Market_Year, ~Calendar_Year, ~Month, ~Attribute_ID,  ~Attribute_Description, ~Unit_ID, ~Unit_Description, ~Value,
 "Total Grains",         "Total Grains",          "ZA",      "Zambia",         2019,           2019,   "07",         "125",  "Domestic Consumption",        8,       "(1000 MT)",   2482,
 "Total Grains",         "Total Grains",          "ZA",      "Zambia",         2019,           2019,   "07",         "130", "Feed Dom. Consumption",        8,       "(1000 MT)",    453,
 "Total Grains",         "Total Grains",          "ZA",      "Zambia",         2019,           2019,   "07",         "176",         "Ending Stocks",        8,       "(1000 MT)",    290,
 "Total Grains",         "Total Grains",          "ZA",      "Zambia",         2019,           2019,   "07",         "178",    "Total Distribution",        8,       "(1000 MT)",   2922,
 "Total Grains",         "Total Grains",          "ZA",      "Zambia",         2019,           2019,   "07",         "184",                 "Yield",       26,         "(MT/HA)",   4.12,
 "Total Grains",         "Total Grains",          "ZA",      "Zambia",         2019,           2019,   "07",         "192",       "FSI Consumption",        8,       "(1000 MT)",   2029
)

If this doesn't suffice, then I can certainly add more information or create another dataset. Thanks for the help in advance...

My understanding was (and again, as a newby R programmer, it seems I was wrong) that I would be creating a new Table for different aggregate and each time I assign a different loop to correspond to different commodity codes in the list_all list. So for the 3 item on the list which is the Total Grains, I would get a table called Total.Grains, when it loops over the 4th one, I get another, and so on until the last loop.

The way you have written your Run.Aggregation function the Table object will be local to this function and not exist outside the function. I think (sorry if I'm wrong) that when you call the function with the line

Run.Aggregation(Aggregate = "Total Grains", Table = Total.Grains)

that you expect R to substitute the Table object in the Run.Aggregation function with the Total.Grains object, and that when the function completes that the Total.Grains object will now contain whatever was assigned to Table in the Run.Aggregation function. This would make the Table parameter a reference parameter, and R does not support that. So whatever changes you make to Table inside Run.Aggregation have no effect on objects outside this function.

Regardless, your loop will still execute 8 iterations, and only the result of the last iteration will be stored in Table. So even if you could pass parameters by reference it would still not work.

1 Like

The sample data you are providing is not complete enough to reproduce your code (since only has one commodity code), you are not providing a sample of the content of list_all and you are not providing the data on a copy/paste friendly format. Could you please turn this into a proper self-contained REPRoducible EXample (reprex)?

1 Like

I tried to follow the datapasta example and pasted the code here. I'm not sure what you mean by friendly format (Can you specify what that would be, I think the one I have is an HTML)? Also, I will try to provide a sample content of list_all once I get an idea of what you are referring to exactly.

Thanks.

P.S. I had used the same format in another question of mine and I wasn't told it wasn't correct.

I mean a format we can easily copy into our own R session and use without any additional processing required, like this for example.

data.frame(
   Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4),
    Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9),
   Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7),
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.4),
        Species = as.factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                              "setosa"))
)

If I copy your sample data the way it is right now I just get this error

> )
Error: unexpected ')' in ")"

Because you missed this code at the beginning tibble::tribble(

Here's the code to create the Total.Grains and Total.Oilseeds list

list_a <- list("Commodity Code" = c("0440000", "0451000", "0452000", "0459100", "0459200", "0459900", "0410000", "0422110"), "Commodity Name" = c("Corn", "Rye", "Oats", "Millet", "Sorghum", "Mixed Grains", "Wheat", "Rice"))
             
             
list_b <- list("Commodity Code" = c("2231000", "2223000", "2232000", "2221000", "2226000", "2222000", "2224000"), "Commodity Name" = c("Oilseed, Copra", "Oilseed, Cottonseed", "Oilseed, Palm Kernel", "Oilseed, Peanut", "Oilseed, Rapeseed", "Oilseed, Soybean", "Oilseed, Sunflowerseed"))

names(list_all_sample) <- c("Total_Grains", "Total_Oilseeds")  

# And here is a sample data:

Sample_Data <- tibble::tribble(
                    ~Commodity_Code, ~Commodity_Description, ~Country_Code, ~Country_Name, ~Market_Year, ~Calendar_Year, ~Month, ~Attribute_ID, ~Attribute_Description, ~Unit_ID, ~Unit_Description, ~Value,
                          "0440000",                 "Corn",          "BL",     "Bolivia",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",    293,
                          "0440000",                 "Corn",          "BL",     "Bolivia",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",     20,
                          "0440000",                 "Corn",          "BR",      "Brazil",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",  12810,
                          "0440000",                 "Corn",          "BR",      "Brazil",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",   1100,
                          "2223000",  "Oilseed, Cottonseed",          "BR",      "Brazil",         1980,           2006,   "06",         "004",       "Area Harvested",        4,       "(1000 HA)",   2015,
                          "2223000",  "Oilseed, Cottonseed",          "BR",      "Brazil",         1980,           2006,   "06",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
                          "2221000",      "Oilseed, Peanut",          "BR",      "Brazil",         1980,           2006,   "06",         "004",       "Area Harvested",        4,       "(1000 HA)",    235,
                          "2221000",      "Oilseed, Peanut",          "BR",      "Brazil",         1980,           2006,   "06",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
                          "0459200",              "Sorghum",          "BL",     "Bolivia",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",      6,
                          "0459200",              "Sorghum",          "BL",     "Bolivia",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
                          "0459200",              "Sorghum",          "BR",      "Brazil",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",     92,
                          "0459200",              "Sorghum",          "BR",      "Brazil",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
                     "Total Grains",         "Total Grains",          "BL",     "Bolivia",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",    299,
                     "Total Grains",         "Total Grains",          "BR",      "Brazil",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",     20,
                     "Total Grains",         "Total Grains",          "BL",     "Bolivia",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",  12902,
                     "Total Grains",         "Total Grains",          "BR",      "Brazil",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",   1100,
                   "Total Oilseeds",       "Total Oilseeds",          "BR",      "Brazil",         1980,           2006,   "06",         "020",     "Beginning Stocks",        8,       "(1000 MT)",   2250,
                   "Total Oilseeds",       "Total Oilseeds",          "BR",      "Brazil",         1980,           2006,   "06",         "004",       "Area Harvested",        4,       "(1000 HA)",      0
                   )

Does this suffice?

You are still not telling us what list_all_sample is, but based on the information you are providing, I think that performing aggregation over your dataset would be better than using loops, see this example:

list_a <- list("Commodity Code" = c("0440000", "0451000", "0452000", "0459100", "0459200", "0459900", "0410000", "0422110"), "Commodity Name" = c("Corn", "Rye", "Oats", "Millet", "Sorghum", "Mixed Grains", "Wheat", "Rice"))
list_b <- list("Commodity Code" = c("2231000", "2223000", "2232000", "2221000", "2226000", "2222000", "2224000"), "Commodity Name" = c("Oilseed, Copra", "Oilseed, Cottonseed", "Oilseed, Palm Kernel", "Oilseed, Peanut", "Oilseed, Rapeseed", "Oilseed, Soybean", "Oilseed, Sunflowerseed"))

Sample_Data <- tibble::tribble(
    ~Commodity_Code, ~Commodity_Description, ~Country_Code, ~Country_Name, ~Market_Year, ~Calendar_Year, ~Month, ~Attribute_ID, ~Attribute_Description, ~Unit_ID, ~Unit_Description, ~Value,
    "0440000",                 "Corn",          "BL",     "Bolivia",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",    293,
    "0440000",                 "Corn",          "BL",     "Bolivia",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",     20,
    "0440000",                 "Corn",          "BR",      "Brazil",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",  12810,
    "0440000",                 "Corn",          "BR",      "Brazil",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",   1100,
    "2223000",  "Oilseed, Cottonseed",          "BR",      "Brazil",         1980,           2006,   "06",         "004",       "Area Harvested",        4,       "(1000 HA)",   2015,
    "2223000",  "Oilseed, Cottonseed",          "BR",      "Brazil",         1980,           2006,   "06",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
    "2221000",      "Oilseed, Peanut",          "BR",      "Brazil",         1980,           2006,   "06",         "004",       "Area Harvested",        4,       "(1000 HA)",    235,
    "2221000",      "Oilseed, Peanut",          "BR",      "Brazil",         1980,           2006,   "06",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
    "0459200",              "Sorghum",          "BL",     "Bolivia",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",      6,
    "0459200",              "Sorghum",          "BL",     "Bolivia",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
    "0459200",              "Sorghum",          "BR",      "Brazil",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",     92,
    "0459200",              "Sorghum",          "BR",      "Brazil",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",      0,
    "Total Grains",         "Total Grains",          "BL",     "Bolivia",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",    299,
    "Total Grains",         "Total Grains",          "BR",      "Brazil",         1980,           2006,   "07",         "004",       "Area Harvested",        4,       "(1000 HA)",     20,
    "Total Grains",         "Total Grains",          "BL",     "Bolivia",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",  12902,
    "Total Grains",         "Total Grains",          "BR",      "Brazil",         1980,           2006,   "07",         "020",     "Beginning Stocks",        8,       "(1000 MT)",   1100,
    "Total Oilseeds",       "Total Oilseeds",          "BR",      "Brazil",         1980,           2006,   "06",         "020",     "Beginning Stocks",        8,       "(1000 MT)",   2250,
    "Total Oilseeds",       "Total Oilseeds",          "BR",      "Brazil",         1980,           2006,   "06",         "004",       "Area Harvested",        4,       "(1000 HA)",      0
)

library(tidyverse)

Sample_Data %>%
    mutate(Commodity_Code = case_when(Commodity_Code %in% list_a[[1]] ~ "Total Grains",
                                      Commodity_Code %in% list_b[[1]] ~ "Total Oilseeds",
                                      TRUE ~ Commodity_Code)) %>% 
    group_by(Commodity_Code,Country_Code,Country_Name,Market_Year,Calendar_Year,Month,Attribute_ID,Attribute_Description,Unit_ID,Unit_Description) %>%
    summarise(Value = sum(Value))
#> # A tibble: 6 x 11
#> # Groups:   Commodity_Code, Country_Code, Country_Name, Market_Year,
#> #   Calendar_Year, Month, Attribute_ID, Attribute_Description, Unit_ID [6]
#>   Commodity_Code Country_Code Country_Name Market_Year Calendar_Year Month
#>   <chr>          <chr>        <chr>              <dbl>         <dbl> <chr>
#> 1 Total Grains   BL           Bolivia             1980          2006 07   
#> 2 Total Grains   BL           Bolivia             1980          2006 07   
#> 3 Total Grains   BR           Brazil              1980          2006 07   
#> 4 Total Grains   BR           Brazil              1980          2006 07   
#> 5 Total Oilseeds BR           Brazil              1980          2006 06   
#> 6 Total Oilseeds BR           Brazil              1980          2006 06   
#> # … with 5 more variables: Attribute_ID <chr>,
#> #   Attribute_Description <chr>, Unit_ID <dbl>, Unit_Description <chr>,
#> #   Value <dbl>

Created on 2019-08-09 by the reprex package (v0.3.0.9000)

@andresrcs I apologize, the list_all_sample is the same as list_all. I just created that as a sample for the purposes of this issue.

@andresrcs: That actually worked. Thanks a bunch. Now I just have to figure out a way to fill in the missing ("NA") values for certain columns.

Thanks.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.