Sumifs and Group Function

Hi everyone! Im new to RStudio. I want to apply my formula from excel to use in Rstudio to give me a sum value of group data as seen in the first image. The values will be taken from a table containing Province of Origin Code column to add the specific Commodity (PSCC99) grouped with its corresponding quantity as shown image 1 that is already group column B. The sample formula is =SUMIFS(Quantity2000,ProvinceImport2000,39130,Product2000,ProductCategory)+SUMIFS(Quantity2000,ProvinceImport2000,39150,Product2000,ProductCategory)+SUMIFS(Quantity2000,ProvinceImport2000,39170,Product2000,ProductCategory)+SUMIFS(Quantity2000,ProvinceImport2000,39500,Product2000,ProductCategory)+SUMIFS(Quantity2000,ProvinceImport2000,39700,Product2000,ProductCategory)+SUMIFS(Quantity2000,ProvinceImport2000,39900,Product2000,ProductCategory).

As I am new, I cant attach another image for reference.

I hope anyone helps me transform it into a Rscript? Thank you. :slight_smile:

I have not used Excel much for many years and I'm struggling to match the parts of your formula with the image. If I remember correctly, this formula

SUMIFS(Quantity2000,ProvinceImport2000,39130,Product2000,ProductCategory)

means: SUM the values of Quantity2000 where ProvinceImport2000 is equal to 39130 and Product2000 is equal to ProductCategory.
I do not see Quantity2000, ProvinceImport2000, Product2000, or ProductCategory in the image. Where do those come from? Can you make a small data frame that shows how the data are laid out and post the code for making it, not an image of it. Something like

DF <- data.frame(Quantity2000 = c(4,2,6), ProvinceImport2000 = c(39130, 40000, 39130), Product2000 = c("A", "C", "F"))

Obviously, I made up that data with no knowledge of what your data actually look like.

Hello,
I'm sure you shared this image with the best intentions, but perhaps you didnt realise what it implies.
If someone wished to use example data to test code against, they would type it out from your screenshot...

This is very unlikely to happen, and so it reduces the likelihood you will receive the help you desire.
Therefore please see this guide on how to reprex data. Key to this is use of either datapasta, or dput() to share your data as code

Hi thank you for your response. Yes that is how I should be summing the Quantity2000. Please see the attached image. I cant upload 2 images last night. The output of I am hoping is a table which accounts the quantity (Column G) with corresponding product code (column E) which are only passing through the ports (column C).

.

Also, I usually see simple headers in youtube videos where there is only one row for header, is it a requirement for the console?

Thank you very much.

Thank you for the reminder. I will work out on the code.

Quantity2000<-c(50,70,2,4,6,9,35," "," ",11)
ProvinceCode <-c(1,23,45,23,45,60,60,23,50)
ProductCode<-c(1,2,2,3,4,6,7,3,03,02)
#Product codes containing 2, and 02, are the same code and be name as row 02
#the output should be another table containing

#Group1 Column containing rows ProductCode (2,3)
#Group2 Column containing rows ProductCode (4,5,6,7) ---This should be in row after Group1
#Then sum of Group1 and Group2

Thanks for explaining more about what you want to do. I am still not sure how you want to handle the combination of ProductCode and ProvinceCode but I think we can clarify that using your data. To do calculations like SUMIFS in a spreadsheet, I would use some functions from the dplyr package. Using the data you provided, I did some example calculations. I expect that none of the examples is exactly what you want but I hope they will help you understand a little about how this is done in R and allow you to help me understand better what you need to do.
Notice that I changed you Quantity2000 values of " " to NA. R is much stricter than Excel about mixing text and numbers. Also, data frames do not use multi-row column names. Data rows and column names are entirely different things, unlike in a spreadsheet.

library(dplyr())

DF <- data.frame(Quantity2000 = c(50,70,2,4,6,9,35,NA,NA,11),
                 ProvinceCode = c(1,23,45,23,45,60,60,23,50, 45),
                 ProductCode = c(1,2,2,3,4,6,7,3,03,02))
DF
#>    Quantity2000 ProvinceCode ProductCode
#> 1            50            1           1
#> 2            70           23           2
#> 3             2           45           2
#> 4             4           23           3
#> 5             6           45           4
#> 6             9           60           6
#> 7            35           60           7
#> 8            NA           23           3
#> 9            NA           50           3
#> 10           11           45           2

#Sum for each ProductCode
SumProd <- DF %>% group_by(ProductCode) %>% summarize(Total = sum(Quantity2000, na.rm = TRUE))
SumProd
#> # A tibble: 6 x 2
#>   ProductCode Total
#>         <dbl> <dbl>
#> 1           1    50
#> 2           2    83
#> 3           3     4
#> 4           4     6
#> 5           6     9
#> 6           7    35

#Sum for each combination of ProductCode and Province
SumProdProv <- DF %>% group_by(ProductCode, ProvinceCode) %>% 
  summarize(Total = sum(Quantity2000, na.rm = TRUE))
SumProdProv
#> # A tibble: 8 x 3
#> # Groups:   ProductCode [6]
#>   ProductCode ProvinceCode Total
#>         <dbl>        <dbl> <dbl>
#> 1           1            1    50
#> 2           2           23    70
#> 3           2           45    13
#> 4           3           23     4
#> 5           3           50     0
#> 6           4           45     6
#> 7           6           60     9
#> 8           7           60    35

#Group Product codes as (1,2,3) and (4,5,6,7)

DF <- DF %>% mutate(Group = case_when(
  ProductCode %in% c(1,2,3) ~ "A",
  ProductCode %in% c(4,5,6,7) ~ "B",
  TRUE ~ "Unknown"))
DF
#>    Quantity2000 ProvinceCode ProductCode Group
#> 1            50            1           1     A
#> 2            70           23           2     A
#> 3             2           45           2     A
#> 4             4           23           3     A
#> 5             6           45           4     B
#> 6             9           60           6     B
#> 7            35           60           7     B
#> 8            NA           23           3     A
#> 9            NA           50           3     A
#> 10           11           45           2     A

SumGroupProv <- DF %>% group_by(Group, ProvinceCode) %>% 
  summarize(Total = sum(Quantity2000, na.rm = TRUE))
SumGroupProv
#> # A tibble: 6 x 3
#> # Groups:   Group [2]
#>   Group ProvinceCode Total
#>   <chr>        <dbl> <dbl>
#> 1 A                1    50
#> 2 A               23    74
#> 3 A               45    13
#> 4 A               50     0
#> 5 B               45     6
#> 6 B               60    44

Created on 2021-01-07 by the reprex package (v0.2.1)

Wow. This is great. Thank you very much FJCC. If it is not to much to ask, my groupings are coming from another table and arrange according to subgroup as shown in the image. Column C is where the values are returning base on the product code in column B, how will I merge the two tables so I can generate a final table similar to the image.

Im really greatful

Am I correct in thinking that you have many product codes and you want to sum Quantity2000 for each code but each code is part of a larger group, such as Rice and Corn, and you want the totals also for the larger groups?
If you already have a data frame that has the product codes and the quantities, you can easily add the larger groups to that by making a data frame that has each product code and the larger group it belongs to. You can probably make this data frame with copy and paste in Excel and read it into R. I have to stop for the day because it is late here but below is an example with very simple data.
What is harder is arranging the Group totals to appear just below the sums for the individual product codes. Do you really need that arrangement?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
DF <- data.frame(ProdCode = c(2,4,1,3,2,1,4,3,2,3), 
                 Quantity = c(66,44,55,77,88,1,33,99,23,73))
DF
#>    ProdCode Quantity
#> 1         2       66
#> 2         4       44
#> 3         1       55
#> 4         3       77
#> 5         2       88
#> 6         1        1
#> 7         4       33
#> 8         3       99
#> 9         2       23
#> 10        3       73


DF2 <- data.frame(ProdCode = c(1,2,3,4), Group = c("Rice", "Corn", "Rice", "Friut"))
DF2
#>   ProdCode Group
#> 1        1  Rice
#> 2        2  Corn
#> 3        3  Rice
#> 4        4 Friut

DF <- inner_join(DF, DF2, by = "ProdCode")
DF
#>    ProdCode Quantity Group
#> 1         2       66  Corn
#> 2         4       44 Friut
#> 3         1       55  Rice
#> 4         3       77  Rice
#> 5         2       88  Corn
#> 6         1        1  Rice
#> 7         4       33 Friut
#> 8         3       99  Rice
#> 9         2       23  Corn
#> 10        3       73  Rice

SumProd <- DF %>% group_by(ProdCode) %>% summarize(Total = sum(Quantity))
#> `summarise()` ungrouping output (override with `.groups` argument)
SumProd
#> # A tibble: 4 x 2
#>   ProdCode Total
#>      <dbl> <dbl>
#> 1        1    56
#> 2        2   177
#> 3        3   249
#> 4        4    77

SumGroup <- DF %>% group_by(Group) %>% summarize(GrpTotal = sum(Quantity))
#> `summarise()` ungrouping output (override with `.groups` argument)
SumGroup
#> # A tibble: 3 x 2
#>   Group GrpTotal
#>   <chr>    <dbl>
#> 1 Corn       177
#> 2 Friut       77
#> 3 Rice       305

Created on 2021-01-07 by the reprex package (v0.3.0)

I have created a Table showing ImportCode containing these values with its corresponding Quantity Code

ImportQuantity=subset(Trade2000,ImportCode==39130|ImportCode==39150|ImportCode==39170|ImportCode==39500|ImportCode==39700|ImportCode==39900)
#example
ImportCode ExportCode ProductCode Quantity ValuePeso
13 39900 10905 69969 12331 132947
14 39900 10905 33411 12321 135780
15 39150 8910 52261 301000 2558500
16 39900 8910 33440 490342341 4124302512

I am combining these table with another Table Material Category by Product code.
Material Category Subcategory ProductCode

1 Biomass Rice 4231
2 Biomass Rice 4210
3 Biomass Corn 4410
4 Biomass Corn 4490
5 Biomass Corn 4721
6 Biomass Corn 4811

I got this error
Error: Can't join on x$ProductCode x y$ProductCode because of incompatible types.
i x$ProductCode is of type >.
i y$ProductCode is of type >.

#when change I cahnge the ProductCode to as.character

Matcategory<-as.character(Material_category$ProductCode)
View(Material_category)
91 12510 2555 78513 65847 3505500
Error: unexpected numeric constant in "91 12510"

How can I make them uniform (this is also the case where the first character of ProductCode is 0)

You got it right, that is how I should present the table. I already change the table accordingly.

Thank you for your help. I was able to start with your guidance. Keep safe.

Hi FJCC! I just want to thank you for the help. I already figure out the errors and finally got my final result. Thank you very much.

Material.Category Subcategory Quantity

1 Biomass Abaca and other fabric products 20203460
2 Biomass Cigarettes and Tobacco 404374
3 Biomass Corn 2477397
4 Biomass Cotton , Silk , and other Fiber Products 1794460
5 Biomass Crop Residue 1133560
6 Biomass Fisheries (Commercial) 165769
7 Biomass Fruit Products 172071
8 Biomass Fuel wood 10394340
9 Biomass Juice and Alcoholic Beverages 5884421
10 Biomass Leather 340032

Hi FJCC! I am hoping for your help again. I get the right total for Quantity if only using the sum(ImportCode) however if I use a more specific grouping such as ProductCode, the total sum is not the same. I don't know how I can send the data for your reference.

Hi FJCC
library (dplyr)
library(readxl)
Domstat_Coastwise_2018 <- read_excel("C:/Users/Computer/Desktop/Research-PSA-Openstat/CFP2000-2019x/Domstat_Coastwise_2018.xlsx")
View(Domstat_Coastwise_2018)
DomTrade=Domstat_Coastwise_2018[-c(1:6),c(1,3,5,7,8)]
Col_Name=data.frame("ImportCode","ExportCode","ProductCode","Quantity","Value")
View(Col_Name)
names(DomTrade)<-Col_Name
library(readxl)
Domestic <- read_excel("C:/Users/Computer/Desktop/Domestic.xlsx")
View(Domestic)
library(readxl)
Domestic <- read_excel("C:/Users/Computer/Desktop/Domestic.xlsx",
sheet = "DomCode")
Trade<- inner_join(DomTrade, Domestic, by = "ProductCode")
Trade<-Trade[,c(6:9,3,4,5,1,2)]
Trade$Quantity<-as.numeric(Trade$Quantity)
Trade$Value<-as.numeric(Trade$Value)
Import2018=Trade[Trade$ImportCode %in% c("39130", "39150","39170", "39500","39700","39900"),]
Export2018=Trade[Trade$ExportCode %in% c("39130", "39150","39170", "39500","39700","39900"),]
ImportQuantity2018<-Import2018%>%group_by(MaterialCategory,ProductStage,Subcategory)%>%summarise(Quantity=sum(Quantity,na.rm=TRUE), Value=sum(Value,na.rm=TRUE))
ExportQuantity2018<-Export2018%>%group_by(MaterialCategory,ProductStage,Subcategory)%>%summarise(Quantity=sum(Quantity,na.rm=TRUE), Value=sum(Value,na.rm=TRUE))
write.table(ImportQuantity2018,file="2018DomesticImport.csv", row.names = F, sep=",")
write.table(ExportQuantity2018,file="2018DomesticExport.csv", row.names = F, sep=",")

Try posting the output of

dput(head(ImportQuantity, 20))

When you post, please write a line containing only three back ticks, ```, just before and just after the copied output from dput()
```
Paste output here
```

structure(list(MaterialCategory = c("Biomass", "Biomass", "Biomass", 
"Biomass", "Biomass", "Biomass", "Biomass", "Biomass", "Biomass", 
"Fossil fuels", "Fossil fuels", "Fossil fuels", "Fossil fuels", 
"Fossil fuels", "Fossil fuels", "Metallic minerals", "Metallic minerals", 
"Metallic minerals", "Metallic minerals", "Metallic minerals"
), ProductStage = c("Finished products", "Finished products", 
"Finished products", "Raw materials", "Raw materials", "Raw materials", 
"Semi-manufactured products", "Semi-manufactured products", "Semi-manufactured products", 
"Raw materials", "Raw materials", "Raw materials", "Semi-manufactured products", 
"Semi-manufactured products", "Semi-manufactured products", "Finished products", 
"Finished products", "Finished products", "Finished products", 
"Finished products"), Subcategory = c("Agricultural", "Forestry", 
"Other biomass products", "Agricultural", "Fisheries", "Forestry", 
"Agricultural", "Forestry", "Other biomass products", "Bituminous crude materials", 
"Crude oil", "Natural gas", "Hydrocarbons", "Mineral oils", "Mineral waxes", 
"Electrical machines etc", "Machineries, tools, and apparatus", 
"Other non-precious metals and cermets finished goods", "Other product of base metals", 
"Vehicles automobiles"), Quantity = c(204427548, 265879797, 125853727, 
43748319, 30745, 13915185, 280444539, 3882353, 1020700, 5570930, 
328096, 820326, 9000, 21068094, 514837, 56432, 694416, 99972, 
246136, 175562321), Value = c(11671528491, 13306115837, 4884540047, 
14496461223, 2e+06, 672959180, 9805453636, 146841361, 42284927, 
301399696, 14304310, 40728200, 5e+05, 1024058906, 27747016, 9219961, 
48200970, 4359392, 14457496, 43335477558)), row.names = c(NA, 
-20L), groups = structure(list(MaterialCategory = c("Biomass", 
"Biomass", "Biomass", "Fossil fuels", "Fossil fuels", "Metallic minerals"
), ProductStage = c("Finished products", "Raw materials", "Semi-manufactured products", 
"Raw materials", "Semi-manufactured products", "Finished products"
), .rows = structure(list(1:3, 4:6, 7:9, 10:12, 13:15, 16:20), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 6L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))```

I named the data you posted Import2018 and ran the following code that shows that the sum of Quantity and Value are same same in the initial table and in the summarized results. What do you get if you run the same code on the full data set?

ImportQuantity2018 <- Import2018 %>% group_by(MaterialCategory,ProductStage,Subcategory) %>%
   summarise(Quantity  = sum(Quantity,na.rm=TRUE), Value = sum(Value,na.rm=TRUE))

 
sum(Import2018$Quantity) == sum(ImportQuantity2018$Quantity)
[1] TRUE
sum(Import2018$Value) == sum(ImportQuantity2018$Value)
[1] TRUE

I need to reorganize my row names as they are case sensitive.

this is from R Studio

ImportQuantity<-Import2018%>%group_by(MaterialCategory)%>%summarise(Quantity=sum(Quantity,na.rm=TRUE), Value=sum(Value,na.rm=TRUE))
summarise() ungrouping output (override with .groups argument)
View(ImportQuantity)
ImportQuantity

A tibble: 8 x 3

MaterialCategory Quantity Value

1 Biomass 939202913 55028184702
2 Fossil fuels 28311283 1408738128
3 Metallic minerals 176839986 43423783027
4 Metallic Minerals 951694460 62457237987
5 Non-metallic minerals 42437889 1375800475
6 Non-metallic Minerals 266313010 10587103260
7 Other products 1219978510 74605380109
8 Other Products 1610264 94521200

sum(ImportQuantity$Quantity)
[1] 3626388315
ExportQuantity<-Import2018%>%group_by(MaterialCategory)%>%summarise(Quantity=sum(Quantity,na.rm=TRUE), Value=sum(Value,na.rm=TRUE))
summarise() ungrouping output (override with .groups argument)
ExportQuantity

A tibble: 8 x 3

MaterialCategory Quantity Value

1 Biomass 939202913 55028184702
2 Fossil fuels 28311283 1408738128
3 Metallic minerals 176839986 43423783027
4 Metallic Minerals 951694460 62457237987
5 Non-metallic minerals 42437889 1375800475
6 Non-metallic Minerals 266313010 10587103260
7 Other products 1219978510 74605380109
8 Other Products 1610264 94521200

sum(ExportQuantity$Quantity)
[1] 3626388315

#This is from excel

library(readxl)
Domestic_Trade_data <- read_excel("C:/Users/Computer/Desktop/Domestic Trade data.xlsx",

  • sheet = "Sheet2")
    

New names:

  • `` -> ...1
  • Raw materials -> Raw materials...2
  • Semi-manufactured products -> Semi-manufactured products...3
  • Finished products -> Finished products...4
  • TOTAL -> TOTAL...5
  • ...

View(Domestic_Trade_data)
Domestic_Trade_data

A tibble: 6 x 11

...1 Raw materials.~ Semi-manufactu~ `Finished produ~ TOTAL...5 ...6

1 Biom~ 813224949 471461969 318238308 1.60e9 NA
2 Foss~ 50588933 2017113096 0 2.07e9 NA
3 Meta~ 20650 24807701 158918893 1.84e8 NA
4 Non-~ 5439618 14557829 18894569 3.89e7 NA
5 Othe~ 0 148620 395126023 3.95e8 NA
6 Waste 0 NA NA 0. NA

... with 5 more variables: ...7 , Raw materials...8 ,

Semi-manufactured products...9 , Finished products...10 ,

TOTAL...11

#total import quantity

sum(Domestic_Trade_data$TOTAL...5)
[1] 4288541158
#total export quantity
sum(Domestic_Trade_data$TOTAL...11)
[1] 4882677736

#I use the same set of product codes

Thanks for your help.