How to add row of total and percentage in list of data table in R?

Shadan · December 14, 2020, 4:49am

I just added a sample here. I just want to add the fourth row of totals to sum the columns y and z, not PROVINCE.



#> [[1]]
#>   PROVINCE         Y        Z
#> 1        2  89.98376 90.79929
#> 2        2  97.12802 90.68774
#> 3        4 100.62682 91.20288
#> 
#> [[2]]
#>   PROVINCE         Y         Z
#> 1        4 106.09127  94.19237
#> 2        6 111.30426 107.42695
#> 3        5  84.44623 102.04221

gueyenono · December 14, 2020, 7:54am

Hi @Shadan

You can use the adorn_totals() function, which is in the janitor package. Here, I provide two solutions: one with base R and one with the purrr package. Let me know if you have more questions:

# Load the janitor package
library(janitor)

# Create sample data
set.seed(1)
d <- lapply(1:2, function(x) data.frame(province = sample(6, size = 3), Y = rnorm(3)*100, Z = rnorm(3)* 100))

# Solution using base R
lapply(d, function(x) adorn_totals(dat = x, where = "row"))

[[1]]
 province          Y         Z
        6 -137.70596  -5.93134
        2  -41.49946 110.00254
        1  -39.42900  76.31757
    Total -218.63441 180.38877

[[2]]
 province          Y          Z
        6  -83.20433 -156.37821
        3 -116.65705  115.65370
        1 -106.55906   83.20471
    Total -306.42044   42.48021

# Solution using the tidyverse
library(purrr)
map(d, ~ adorn_totals(.x, where = "row"))

[[1]]
 province          Y         Z
        6 -137.70596  -5.93134
        2  -41.49946 110.00254
        1  -39.42900  76.31757
    Total -218.63441 180.38877

[[2]]
 province          Y          Z
        6  -83.20433 -156.37821
        3 -116.65705  115.65370
        1 -106.55906   83.20471
    Total -306.42044   42.48021

Shadan · December 14, 2020, 8:03am

@gueyenono Thank you for answering.

Shadan · December 14, 2020, 8:59am

@gueyenono If I have PROVINCE and DISTRICT like bellow. Which column DISTRICT should calculate.

#> [[1]]
#>   PROVINCE  DISTRICT        Y        Z
#> 1        2         2     89.98376 90.79929
#> 2        2        7      97.12802 90.68774
#> 3        4       10    100.62682 91.20288
#> 
#> [[2]]
#>   PROVINCE DISTRICT         Y         Z
#> 1        4         2     106.09127  94.19237
#> 2        6         3    111.30426 107.42695
#> 3        5         4    84.44623 102.04221

gueyenono · December 14, 2020, 9:15am

Actually, even though I answered the question with adorn_totals(), I would say that adding a row for total and for percentage in your data is not the best way to do things. I understand that you may want to do it to present your results, in which case I recommend the gt package for creating tables with summary rows. The following article will teach you how to do it: https://gt.rstudio.com/articles/creating-summary-lines.html However, you should not add total and percentage rows to your data.

Also, I have a question about the percentage value that you are expecting. Could you please tell me what percentage values you are expecting given your examples above?

Shadan · December 14, 2020, 9:43am

Thanks for sharing the link. The percentage must be like:

			
province	y	z	
2	0	8.670000076	
3	0	0	
6	0	19.92000008	
23	30.90999985	2346.489988	
Total	30.90999985	2375.079988	
percentage 	0.012847102	0.987152898

Shadan · December 14, 2020, 9:45am

First must calculate the Total row then Totalcolumny/totalof rowtotal.

gueyenono · December 14, 2020, 1:34pm

@Shadan

I would like to repeat again that this is not really a good practice; however, if you really need to do that, here is a code that I wrote for you:

# Load the janitor package
library(janitor)

# Create sample data
set.seed(1)
d <- lapply(1:2, function(x) data.frame(province = sample(6, size = 3), Y = runif(3)*100, Z = runif(3)* 100))
d
[[1]]
  province        Y        Z
1        1 90.82078 94.46753
2        4 20.16819 66.07978
3        3 89.83897 62.91140

[[2]]
  province        Y        Z
1        2 68.70228 49.76992
2        3 38.41037 71.76185
3        5 76.98414 99.19061

# Solution using base R
lapply(d, function(x){
  
  last_row <- nrow(x)
  total <- sapply(x[, c("Y", "Z")], sum)
  pct <- total / sum(total)
  
  x[last_row+1, ] <- c("Total", total)
  x[last_row+2, ] <- c("Percentage", pct)
  
  x

[[1]]
    province                 Y                 Z
1          1  90.8207789994776  94.4675268605351
2          4  20.1681931037456  66.0797792486846
3          3  89.8389684967697   62.911404389888
4      Total  200.827940599993  223.458710499108
5 Percentage 0.473330801428126 0.526669198571874

[[2]]
    province                 Y                 Z
1          2  68.7022846657783  49.7699242085218
2          3  38.4103718213737  71.7618508264422
3          5  76.9841419998556  99.1906094830483
4      Total  184.096798487008  220.722384518012
5 Percentage 0.454763030547208 0.545236969452792
 
})

Shadan · December 15, 2020, 2:16am

@gueyenono, Thank you for writing the code which I am looking for. Just want to know if I have a different number of columns in the list. I do not know you change the below code to apply for all columns in each data frame in the list.
Here we have two columns( Y, Z) in both data frames in the list. My list data frame has different numbers of columns. I just added a screenshot of my list

 total <- sapply(x[, c("Y", "Z")], sum)

gueyenono · December 15, 2020, 2:18am

@Shadan Your picture is not showing.

Shadan · December 15, 2020, 2:24am

I upload again:

gueyenono · December 15, 2020, 2:36am

Is the first column "province" in each data frame of the list?

Shadan · December 15, 2020, 2:44am

Yes, "province" is the first column for all. Your code is great. It is the code that I am looked for, just I do not know to change this line of code: sapply(x[, c("Y", "Z")], sum). As you wrote c("Y", "Z") my list has a different number of columns. Instead of c("Y", "Z") should be something to represent all columns. Thank you.

gueyenono · December 15, 2020, 2:47am

This should work then:

lapply(d, function(x){
  
  last_row <- nrow(x)
  total <- sapply(x[, -1)], sum)
  pct <- total / sum(total)
  
  x[last_row+1, ] <- c("Total", total)
  x[last_row+2, ] <- c("Percentage", pct)
  
  x

})

Shadan · December 15, 2020, 3:05am

@gueyenono there is "province, district, sub_district, and village" should not get the total and percentage.
The data frame look like this :
I used the following code :

lapply(d, function(x){
  
  last_row <- nrow(x)
  total <- sapply(x[, -4)], sum)
  pct <- total / sum(total)
  
  x[last_row+1, ] <- c("Total", total)
  x[last_row+2, ] <- c("Percentage", pct)
  
  x

})
 Instead "-1 " -4 but did not work.

gueyenono · December 15, 2020, 10:15am

I guess the closest solution would be:

lapply(d, function(x){
  
  last_row <- nrow(x)
  total <- sapply(x[, -(1:4)], sum)
  pct <- total / sum(total)
  
  x[last_row+1, ] <- c(rep("Total", 4), total)
  x[last_row+2, ] <- c(rep("Percentage", 4), pct)
  
  x

})

Once again, you should really look into the gt package to make nice summary tables instead of adding your summaries right into the data itself.

Shadan · December 15, 2020, 10:50am

gueyenono:

lapply(d, function(x){
  
  last_row <- nrow(x)
  total <- sapply(x[, -(1:4)], sum)
  pct <- total / sum(total)
  
  x[last_row+1, ] <- c(rep("Total", 4), total)
  x[last_row+2, ] <- c(rep("Percentage", 4), pct)
  
  x

})

Thank you, sir. It has done!

Shadan · December 17, 2020, 3:53am

Thank sir for all your time and solving my problems.
Is it possible to move the percentage row to top ( after column names) and total row to be in the second row of each data frame in the list?

system · December 24, 2020, 6:04am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.