Tranform and summarize data

sten · July 24, 2019, 4:02pm

So I have a dataframe which I would like to transform and summarize it.
It currently looks like this

SAMPLE DATA

sample_date<-data.frame(stringsAsFactors=FALSE,
           Date = c("2019-06-04", "2019-06-05", "2019-06-06",
                    "2019-06-07", "2019-06-08", "2019-06-09"),
           apple_cost = c(6685.5601, 17387.3072, 31587.2694, 7489.1275,
                         8490.1844, 9372.4676),
           apple_count = c(601762, 557952, 1003681, 243348, 273511, 303130),
           banana_cost = c(6685, 6685, 6685, 6685, 6685, 6685),
           banana_count = c(557952, 557952, 557952, 557952, 557952, 557952),
           orange_cost = c(6685, 6685, 6685, 6685, 6685, 6685),
           orange_count = c(1003681, 1003681, 1003681, 1003681, 1003681, 1003681)
)

        Date apple_cost apple_count banana_cost banana_count orange_cost orange_count
1 2019-06-04   6685.560      601762        6685       557952        6685      1003681
2 2019-06-05  17387.307      557952        6685       557952        6685      1003681
3 2019-06-06  31587.269     1003681        6685       557952        6685      1003681
4 2019-06-07   7489.127      243348        6685       557952        6685      1003681
5 2019-06-08   8490.184      273511        6685       557952        6685      1003681
6 2019-06-09   9372.468      303130        6685       557952        6685      1003681

I would like to transform and summarize it to :

EXPECTED OUTPUT

                         Type = c("apple","banana","orange"),
                         cost = c(243348,343348,443348),
                         count = c(3003681,4003681,5003681))

Type   cost   count
1  apple 243348 3003681
2 banana 343348 4003681
3 orange 443348 5003681

I did try to summarize it with the following code but it doesnt seem like the above expected output. This is what I have tried so far

WHAT I TRIED SO FAR

current_table <- sample_date %>% 
  summarise( apple_cost = sum( apple_cost, na.rm=TRUE),
             apple_count = sum( apple_count, na.rm=TRUE),
             banana_cost = sum( banana_cost, na.rm=TRUE),
             banana_count = sum(banana_count, na.rm=TRUE),
             orange_cost = sum(orange_cost, na.rm=TRUE),
             orange_count = sum(orange_count, na.rm=TRUE))

apple_cost apple_count banana_cost banana_count orange_cost orange_count
1   81011.92     2983384       40110      3347712       40110      6022086

FJCC · July 24, 2019, 4:14pm

If I understand you correctly, the kinds of rearrangement you want to do can be done with functions from the tidyr package.

library(dplyr)

library(tidyr)
sample_date<-data.frame(stringsAsFactors=FALSE,
                        Date = c("2019-06-04", "2019-06-05", "2019-06-06",
                                 "2019-06-07", "2019-06-08", "2019-06-09"),
                        apple_cost = c(6685.5601, 17387.3072, 31587.2694, 7489.1275,
                                       8490.1844, 9372.4676),
                        apple_count = c(601762, 557952, 1003681, 243348, 273511, 303130),
                        banana_cost = c(6685, 6685, 6685, 6685, 6685, 6685),
                        banana_count = c(557952, 557952, 557952, 557952, 557952, 557952),
                        orange_cost = c(6685, 6685, 6685, 6685, 6685, 6685),
                        orange_count = c(1003681, 1003681, 1003681, 1003681, 1003681, 1003681)
)
current_table <- sample_date %>% 
  summarise( apple_cost = sum( apple_cost, na.rm=TRUE),
             apple_count = sum( apple_count, na.rm=TRUE),
             banana_cost = sum( banana_cost, na.rm=TRUE),
             banana_count = sum(banana_count, na.rm=TRUE),
             orange_cost = sum(orange_cost, na.rm=TRUE),
             orange_count = sum(orange_count, na.rm=TRUE))
curr_table_gather <- current_table %>% gather(key = Label, value = Value)
curr_table_gather
#>          Label      Value
#> 1   apple_cost   81011.92
#> 2  apple_count 2983384.00
#> 3  banana_cost   40110.00
#> 4 banana_count 3347712.00
#> 5  orange_cost   40110.00
#> 6 orange_count 6022086.00
curr_table_gather <- curr_table_gather %>% separate(col = Label, into = c("Type", "Variable"), sep = "_")
curr_table_gather
#>     Type Variable      Value
#> 1  apple     cost   81011.92
#> 2  apple    count 2983384.00
#> 3 banana     cost   40110.00
#> 4 banana    count 3347712.00
#> 5 orange     cost   40110.00
#> 6 orange    count 6022086.00
FinalTable <- curr_table_gather %>% spread(key = Variable, value = Value)
FinalTable
#>     Type     cost   count
#> 1  apple 81011.92 2983384
#> 2 banana 40110.00 3347712
#> 3 orange 40110.00 6022086

^{Created on 2019-07-24 by the reprex package (v0.2.1)}

Matthias · July 24, 2019, 5:40pm

A bit shorter without defining the rows first, so it can work with any more columns.

resorted = sample_date %>%
  # convert to long
  gather(key = Entity, value = Number, -Date) %>% 
  # summarise 
  group_by(Entity) %>% summarise(Sum = sum(Number)) %>%
  # separate cost and count and spread to wide again 
  separate(Entity, into = c("Type", "Readout")) %>%
 spread(key = Readout, value = Sum)

Type cost count
1 apple 81012. 2983384
2 banana 40110 3347712
3 orange 40110 6022086

system · July 31, 2019, 5:40pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.