Calculate Response Efficiency

Tams101 · May 19, 2020, 1:50pm

I have the dataset of a distribution company. The actual result that I am looking for is to get the response efficiency when I divide Response over the Billed Population. The challenge here is that the data set there's no billed population as a variable, so I'll have to convert the individual billed energy to represent the billed population and then divide by the Response. How do I go about it?

Response <- df %>%
 mutate(Char = nchar(Payment.Efficiency....))%>%
 filter(!Char == "NotBilled")%>%
 group_by(Zone) %>%
 summarise(BilledPop = count(Char >0),
        Response = count(Resp >0),
        Efficiency = Response/Char)
#> Error in df %>% mutate(Char = nchar(Payment.Efficiency....)) %>% filter(!Char == : could not find function "%>%"
 Response
#> Error in eval(expr, envir, enclos): object 'Response' not found

^{Created on 2020-05-19 by the reprex package (v0.3.0)}

andresrcs · May 19, 2020, 5:14pm

Can you turn this into a proper reproducible example by including sample data on a copy/paste friendly format?

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Tams101 · May 20, 2020, 12:34pm

This is a sample dataframe I created. I don't really know how to randomize the names without replacement, that would have been much better. But, imagine with this data, the billed energy are in KwH and the Response are numbers. Instead of dividing the KwH by the Response, I would like to count the Kwh per zone and then divide by the count of response per zone.

activity <-data.frame(Num = c("Jane","Bill","Fred","Tina","Joe"),Zone=c("AKS","BYS","CRS",
                              "PHC-1","PHC-2"),
                       BilledEnergy=c(2300,4003,1121,3022,1713),Response = c(1,1,1,1,1))
 bind_rows(replicate(3, activity, simplify = F))
#> Error in bind_rows(replicate(3, activity, simplify = F)): could not find function "bind_rows"

^{Created on 2020-05-20 by the reprex package (v0.3.0)}

nirgrahamuk · May 20, 2020, 1:01pm

library(tidyverse)

activity <-data.frame(Num = c("Jane","Bill","Fred","Tina","Joe"),Zone=c("AKS","BYS","CRS",
                                                                        "PHC-1","PHC-2"),
                      BilledEnergy=c(2300,4003,1121,3022,1713),Response = c(1,1,1,1,1))
(activity2 <- bind_rows(replicate(3, activity, simplify = F)))

(activity2 %>% group_by(Zone) %>% summarise(
   sum_billed_energy = sum(BilledEnergy),
  sum_of_response = sum(Response),
  div = sum_billed_energy/sum_of_response
))

?

Tams101 · May 20, 2020, 1:39pm

Thanks. But it doesn't solve the problem... The units of measurements are different. My end result is to get the percentage of response over the billed population. Since there is not variable named billed population, I want to count (Convert) the number of billed energy (Just like Excel Pivot Table) to get the response efficiency.

nirgrahamuk · May 20, 2020, 1:44pm

sorry, but you are failing to explain yourself in non-ambigious terms.
I understand the concept of billed energy measured in KWh,
but what is response, what does it measure, in what unit?

Given that we have succesfully summed the billed energy per zone ( let me know if we havent) , lets focus on what response is and how it should be treated.

Tams101 · May 20, 2020, 1:53pm

Sorry for my ambiguity. Response here means the number of individual customers that responded (paid their bills) in the billing month.

Just like working with Excel pivot, I just want to get the count of billed energy - that can suffice for the number of customers billed for the month or as the case may be.

nirgrahamuk · May 20, 2020, 1:59pm

I think I made that assumption.
For each Zone, we have summed billed energy,
For each Zone we have summed the responses (no of customers billed for the month for the zone).
Then we can divide the sum of billed energy by the sum of responders in zone to get the avarage bill of a typical responder in the zone.
I dont know what 'count' of billed energy means...
if it means sum of the energy , we summed that.
if you want to know how many bills were responded to, we summed that.
if you want to know something other than these, please try again to explain what that might be.

I apologise for not having understood your requirements, and appreciate your further efforts to explain.
Perhaps you can pick a zone, and write a 3 row table, and how you would like to see it processed. That would I hope be a clear signal to what you intend ?

Tams101 · May 20, 2020, 3:26pm

I think the missing point is that we are summing instead of counting the number of rows. If we can get the count of billed energy per row likewise the count of response per row that solves it.
So in our dataset, it should be 5/5 = 100%
where count of billed energy = 5; and count of Response = 5 ... This is what we want to achieve.

Each billed energy represents each customer arranged in rows. By summing the billed energy and Response we are doing (5/12,159) where Response = 5 and billed energy = 12,159 which is not what we want.

nirgrahamuk · May 20, 2020, 3:41pm

counting rows is trivial.., its the n() function that can be passed to summarise.
its just not clear to me what you want to count.
rows per zone ? in the data you gave me there are a total of 15 rows, they can be grouped into 5 Zones each with 3 observations.

12159, is the sum of the activity frame (not the activity 2 frame that binds the activity frame 3 times )

should I ignore activity2 , and work with only the 5 row activity?

(activity %>% summarise(
  sum_billed_energy = sum(BilledEnergy),
  sum_of_response = sum(Response),
  number_of_observations = n(),
  div = sum_billed_energy/sum_of_response
))

 sum_billed_energy sum_of_response number_of_observations   div
              <dbl>           <dbl>                  <int> <dbl>
1             12159               5                      5 2432.

system · June 10, 2020, 3:41pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.