Help needed! Aggregated counts in R

#1

Hi All,

I've just set out using R Studio and wondered if someone could help me create a column that counts the eligible responses to a survey we have sent out at the NHS.

I have three columns:

Council code
Question Response (1-4 or 'NA')
Total Eligible Responses

I need the 'Total Eligible Responses' column to be the number of valid Question Responses per council. So a count of 'Question Response' (where not NA) grouped by 'Council Code'.

For example if in council A there were 500 valid responses, I would like 500 to be repeated for the whole column alongside council A

I've tried attaching an example but it won't let me!

Cheers for any help!

0 Likes

#2

Welcome!

I tried to take your descriptionof the data and created a sample input dataset. Note the below is just a screenshot. The code to generate that is below and is assigned to the "dataset" variable.

To do this I used datapasta to paste from Sheets to R. I recommend this fabulous post on how to generate a reprex https://maraaverick.rbind.io/2018/10/reprex-with-datapasta/.

#read in sample data from Sheets using datapasta
library(tidyverse)
dataset <- data.frame(stringsAsFactors=FALSE,
   CouncilCode = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "z", "z", "z",
                   "z", "z", "z"),
      Response = c(1L, 1L, NA, NA , 3L, 4L, 4L, 3L, 1L, 1L, 3L, 3L, 2L, 2L, NA)
)

image

On to the actual computation. What I ended up doing here is grouping by CouncilCode and summing across Response where Response is not NA.

What I end up with is a column TotalEligibleResponses that is equal to the sum of responses that are not NA.

#computation
dataset %>% group_by(CouncilCode) %>%
  mutate(TotalEligibleResponses = sum(!is.na(Response)))

Check out the TotalEligibleResponses column.

0 Likes

#3

Hey thanks for your help Kathleen! Been really struggling with this.

In your example the TotalEligibleResponses column = 12 for all councils, would it be possible to code it so the column returned 3 for 'X', 4 for 'Y' and 5 for 'Z' ? i.e. separate count per council

0 Likes

#4

This is the output in the console I'm seeing. Are you running the suggested code snippet? Could you please share what you are running to get "12" in TotalEligibleResponses?

dataset %>% group_by(CouncilCode) %>%
+   mutate(TotalEligibleResponses = sum(!is.na(Response)))
Source: local data frame [15 x 3]
Groups: CouncilCode [3]

   CouncilCode Response TotalEligibleResponses
         <chr>    <int>                  <int>
1            x        1                      3
2            x        1                      3
3            x       NA                      3
4            x       NA                      3
5            x        3                      3
6            y        4                      4
7            y        4                      4
8            y        3                      4
9            y        1                      4
10           z        1                      5
11           z        3                      5
12           z        3                      5
13           z        2                      5
14           z        2                      5
15           z       NA                      5
>

The code I am running is:

library(tidyverse)
dataset <- data.frame(stringsAsFactors=FALSE,
                      CouncilCode = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "z", "z", "z",
                                      "z", "z", "z"),
                      Response = c(1L, 1L, NA, NA , 3L, 4L, 4L, 3L, 1L, 1L, 3L, 3L, 2L, 2L, NA)
)

dataset %>% group_by(CouncilCode) %>%
  mutate(TotalEligibleResponses = sum(!is.na(Response)))
0 Likes

#5

Hi Kathleen, how odd! The code i'm running is identical I think as I've copied and pasted straight out of your reply:

library(tidyverse)
dataset <- data.frame(stringsAsFactors=FALSE,
CouncilCode = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "z", "z", "z",
"z", "z", "z"),
Response = c(1L, 1L, NA, NA , 3L, 4L, 4L, 3L, 1L, 1L, 3L, 3L, 2L, 2L, NA)
)

dataset %>% group_by(CouncilCode) %>%
mutate(TotalEligibleResponses = sum(!is.na(Response)))

but it is returning this:
CouncilCode Response TotalEligibleResponses

1 x 1 12
2 x 1 12
3 x NA 12
4 x NA 12
5 x 3 12
6 y 4 12
7 y 4 12
8 y 3 12
9 y 1 12
10 z 1 12
11 z 3 12
12 z 3 12
13 z 2 12
14 z 2 12
15 z NA 12

0 Likes

#6

Hmmmm...I'm super perplexed. I've tried it on another machine and the results continue to be different integers. I'm wondering if it's some sort of package issue.

Try changing library(tidyverse) to library(dplyr). I wonder what would happen in that case on your end?

Not sure why this would make a difference though... perhaps someone with a deeper understanding of tidyverse packages and functions might weigh in.

0 Likes

#7

Hi Kathleen yes it's still returning the same results for me with dplyr unfortunately! Really annoying as it would be the perfect solution for me if my outputs only matched yours!
Thank you for all your help on this

0 Likes

#8

Could you please include a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

0 Likes

#9

Morning Mara,

I am not able to access my virtual machine today (at the NHS we require VMs in order to be able to download packages) but when I'm back in the office tomorrow I will do this and create a specific example using the reprex package.
Thank you for your ongoing help.
James

0 Likes

closed #10

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

0 Likes