Struggling to use tidyverse methods with 'circular' package

mrblobby · September 26, 2018, 2:01am

I have a large data frame that I'd like to group by three variables, and then use the 'circular' package to compute (for now) the circular mean of some angles measured in degrees. My knowledge of dplyr and friends is limited to grouping, mutating and summarising, which unfortunately isn't enough for this problem.

My data looks something like this:

data <- data.frame(
      groupA = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A",
                 "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "A", "A",
                 "A", "A", "A", "B", "B", "B", "B", "B", "B"),
      groupB = c("L", "M", "N", "L", "M", "L", "M", "N", "L", "M", "N", "L",
                 "M", "N", "L", "M", "L", "M", "N", "L", "M", "N", "L", "M",
                 "N", "L", "M", "L", "M", "N", "L", "M", "N"),
      groupC = c("X", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "Y", "X", "X",
                 "X", "X", "Y", "Y", "X", "X", "Y", "Y", "Y", "X", "X", "X",
                 "X", "Y", "Y", "X", "X", "Y", "Y", "Y", "X"),
     degrees = c(10, 111.5, 360, 90, 120, 180, 50, 60, 70, 80, 90, 10, 100,
                 360, 90, 120, 180, 50, 60, 70, 80, 90, 10, 100, 360, 90, 120,
                 180, 50, 60, 70, 80, 90)
)

What I would like to do is group the data by all three groups. Normally, I'd do this:


 data %>%
  group_by(groupA, groupB, groupC) %>%
  ....

That works, but then I need to combine it with the circular package. I've only just started using this package, so I could be wrong, but my understanding is that I first need to convert the data to a circular object and then compute the circular mean:

library(circular)
x <- c(350, 90)
test_circular <- circular(x, units = "degrees", rotation = "clock")
mean.circular(test_circular)

Circular Data: 
Type = angles 
Units = degrees 
Template = none 
Modulo = asis 
Zero = 0 
Rotation = clock 
[1] 40

That gives sensible answers, but I can't work out how to combine it with dplyr. I've tried guessing at the syntax, but nothing seems to work, e.g.:

data %>%
  group_by(groupA, groupB, groupC) %>%
  do(circular(.$degrees))
Error: Results 1, 2, 3, 4, 5, ... must be data frames, not circular/numeric

This is the closest I've come, but it drops all the other columns, so matching the results back to my groups is difficult...

data %>%
  group_by(groupA, groupB, groupC) %>%
  do(x = as.data.frame(.$degrees)) %>%
  do(circ_data = circular(.$x, 
     units = "degrees",
     zero = 0,
     rotation = "clock")) %>%
  summarise(circ_mean = mean(circ_data))
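For reference, the first error above says that do() needs each group's result to be a data frame, whereas circular() returns a circular vector. A minimal sketch that satisfies this, assuming dplyr (where do() still exists, though current versions mark it as superseded) and circular are installed, and using a small made-up data frame rather than the data above:

```r
library(dplyr)
library(circular)

# Small made-up example data (not the data frame from the question)
toy <- data.frame(
  groupA  = c("A", "A", "A", "B", "B"),
  groupB  = c("L", "L", "M", "L", "L"),
  groupC  = c("X", "X", "X", "Y", "Y"),
  degrees = c(350, 90, 100, 70, 80),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(groupA, groupB, groupC) %>%
  # do() requires one data frame per group, so wrap the result in data.frame();
  # do() then adds the grouping columns back to each group's result.
  # as.numeric() drops the "circular" class so circ_mean is a plain number.
  do(data.frame(
    circ_mean = as.numeric(mean.circular(
      circular(.$degrees, units = "degrees", rotation = "clock")
    ))
  ))

print(res)
```

Because the grouping columns come back with each result, the means stay matched to their groups. summarise() is the more direct route when each group produces a single value; do() is mainly worth it when a group needs to return several rows or columns.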

rensa · September 26, 2018, 3:19am


Hey @mrblobby! Is this what you're looking for?

data %>%
  group_by(groupA, groupB, groupC) %>%
  summarise(
    circ_mean =
      degrees %>%
      circular(units = 'degrees', rotation = 'clock') %>%
      mean.circular()) %>%
  ungroup()
#> # A tibble: 11 x 4
#>    groupA groupB groupC circ_mean     
#>    <chr>  <chr>  <chr>  <S3: circular>
#>  1 A      L      X       1.000000e+01 
#>  2 A      L      Y       9.000000e+01 
#>  3 A      M      X       1.038276e+02 
#>  4 A      M      Y       1.200000e+02 
#>  5 A      N      X      -1.403296e-14 
#>  6 B      L      X       1.800000e+02 
#>  7 B      L      Y       7.000000e+01 
#>  8 B      M      X       5.000000e+01 
#>  9 B      M      Y       8.000000e+01 
#> 10 B      N      X       9.000000e+01 
#> 11 B      N      Y       6.000000e+01

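(The nested pipes here might throw you for a loop! The pipes inside summarise() run separately from the outer ones: within each group, they take that group's degrees, convert them to a circular object and take the circular mean, while the outer pipes pass the whole data frame from group_by() to summarise() to ungroup().)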
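To see roughly what mean.circular() computes here: the standard circular mean turns each angle into a unit vector, averages the sine and cosine components, and takes atan2() of the result. The base-R sketch below implements that definition; circ_mean_deg() is a hypothetical helper, not part of circular, and it ignores the units, rotation and zero settings that circular handles for you. It reproduces the 40 from the question and explains the -1.403296e-14 in the A/N/X row above: that group's angles are all 360, the same direction as 0, so the tiny negative number is floating-point noise around 0.

```r
# Hypothetical helper: circular mean of angles given in degrees.
# Each angle becomes a unit vector (cos, sin); the mean direction is
# atan2() of the averaged components, converted back to degrees.
# Like atan2(), the result lies in (-180, 180].
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
}

mean(c(350, 90))                   # ordinary mean: 220, pointing the wrong way
circ_mean_deg(c(350, 90))          # about 40, as mean.circular() gave
circ_mean_deg(c(111.5, 100, 100))  # about 103.83, the A/M/X group above
circ_mean_deg(c(360, 360, 360))    # within about 1e-14 of 0
```

The definition breaks down when the vectors cancel out (for example c(0, 180)), because there is then no meaningful mean direction.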

mrblobby · September 26, 2018, 3:34am

Hi @rensa, this is exactly what I'm after! I was completely unaware you could even nest pipes like that inside summarise (and presumably mutate).

This is a bit of a broad question, but can you help me understand why I don't have to pass any arguments to mean.circular(), or how circular 'knows' to use degrees? I think what confuses me most is when I need to use ., $x and so on. Is there good documentation on this?
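On the first part of that question: circular() records the units, rotation, zero and so on as attributes of the object it returns, and mean.circular() reads them back from its input, so they don't need repeating. A minimal sketch for inspecting this, assuming the circular package is installed (the exact attribute layout is an internal detail, so printing attributes(x) is the safest way to look):

```r
library(circular)

x <- circular(c(350, 90), units = "degrees", rotation = "clock")

# The settings passed to circular() travel with the object
class(x)          # includes "circular"
attributes(x)     # shows where units, rotation, zero, ... are stored

# mean() is an S3 generic: on a "circular" object it dispatches to
# mean.circular(), which reads those stored settings from x
m <- mean(x)
as.numeric(m)     # about 40, without repeating units = "degrees"
```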

rensa · September 26, 2018, 3:44am

It's definitely a confusing topic (especially, in my experience, if you start using purrr and working with list-columns, which bumps the complexity up a lot)! The piping operator %>% comes from magrittr, and that package's documentation is immensely helpful in understanding how the pipe works.

The basic version is that %>% is an operator, which in R is really just a function written out differently. Just like you can type 2 + 2 to return 4, you can also type:

`+`(2, 2)
# [1] 4

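The piping operator is also a function: it takes its left-hand side and inserts it into the right-hand-side function call as the first argument. The trick, as you pointed out, is understanding when it does that and when it doesn't, because there are some special rules that are easy to forget and can lead to frustrating debugging. The key one: if . appears as a top-level argument, as in f(a, .), the left-hand side goes there instead of first; if . only appears nested inside another expression, such as .$x or length(.), the left-hand side is still inserted as the first argument as well. I recommend reading the two relevant sections of the magrittr documentation very carefully to understand it.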
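A short sketch of those rules, assuming magrittr is installed (dplyr re-exports the same %>%):

```r
library(magrittr)

x <- c(1, 4, 9)

x %>% sqrt()                  # same as sqrt(x): 1 2 3
`%>%`(x, sqrt())              # the pipe called as an ordinary function: 1 2 3
x %>% head(2)                 # LHS becomes the first argument: head(x, 2) -> 1 4
x %>% paste("n =", .)         # `.` as a top-level argument: paste("n =", x)
x %>% head(length(.) - 1)     # `.` only nested: head(x, length(x) - 1) -> 1 4
```

The last line is the case that catches people out: the . inside length(.) does not stop x from also being inserted as the first argument.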

The basic rule to take away from those sections is: if you're not sure whether the pipe will insert the LHS as the first argument, wrap the pipe step in { braces } to explicitly stop that from happening, and then decide for yourself whether to put . at the start, like %>% { some_function(., other_args) }.
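A small sketch of the braces rule, assuming magrittr is installed; the data frame is made up for illustration:

```r
library(magrittr)

df <- data.frame(degrees = c(350, 90))

# `.` appears only inside .$degrees, so the pipe ALSO inserts df as the
# first argument: this runs mean(df, df$degrees), which warns and returns NA
df %>% mean(.$degrees)

# Braces switch off the automatic insertion; `.` is used only where written
df %>% { mean(.$degrees) }    # 220 (the ordinary, non-circular mean)

# Inside braces you can still put `.` first yourself
df %>% { head(., 1) }         # first row of df
```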

mrblobby · September 26, 2018, 3:59am

Brilliant, thanks for the info and your help @rensa!