Hi all
I am trying to get a function that I wrote for a single dataset to work inside a group_by() and mutate() call, but I currently get an error about the number of rows in the group.
I have made a fully reproducible reprex (my first one ever, it is awesome) for my issue. The function works fine on a single dataset and within group_by() followed by do(), but not within group_by() followed by mutate(). Any tips or ideas as to why would be much appreciated!
# I am trying to calculate the cumulative distance between latitude and longitude points within a group_by call. For example I have latitude and longitude points for multiple runs I have done and would like to get the cumulative distance for each one.
# load packages
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# custom function for getting distances between latitude and longitude points
# REQUIRES GEOSPHERE
get_dists <- function(dat_in, lon = 'lon', lat = 'lat'){
  dat <- dat_in[,c(lon, lat)]
  names(dat) <- c('lon', 'lat')
  out <- sapply(2:nrow(dat), function(y){geosphere::distm(dat[y-1,], dat[y,])/1000})
  out <- c(0, cumsum(out))
  return(out)
}
# create fake data
d <- data.frame(run = c(1, 1, 1, 2, 2, 2),
                lat = c(57.15508, 57.15521, 57.15520, 52.41278, 52.41283, 52.41317),
                lon = c(-2.07886, -2.07886, -2.07887, -4.07803, -4.07806, -4.07858))
# calculate distance between all points irrespective of run
get_dists(d)
#> [1] 0.00000000 0.01447153 0.01573792 543.25211631 543.25804333
#> [6] 543.30980443
# calculate distance between points grouped by run
d %>%
  group_by(run) %>%
  mutate(dists = get_dists(.))
#> Error in mutate_impl(.data, dots): Column `dists` must be length 3 (the group size) or one, not 6
# I get an error that I have not managed to fix. I think it is because my distance function takes a data argument, but I do not know how to solve it.
# However, it does work with do()
d %>%
  group_by(run) %>%
  do(data.frame(lat = .$lat,
                lon = .$lon,
                dists = get_dists(.)))
#> # A tibble: 6 x 4
#> # Groups:   run [2]
#>     run      lat      lon       dists
#>   <dbl>    <dbl>    <dbl>       <dbl>
#> 1     1 57.15508 -2.07886 0.000000000
#> 2     1 57.15521 -2.07886 0.014471534
#> 3     1 57.15520 -2.07887 0.015737917
#> 4     2 52.41278 -4.07803 0.000000000
#> 5     2 52.41283 -4.07806 0.005927023
#> 6     2 52.41317 -4.07858 0.057688123
# any idea what is going on?
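A possible explanation, sketched here rather than confirmed: inside mutate() the `.` seems to be the pipe's left-hand side, i.e. the whole 6-row data frame, and not the current group, so get_dists(.) returns 6 values for a 3-row group. The check below compares nrow(.) with n() to show this. It then tries a hypothetical vector-based variant, get_dists_vec() (my own name, not part of dplyr or geosphere), that takes the lon and lat columns directly so that mutate() only ever hands it the rows of the current group. It still REQUIRES GEOSPHERE.

```r
library(dplyr)

# same fake data as in the reprex
d <- data.frame(run = c(1, 1, 1, 2, 2, 2),
                lat = c(57.15508, 57.15521, 57.15520, 52.41278, 52.41283, 52.41317),
                lon = c(-2.07886, -2.07886, -2.07887, -4.07803, -4.07806, -4.07858))

# `.` is the full data frame passed along the pipe; n() is the current group size
check <- d %>%
  group_by(run) %>%
  mutate(n_dot = nrow(.), n_group = n())

# hypothetical vector-based variant of get_dists(): takes two columns, not a data frame
get_dists_vec <- function(lon, lat){
  pts <- cbind(lon, lat)
  n <- nrow(pts)
  # a group with fewer than two points has no steps to accumulate
  if (n < 2) return(rep(0, n))
  # distance in km between each point and the previous one
  steps <- sapply(2:n, function(i){geosphere::distm(pts[i - 1, ], pts[i, ])/1000})
  c(0, cumsum(steps))
}

# mutate() evaluates lon and lat per group, so the result has the group's length
res <- d %>%
  group_by(run) %>%
  mutate(dists = get_dists_vec(lon, lat))
```

If this reasoning is right, res should have one cumulative distance per row, restarting at 0 for each run, which is what the do() version gives. Passing columns instead of `.` also keeps the other columns in the output without rebuilding the data frame by hand.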