Different Rounding Results: Vector vs Tibble

Hi. This is a newb question, but it's eating me up inside. Consider the following:

library(tidyverse)

bunch_of_years <- c(2002,1986,2017,1988,2008,1983,2008,1996,2004,2000,1994,1995,2015,1978,1974,2015,2016,1996,1983,1971,1981,1976,1998,2017,1979,1979,1993,2006,1988,1978,2013,1976,1979,1985,1985,2015,1962,1999,2015,1990,1992,1997,2018,2015,1997,2017,1982,1988,2006,2017)

Produces this vector...

bunch_of_years
[1] 2002 1986 2017 1988 2008 1983 2008 1996 2004 2000 1994 1995 2015 1978 1974 2015 2016 1996 1983 1971
[21] 1981 1976 1998 2017 1979 1979 1993 2006 1988 1978 2013 1976 1979 1985 1985 2015 1962 1999 2015 1990
[41] 1992 1997 2018 2015 1997 2017 1982 1988 2006 2017

And the mean...

mean(bunch_of_years)
[1] 1995.44

Great. Now, if I put those numbers into a tibble, like so...

# A tibble: 50 x 2
      ID  year
 1     1  2002
 2     2  1986
 3     3  2017
 4     4  1988
 5     5  2008
 6     6  1983
 7     7  2008
 8     8  1996
 9     9  2004
10    10  2000
# ... with 40 more rows

Look at the mean...

bunch_of_years %>% summarize(mean_year = mean(year))

# A tibble: 1 x 1
  mean_year
      <dbl>
1     1995.

It is displayed like an integer, even though the column type says dbl. I'm following a textbook that uses this code, and the mean the author derived from the tibble is 1995.44. Could someone explain why this is so, and how I can fix it? Thanks.

Welcome to the community!

As you have noted, the data type is double even though the value is displayed like an integer. The rounding affects only how the tibble is printed. If you extract the value, you will find the actual value without rounding.

Some options are:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

years_vector <- c(2002, 1986, 2017, 1988, 2008, 1983, 2008, 1996, 2004, 2000, 1994, 1995, 2015, 1978, 1974, 2015, 2016, 1996, 1983, 1971, 1981, 1976, 1998, 2017, 1979, 1979, 1993, 2006, 1988, 1978, 2013, 1976, 1979, 1985, 1985, 2015, 1962, 1999, 2015, 1990, 1992, 1997, 2018, 2015, 1997, 2017, 1982, 1988, 2006, 2017)
years_tibble <- tibble(ID = seq_along(along.with = years_vector),
                       year = years_vector)

summarised_years <- years_tibble %>%
    summarise(mean_year = mean(year))

# displays as integer (with a decimal symbol)
summarised_years
#> # A tibble: 1 x 1
#>   mean_year
#>       <dbl>
#> 1     1995.

# option 1
summarised_years %>% pull(mean_year)
#> [1] 1995.44

# option 2
summarised_years$mean_year
#> [1] 1995.44

Hope this helps.


Notice that your tibble result doesn't say 1995 but rather 1995. (that is, "1995 with a dot").
The trailing dot is tibble's way of telling you that the number is not a whole number stored in a double column, but a double with a fractional part, of which only the significant digits are printed.
Tibble has display options that you can tweak, either for a single print call or globally across your code. The design of the tibble print functions was discussed and decided in a separate thread.
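
As a rough sketch of the "global" route, the snippet below raises the number of significant digits used when printing tibbles through the `pillar.sigfig` option (its default is 3). The one-row tibble is rebuilt here by hand only so the example stands on its own; in the thread it would come from `summarise()`. Treat this as one possible approach rather than the recommended one.

```r
library(tibble)

# Stand-in for the summarised result from the thread (value typed in by hand)
summarised_years <- tibble(mean_year = 1995.44)

# The stored value is untouched; only the printed form is shortened
stopifnot(isTRUE(all.equal(summarised_years$mean_year, 1995.44)))

# Ask pillar (the package that formats tibble columns) for more significant digits
old_opts <- options(pillar.sigfig = 6)
print(summarised_years)

# Restore the previous setting so other printouts are unaffected
options(old_opts)

# Alternative: a base data frame prints with base R's `digits` option (default 7)
print(as.data.frame(summarised_years))
```

Setting the option in a script or in `.Rprofile` changes every tibble printout in the session, which is why the sketch saves and restores the old value.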

