NAs Showing Up in Second Mean Column

smithlc · May 8, 2022, 12:17am

I have a data set that can have more than one date per month. I am trying to aggregate the rows and take the mean so that there is only one row per month. I have tried several aggregation approaches and get NAs every time. The code and its output are below. This is only the first of several data sets I need to do this with. I have also included the other calls I tried.

> head(pa_new1)
  Start.Date   End.Date Approving Disapproving Unsure.NoData
1 1974-08-02 1974-08-05        24           66            10
2 1974-07-12 1974-07-15        24           63            13
3 1974-06-28 1974-07-01        28           58            14
4 1974-06-21 1974-06-24        26           61            13
5 1974-05-31 1974-06-03        28           61            11
6 1974-05-17 1974-05-20        26           61            13
        Date year month
1 1974-08-02 1974    08
2 1974-07-12 1974    07
3 1974-06-28 1974    06
4 1974-06-21 1974    06
5 1974-05-31 1974    05
6 1974-05-17 1974    05

> pa_new1 %>%
+   select(month, year) %>%
+   group_by(year) %>%
+   summarize(mean(month))
# A tibble: 53 x 2
   year  `mean(month)`
   <chr>         <dbl>
 1 1969             NA
 2 1970             NA
 3 1971             NA
 4 1972             NA
 5 1973             NA
 6 1974             NA
 7 1975             NA
 8 1976             NA
 9 1977             NA
10 1978             NA
# ... with 43 more rows
There were 50 or more warnings (use warnings() to see the first 50)

## Also returns NAs in the second column
aggregate(month ~ year, pa_new1, mean)

## Still returns NAs in the second column, even with na.rm = TRUE
aggregate(month ~ year, pa_new1, mean, na.rm = TRUE)

smithlc · May 8, 2022, 12:35am

I was actually able to get the means from the aggregated data successfully, but now I don't know how to use the resulting means going forward.

andresrcs · May 8, 2022, 12:42am

Is this what you mean?

library(dplyr)

# Sample data in a copy/paste-friendly format; replace this with your own data frame
pa_new1 <- data.frame(
  stringsAsFactors = FALSE,
        Start.Date = c("1974-08-02","1974-07-12",
                       "1974-06-28","1974-06-21","1974-05-31","1974-05-17"),
          End.Date = c("1974-08-05","1974-07-15",
                       "1974-07-01","1974-06-24","1974-06-03","1974-05-20"),
         Approving = c(24, 24, 28, 26, 28, 26),
      Disapproving = c(66, 63, 58, 61, 61, 61),
     Unsure.NoData = c(10, 13, 14, 13, 11, 13),
              Date = c("1974-08-02","1974-07-12",
                       "1974-06-28","1974-06-21","1974-05-31","1974-05-17"),
              year = c("1974", "1974", "1974", "1974", "1974", "1974"),
             month = c("08", "07", "06", "06", "05", "05")
)

# Relevant code
pa_new1 %>% 
    group_by(year, month) %>% 
    summarise(across(where(is.numeric), mean))
#> `summarise()` has grouped output by 'year'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 5
#> # Groups:   year [1]
#>   year  month Approving Disapproving Unsure.NoData
#>   <chr> <chr>     <dbl>        <dbl>         <dbl>
#> 1 1974  05           27         61            12  
#> 2 1974  06           27         59.5          13.5
#> 3 1974  07           24         63            13  
#> 4 1974  08           24         66            10

Created on 2022-05-07 by the reprex package (v2.0.1)
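As for why the original attempts returned NA: the printed tibble shows `year` as `<chr>`, and `month` looks like text as well ("08", "07", ...). `mean()` on a character vector returns NA with a warning, which would explain the 50+ warnings, and `na.rm = TRUE` does not help because the values are not missing, they are just the wrong type. A minimal sketch of that behaviour, using a toy vector (`month_chr` and `month_num` are illustrative names, not from the original data):

```r
# Toy vector mirroring the `month` column from the question (stored as text)
month_chr <- c("08", "07", "06", "06", "05", "05")

# mean() on a character vector returns NA and emits a warning;
# the warning is suppressed here only to keep the output quiet
na_result <- suppressWarnings(mean(month_chr))
is.na(na_result)

# Converting to numeric first gives a real mean
month_num <- as.numeric(month_chr)
mean(month_num)
```

Note that averaging the month number itself is probably not what you want anyway; the code above averages the numeric poll columns within each year/month group instead.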

Note: Next time, please provide a proper REPRoducible EXample (reprex) illustrating your issue.
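On the follow-up about using the means going forward: the usual approach is to assign the aggregated result to an object and work with that. Since the original post used `aggregate()`, here is a base R sketch under the assumption that the three poll columns are numeric; `monthly_means` is just a name picked for illustration.

```r
# Sample data, trimmed to the columns needed here
pa_new1 <- data.frame(
  year          = c("1974", "1974", "1974", "1974", "1974", "1974"),
  month         = c("08", "07", "06", "06", "05", "05"),
  Approving     = c(24, 24, 28, 26, 28, 26),
  Disapproving  = c(66, 63, 58, 61, 61, 61),
  Unsure.NoData = c(10, 13, 14, 13, 11, 13),
  stringsAsFactors = FALSE
)

# Average the numeric columns (left of ~) within each year/month group
# (right of ~) and store the result in a new object
monthly_means <- aggregate(
  cbind(Approving, Disapproving, Unsure.NoData) ~ year + month,
  data = pa_new1,
  FUN = mean
)

# monthly_means is an ordinary data frame with one row per year/month
monthly_means
monthly_means$Approving
```

From there `monthly_means` can be passed to plotting or modelling functions like any other data frame. The same idea works with the dplyr version: put `monthly_means <-` in front of the pipeline.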
