Group_by and summarize

Hi everybody,

I would appreciate any help with this code.
I would like to calculate the yearly mean of the variable "share_outstanding" for each stock ticker. The code below returns NA with a warning instead.

average <- input %>% group_by(ticker, year) %>%
  summarize(year_share_oustanding = mean(share_outstanding))
Warning message:
In mean.default(share_outstanding) :
argument is not numeric or logical: returning NA

My data looks like this, with about 300 stock tickers over ten years.
Thank you in advance.

input <- data.frame(
                    year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
                             2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
                             2010L),
       share_outstanding = c("9900000", "9900000", "9900000", "9900000",
                             "9900000", "9900000", "9900000", "9900000",
                             "9900000", "9900000", "9900000", "9900000", "9900000",
                             "9900000", "9900000"),
   year_share_oustanding = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
                             NA, NA, NA),
                  ticker = as.factor(c("AAA", "AAA", "AAA", "AAA", "AAA",
                                       "AAA", "AAA", "AAA", "AAA", "AAA",
                                       "AAA", "AAA", "AAA", "AAA", "AAA")),
                    date = as.factor(c("7/15/2010", "7/16/2010", "7/19/2010",
                                       "7/20/2010", "7/21/2010", "7/22/2010",
                                       "7/23/2010", "7/26/2010", "7/27/2010",
                                       "7/28/2010", "7/29/2010", "7/30/2010",
                                       "8/2/2010", "8/3/2010", "8/4/2010"))
)

share_outstanding is a character vector; it needs to be numeric before mean() can work on it.
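As a minimal sketch of that fix, assuming the data frame is called input as in the question, the conversion can happen inside the pipeline. The toy values below are made up for illustration and are not taken from the real data.

```r
library(dplyr)

# Toy data in the same shape as the question: the counts are stored as text.
input <- data.frame(
  ticker = c("AAA", "AAA", "AAA", "AAA"),
  year = c(2010L, 2010L, 2011L, 2011L),
  share_outstanding = c("9900000", "9900000", "9900000", "10000000"),
  stringsAsFactors = FALSE
)

# mean() on a character vector returns NA with a warning, so convert first.
average <- input %>%
  mutate(share_outstanding = as.numeric(share_outstanding)) %>%
  group_by(ticker, year) %>%
  summarize(year_share_oustanding = mean(share_outstanding), .groups = "drop")

print(average)
```

The .groups = "drop" argument needs dplyr 1.0.0 or later; it only returns an ungrouped result and can be left out on older versions.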

I checked the "share_outstanding" column and it has no NA values. But when I used this code to convert it to numeric, I got this warning:

input$share_outstanding <- as.numeric(input$share_outstanding)
Warning message:
NAs introduced by coercion

The output now has NA values. Please let me know how I can fix this.

   ticker  year year_share_oustanding
   <fct>  <int>                 <dbl>
 1 AAA     2010              9900000 
 2 AAA     2011              9900000 
 3 AAA     2012             10652400 
 4 AAA     2013             19800000 
 5 AAA     2014             19960324.
 6 AAA     2015             41595965.
 7 AAA     2016             51060964.
 8 AAA     2017                   NA 
 9 AAA     2018                   NA 
10 AAM     2009             10062236.

Now that you know that certain rows generate NA, look at them as they are before applying as.numeric().

library(dplyr)
# Run this on the data as loaded, before share_outstanding is overwritten by as.numeric()
BadRows <- filter(input, ticker == "AAA", year %in% c(2017, 2018))
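With about 300 tickers, checking one ticker at a time may be slow. A rough alternative, sketched here in base R, is to convert into a separate vector and keep every row whose text was present but whose number came out NA. The sample values, including the comma-formatted ones, are hypothetical.

```r
# Hypothetical sample: two values use a thousands separator.
input <- data.frame(
  ticker = c("AAA", "AAA", "AAA"),
  year = c(2016L, 2017L, 2018L),
  share_outstanding = c("51060964", "51,365,890", "51,365,890"),
  stringsAsFactors = FALSE
)

# Convert into a separate vector so the original text stays available.
converted <- suppressWarnings(as.numeric(input$share_outstanding))

# Rows that were not NA as text but became NA as numbers.
bad <- is.na(converted) & !is.na(input$share_outstanding)
BadRows <- input[bad, ]

print(BadRows)
# The distinct offending strings usually make the pattern obvious.
print(unique(BadRows$share_outstanding))
```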

The share_outstanding column will probably have some obvious difference to "normal" data. A thousands separator, like

51,365,890

is one possibility.
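If a thousands separator does turn out to be the cause (that is an assumption until the bad rows have been inspected), one way to handle it is to strip the commas before converting. The values in raw are made up for illustration.

```r
# Made-up examples: one plain value and two with thousands separators.
raw <- c("9900000", "51,365,890", "10,652,400")

# Remove every comma, then convert. fixed = TRUE treats "," as a literal string.
cleaned <- as.numeric(gsub(",", "", raw, fixed = TRUE))

print(cleaned)

# Confirm that nothing was turned into NA by the conversion.
stopifnot(!anyNA(cleaned))
```

On the real data the same idea would be applied to input$share_outstanding. readr::parse_number() is another option that ignores grouping marks, if the readr package is available.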

Thank you for all support.
Have a nice day.
