Loop with increment error

Dear All,

I am trying to capture the first and last values of number of shares for companies. The data looks like:

COMPANY_NUMBER YEAR_END_DATE NUMBER_OF_YEARS  NUMBER_OF_SHARES   FIRST_LAST_SHARES
  #0070837                  31/12/2011                        6                                           68.03                            0
#0070837                    31/12/2012                        6                                          54.05                             0
#0070837                    31/12/2013                        6                                           24.94                            0
#0070837                    31/12/2014                        6                                          20.58                             0
#0070837                    31/12/2015                       6                                          16.37                              0
#0070837                    31/12/2016                       6                                           22.68                             0
1000403                      31/12/2007                       4                                             7.7                               0
1000403                      31/12/2008                       4                                              6.2                              0
1000403                      31/12/2009                       4                                           4.89                              0
1000403                      31/12/2010                      4                                            4.55                              0
1000608                      31/12/2007                      3                                         285.71                             0
1000608                      31/12/2008                      3                                          454.55                             0
1000608                      31/12/2009                      3                                          555.56                             0

I want to capture the number of shares for the first and the last year only. To do this I have a loop which increments by number of years. Number of years is at least three.

The code I have written is

for(i in seq(from = 1, to = 50, by = (PLC2_Not_Null2$NUMBER_OF_YEARS[i]-1)))
{
for (j in 1:PLC2_Not_Null2$NUMBER_OF_YEARS[i]){
if (j ==1 )
{PLC2_Not_Null2$FIRST_LAST_SHARES[j]=PLC2_Not_Null2$NUMBER_OF_SHARES[j]
} else if (PLC2_Not_Null2$NUMBER_OF_YEARS[j] == PLC2_Not_Null2$NUMBER_OF_YEARS[i] )
{PLC2_Not_Null2$PLC2_Not_Null2$FIRST_LAST_SHARES[i]=PLC2_Not_Null2$NUMBER_OF_SHARES[i]
} else {PLC2_Not_Null2$first_last_shares[i] = 0}

}
}

The error I get is

Error in seq.default(from = 1, to = 50, by = (PLC2_Not_Null2$NUMBER_OF_YEARS[i] - :
'by' must be of length 1

by value will never be zero so I don't understand what have I done wrong?

Thanks in advance fr your help.

My comfort with R is growing thanks to all of you.

Kind regards

Ahson

I do not know what variable PLC2_Not_Null2$NUMBER_OF_YEARS is. However, the 'by' argument in seq() must be a number, the number you want to increment your sequence by. For example, if you want to count from 1 to 50 by 1s, then:

> seq(from =  1,  to = 50,  by =  1)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

...
The dplyr package can help group by company and pull out the first and last vlues. Here is an example:

library(dplyr)

# fake data
df <- data.frame(
 company = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
 year = c(2018, 2019, 2020, 2018, 2019, 2020,2018, 2019, 2020),
 shares = c(1, 2, 3, 98, 99, 100, 397, 398, 399)
)

df %>% 
 group_by(company) %>% 
 arrange(year) %>%  # order by year, do not skip!
 summarise(
  first_year_shares = first(shares), # the functions to get the first and last by group
  last_year_shares = last(shares)
 )
# A tibble: 3 x 3
  company first_year_shares last_year_shares
  <chr>               <dbl>            <dbl>
1 a                       1                3
2 b                      98              100
3 c                     397              399

Would this do the trick for you?

# Load libraries ----------------------------------------------------------
library("tidyverse")
library("lubridate")


# Define data -------------------------------------------------------------
d <- tribble(
  ~COMPANY_NUMBER, ~YEAR_END_DATE, ~NUMBER_OF_YEARS, ~NUMBER_OF_SHARES, ~FIRST_LAST_SHARES,
  "#0070837", "31/12/2011", 6, 68.03, 0,
  "#0070837", "31/12/2012", 6, 54.05, 0,
  "#0070837", "31/12/2013", 6, 24.94, 0,
  "#0070837", "31/12/2014", 6, 20.58, 0,
  "#0070837", "31/12/2015", 6, 16.37, 0,
  "#0070837", "31/12/2016", 6, 22.68, 0,
  "1000403", "31/12/2007", 4, 7.7, 0,
  "1000403", "31/12/2008", 4, 6.2, 0,
  "1000403", "31/12/2009", 4, 4.89, 0,
  "1000403", "31/12/2010", 4, 4.55, 0,
  "1000608", "31/12/2007", 3, 285.71, 0,
  "1000608", "31/12/2008", 3, 454.55, 0,
  "1000608", "31/12/2009", 3, 555.56, 0
) %>% 
  mutate(YEAR_END_DATE = dmy(YEAR_END_DATE))


# Wrangle data ------------------------------------------------------------
d %>%
  group_by(COMPANY_NUMBER) %>% 
  summarise(n_first = NUMBER_OF_SHARES[YEAR_END_DATE == min(YEAR_END_DATE)],
            n_last = NUMBER_OF_SHARES[YEAR_END_DATE == max(YEAR_END_DATE)])

Yielding:

# A tibble: 3 x 3
  COMPANY_NUMBER n_first n_last
* <chr>            <dbl>  <dbl>
1 #0070837          68.0  22.7 
2 1000403            7.7   4.55
3 1000608          286.  556.  

Hope it helps :slightly_smiling_face: