How to find the max value of a column for changing set of rows and get the corresponding value of another column?

rstudio

#1

I am having the stock price data of IBM.
Code:
library(tseries)
IBM<-tail(get.hist.quote("IBM"),250)

I inserted a new column containing the Serial No. using
Code:
IBM$Serial <- 1:nrow(IBM)
[1] "Open" "High" "Low" "Close" "Serial"

Now, I would like to create a new column named "Max_Vals" and get from the 100th row the corresponding column - Serial No. Value when Close was max.


#3

It's not entirely clear to me, what you are trying to accomplish. It is much easier to get help, if you supply a reproducible example also please look into code formatting using backticks.

Perhaps you're looking for something like this:

library("tidyverse")
set.seed(182635)
d = tibble(Open  = rnorm(n = 250, mean = 100),
           High  = rnorm(n = 250, mean = 100),
           Low   = rnorm(n = 250, mean = 100),
           Close = rnorm(n = 250, mean = 100))
d = d %>% mutate(Serial = 1:nrow(.))
d
# A tibble: 250 x 5
    Open  High   Low Close Serial
   <dbl> <dbl> <dbl> <dbl>  <int>
 1 101.   99.3 100.  100.       1
 2 101.  102.   98.4  98.2      2
 3 101.  100.  100.  101.       3
 4 101.  102.  103.   99.9      4
 5  98.2 102.  101.   98.6      5
 6 101.  100.   98.2 102.       6
 7  98.4 100.   99.4 101.       7
 8 102.   98.5 101.   99.2      8
 9  99.4  99.7 101.  101.       9
10 101.   99.8 100.0 101.      10
# ... with 240 more rows
d %>% summarise(Max_Vals = max(Close), Serial = which.max(Close))
# A tibble: 1 x 2
  Max_Vals Serial
     <dbl>  <int>
1     103.    149

More information on how to get help, can be found here :slightly_smiling_face:


#4

Thank you for your quick reply,
My actual requirement is:
Suppose there are 1000 rows of IBM Historical Stock Price Data, I want to create a new column(starting from the 100th row) named "Max_Val_Serial_No" which gives the Serial No.Value of the Max closing price for the previous 100 days for the entire column.

Note: Every row in the new column should take its previous 100 days closing price for calculating the max closing price and give its corresponding serial no. in the new column.


#5

Like so?

# Load libraries
library("tidyverse")

# Set seed for reproducibility
set.seed(182635)

# Define function for looking back 'size' positions in vector
which_max_in_window = function(x, size){
  out = vector(mode = "numeric", length = length(x))
  win = seq(1, size)
  out[win] = which.max(x[win])
  for( i in seq(size + 1, length(x)) ){
    first  = i - size + 1
    last   = i
    win    = seq(first, last)
    out[i] = which.max(x[win]) + first - 1
  }
  return(out)
}

# Generate dummy data
n = 1000
d = tibble(Open  = rnorm(n = n, mean = 100),
           High  = rnorm(n = n, mean = 100),
           Low   = rnorm(n = n, mean = 100),
           Close = rnorm(n = n, mean = 100))

# Create new variable with max indices for previous 100 elements of Close vector
d = d %>% mutate(Serial = seq(1, nrow(.)),
                 Max_Val_Serial_No = which_max_in_window(x = Close, size = 100))

# Get tibble of top closing values in window
d_max = tibble(x = d %>% select(Max_Val_Serial_No) %>% distinct %>% pull,
               y = d %>% select(Close) %>% slice(x) %>% pull)

# Visualise
d %>%
  ggplot(aes(x = Serial, y = Close)) +
  geom_line() +
  geom_point(data = d_max, aes(x = x, y = y), inherit.aes = FALSE, pch = 'X',
             size = 5, colour = "tomato") +
  scale_x_continuous(breaks = seq(0, nrow(d), by = 100)) +
  guides(colour = FALSE) +
  theme_bw() +
  ggtitle("Top Close Values Within Sliding Windows of size 100")

Please do run some tests, to see if you get the expected output - No guarantees given :wink:


#6

Sorry, I am getting an error.
'''
Code:
library(tseries)
IBM<-tail(get.hist.quote("IBM"),250)
IBM<-tail(get.hist.quote("IBM"),1000)
library(tidyverse)
which_max_in_window = function(x, size){

  • out = vector(mode = "numeric", length = length(x))
    
  • win = seq(1, size)
    
  • out[win] = which.max(x[win])
    
  • for( i in seq(size + 1, length(x)) ){
    
  •     first  = i - size + 1
    
  •     last   = i
    
  •     win    = seq(first, last)
    
  •     out[i] = which.max(x[win]) + first - 1
    
  • }
    
  • return(out)
    
  • }
    IBM = IBM %>% mutate(Serial = seq(1, nrow(.)),
  •              Max_Val_Serial_No = which_max_in_window(x = Close, size = 100))
    

Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "zoo"
'''


#7

Please insert your code using the following formatting (except for #), it makes it a lot easier to read:

#```{r}
#```

The dummy data we created in my example has the following class and structure:

class(d)
[1] "tbl_df"     "tbl"        "data.frame"
str(d)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	1000 obs. of  4 variables:
 $ Open : num  101.1 101.3 101 100.9 98.2 ...
 $ High : num  102.7 99.2 101.3 101.3 99.6 ...
 $ Low  : num  99.7 98.8 100.8 99.3 101 ...
 $ Close: num  99.6 97.4 99.6 98.8 100.9 ...

The error message is related to the zoo class object you have. There is a stackoverflow discussion here on that particular error. In other words for the function I wrote to work, you will have to convert your zoo object to a data frame.


#8

Sorry for the formatting error. Thanks a lot for your timely help, Sir.


#9

How to create a multiple Line graphs below each other ?
1st Graph:
x axis - Time Period
y axis - Closing Price
2nd Graph:
x axis - Serial
y axis - Max_Value_Serial_No and Closing Price


#10

Briefly:

library("tidyverse")
tibble(my_x_var, my_y_var) %>%
  ggplot(aes(x = my_x_var, y = my_y_var)) +
  geom_line() + 
  theme_bw()

Garrett Grolemund and Hadley Wickham has written this really good book on R for Data Science.

Based on your question reg. x/y scatter plots, I would highly recommend, that you to go through this specific section on data visualisation using ggplot

Try out a few things and then come back here with a suggestion of your own for the plots and we will help you on your way :slightly_smiling_face: