Improving Max function

Inuraghe · July 26, 2021, 9:26am

Hi everybody!

I have a data frame like this:

Df <- data.frame(A=c(2,3,9,12,2,5,7,7,1,23,3,4,14,3,9,8,6,11,9,4),B=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2))
Df

and my code is:

library(dplyr)
library(tidyr)

# reshapes column A into one row per group in column B
Df %>%
  mutate(B = if_else(B == 1, "A", "B")) %>% 
  group_by(B) %>% 
  mutate(var = paste0("V",row_number())) %>% 
  pivot_wider(id_cols = B, names_from = var, values_from = A) %>% 
  rename(row_name = B)

#creates a new data frame
new <- Df %>%
  group_by(B) %>% 
  mutate(var = paste0("V",row_number())) %>% 
  pivot_wider(id_cols = B, names_from = var, values_from = A)

#extraction of the maximum value and saving in a new df
new$row_maximum = apply(new[,-1], 1, max)
complete <- new %>% rowwise() %>% mutate(Pos = which(c_across(V1:V10) == row_maximum )[1])

fin <- matrix(sample(c(0:0), 50, replace = TRUE), nrow(complete))
fin <- as.data.frame(fin)

#part to improve
tab_max <- new$row_maximum
tab_max <- as.data.frame(tab_max)
fin$V14 <- tab_max

My question is: is there any way to improve the last part? Is it possible to do the last step using the position saved in complete$Pos?

gitdemont · July 26, 2021, 9:50am

What exactly do you want?

If it helps...

tab_max = by(Df$A, Df$B, max)
fin <- as.data.frame(matrix(rep(0, 50), nrow = nlevels(as.factor(Df$B)))) # why sample(c(0:0), 50, replace = TRUE) ?
fin$V14 <- tab_max # why at V14 ?
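
A possible variant, sketched in base R: tapply() returns a plain named vector instead of a by object, which may be easier to store as a data frame column. Calling it with which.max gives the position of each group's maximum, the same value the question stores in complete$Pos. The names max_by_group and pos_by_group are illustrative and do not come from the thread.

```r
# Same data as in the question
Df <- data.frame(
  A = c(2, 3, 9, 12, 2, 5, 7, 7, 1, 23, 3, 4, 14, 3, 9, 8, 6, 11, 9, 4),
  B = rep(c(1, 2), each = 10)
)

# Maximum of A within each group of B, as a named numeric vector
max_by_group <- tapply(Df$A, Df$B, max)

# Position of the first maximum within each group
pos_by_group <- tapply(Df$A, Df$B, which.max)

# One row per group and 25 zero-filled columns (V1..V25), as in the thread
fin <- as.data.frame(matrix(0, nrow = length(max_by_group), ncol = 25))

# as.vector() drops the names so the column is a plain numeric vector
fin$V14 <- as.vector(max_by_group)
```

Whether V14 is the right target column depends on the layout the downstream model expects; that choice is taken from the question as-is.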

Inuraghe · July 26, 2021, 1:05pm

Ok, the proposed solution works better, thank you!
My end goal is to prepare a dataset to do machine learning, so I want to put the max values all in the same column.
I have another question: how can I take the values before and after the maximum and get a new df as follows?
[Screenshot: Schermata 2021-07-26 alle 15.03.02]

That is, all the maximum values go in the middle column, with the value before the maximum in the column to its left and the value after the maximum in the column to its right.

gitdemont · July 26, 2021, 1:32pm

You should decide what to do in several edge cases. For instance, consider the value in row 1, column 3 of your screenshot:

vals <- by(Df$A, Df$B, FUN = function(x) {
  # Position of the maximum. which.max() skips NA values and has no na.rm argument.
  pos <- which.max(x)
  # If the maximum occurs several times, which.max() returns only the first position.
  pos_plus_minus1 <- c(pos - 1, pos, pos + 1)
  # What if pos is the first or the last position? Index 0 is silently dropped
  # (shorter result) and an index past the end yields NA.
  x[pos_plus_minus1]
})
do.call(rbind, vals)
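
One way to settle the edge cases raised in the comments, as a sketch rather than the only option: replace any neighbour index that falls outside the vector with NA, so every group yields exactly three values. The helper name around_max and the column names are hypothetical, and the sketch assumes each group has at least one non-NA value.

```r
# Same data as in the question
Df <- data.frame(
  A = c(2, 3, 9, 12, 2, 5, 7, 7, 1, 23, 3, 4, 14, 3, 9, 8, 6, 11, 9, 4),
  B = rep(c(1, 2), each = 10)
)

# Hypothetical helper: value before, at, and after the first maximum of x
around_max <- function(x) {
  pos <- which.max(x)                     # first maximum; NA values are skipped
  idx <- c(pos - 1, pos, pos + 1)
  idx[idx < 1 | idx > length(x)] <- NA    # out-of-range neighbours become NA
  x[idx]
}

# Apply the helper to each group and stack the results, one row per group
vals <- lapply(split(Df$A, Df$B), around_max)
res <- do.call(rbind, vals)
colnames(res) <- c("before", "max", "after")
res
```

With this data the maximum of group 1 is its last value, so the "after" entry for that row is NA. Padding with NA is only one choice; dropping such groups or repeating the maximum are alternatives, depending on what the model should see.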

Inuraghe · July 26, 2021, 1:42pm

This is perfect, it works perfectly! Thank you very much.
