Cleaning dataset

Hi everybody!

I have a data frame like this:

db_class <- data.frame(A=c(2,3,9,12,2,5,7,7,1,23,3,4,14,3,9,8,6,11,9,4),B=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
db_class
vals <- by(db_class$A, db_class$B, FUN = function(x) {
  pos = which.max(x) 
  pos_plus_minus = c(pos-3, pos-2, pos-1, 
                     pos, 
                     pos+1, pos+2, pos+3)
  x[pos_plus_minus]
})

db_finale <-do.call(rbind, vals)
db_finale <- as.data.frame(db_finale)

final<- db_finale[db_finale$V4<20,]

final

This code is slow and I need something to fast. I would like to know if it would be possible to move the code "final" in the part of the program where the translation of the dataset happens "vals".

Thank you!

Could you briefly illustrate your expected output?

My output is final, my idea is to change the position of the "final" code inside vals.

I think you want V4 to be whatever the maximum is for the group. If that's the case, I think this solution works.

db_class <- data.frame(A=c(2,3,9,12,2,5,7,7,1,23,3,4,14,3,9,8,6,11,9,4),
                       B=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2))
db_class
#>     A B
#> 1   2 1
#> 2   3 1
#> 3   9 1
#> 4  12 1
#> 5   2 1
#> 6   5 1
#> 7   7 1
#> 8   7 1
#> 9   1 1
#> 10 23 1
#> 11  3 2
#> 12  4 2
#> 13 14 2
#> 14  3 2
#> 15  9 2
#> 16  8 2
#> 17  6 2
#> 18 11 2
#> 19  9 2
#> 20  4 2
vals <- by(db_class$A, db_class$B, FUN = function(x) {
  pos = which.max(x) 
  # pos_plus_minus <- pos + (-3:3)
  pos_plus_minus = c(pos-3, pos-2, pos-1, 
                     pos, 
                     pos+1, pos+2, pos+3)
  print(pos_plus_minus)
  x[pos_plus_minus]
})
#> [1]  7  8  9 10 11 12 13
#> [1] 0 1 2 3 4 5 6


#################
library(tidyverse)

vals2 <-
  db_class %>%
  group_by(B) %>%
  mutate(
    pos=row_number()-which.max(A) #relative position to maximum
    ) %>%
  ungroup() %>%
  filter(pos %in% -3:3) %>% #keep only those within 3 of maximum
  mutate(
    Name=str_c("V", pos+4) #make the name 
  ) %>%
  pivot_wider(id_cols = B, names_from=Name, values_from=A)

vals
#> db_class$B: 1
#> [1]  7  7  1 23 NA NA NA
#> ------------------------------------------------------------ 
#> db_class$B: 2
#> [1]  3  4 14  3  9  8
vals2
#> # A tibble: 2 x 8
#>       B    V1    V2    V3    V4    V5    V6    V7
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     7     7     1    23    NA    NA    NA
#> 2     2    NA     3     4    14     3     9     8

Created on 2021-09-06 by the reprex package (v2.0.1)

1 Like

ok, but if I want to keep only the rows where in column v4 are present values > 20?

Using this code I have this result:
Schermata 2021-09-07 alle 10.32.24
The columns are swapped, the column with the maximum is put first, then the next three and then the three previous ones. How can this order be changed?

Here's some code to 1) Change order of columns and 2) filter to just rows where V4> 20

vals2 %>%
  select(B, V1:V7) %>% # changes order
  filter(V4 > 20) # filter to rows greater than 20
#> # A tibble: 1 x 8
#>       B    V1    V2    V3    V4    V5    V6    V7
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     7     7     1    23    NA    NA    NA
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.