Cleaning dataset

Inuraghe · September 6, 2021, 10:33am

Hi everybody!

I have a data frame like this:

db_class <- data.frame(A=c(2,3,9,12,2,5,7,7,1,23,3,4,14,3,9,8,6,11,9,4),B=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
db_class

vals <- by(db_class$A, db_class$B, FUN = function(x) {
  pos = which.max(x) 
  pos_plus_minus = c(pos-3, pos-2, pos-1, 
                     pos, 
                     pos+1, pos+2, pos+3)
  x[pos_plus_minus]
})

db_finale <-do.call(rbind, vals)
db_finale <- as.data.frame(db_finale)

final<- db_finale[db_finale$V4<20,]

final

This code is slow and I need something to fast. I would like to know if it would be possible to move the code "final" in the part of the program where the translation of the dataset happens "vals".

Thank you!

Leon · September 6, 2021, 1:18pm

Could you briefly illustrate your expected output?

Inuraghe · September 6, 2021, 1:57pm

My output is final, my idea is to change the position of the "final" code inside vals.

StatSteph · September 6, 2021, 2:31pm

I think you want V4 to be whatever the maximum is for the group. If that's the case, I think this solution works.

db_class <- data.frame(A=c(2,3,9,12,2,5,7,7,1,23,3,4,14,3,9,8,6,11,9,4),
                       B=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2))
db_class
#>     A B
#> 1   2 1
#> 2   3 1
#> 3   9 1
#> 4  12 1
#> 5   2 1
#> 6   5 1
#> 7   7 1
#> 8   7 1
#> 9   1 1
#> 10 23 1
#> 11  3 2
#> 12  4 2
#> 13 14 2
#> 14  3 2
#> 15  9 2
#> 16  8 2
#> 17  6 2
#> 18 11 2
#> 19  9 2
#> 20  4 2
vals <- by(db_class$A, db_class$B, FUN = function(x) {
  pos = which.max(x) 
  # pos_plus_minus <- pos + (-3:3)
  pos_plus_minus = c(pos-3, pos-2, pos-1, 
                     pos, 
                     pos+1, pos+2, pos+3)
  print(pos_plus_minus)
  x[pos_plus_minus]
})
#> [1]  7  8  9 10 11 12 13
#> [1] 0 1 2 3 4 5 6


#################
library(tidyverse)

vals2 <-
  db_class %>%
  group_by(B) %>%
  mutate(
    pos=row_number()-which.max(A) #relative position to maximum
    ) %>%
  ungroup() %>%
  filter(pos %in% -3:3) %>% #keep only those within 3 of maximum
  mutate(
    Name=str_c("V", pos+4) #make the name 
  ) %>%
  pivot_wider(id_cols = B, names_from=Name, values_from=A)

vals
#> db_class$B: 1
#> [1]  7  7  1 23 NA NA NA
#> ------------------------------------------------------------ 
#> db_class$B: 2
#> [1]  3  4 14  3  9  8
vals2
#> # A tibble: 2 x 8
#>       B    V1    V2    V3    V4    V5    V6    V7
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     7     7     1    23    NA    NA    NA
#> 2     2    NA     3     4    14     3     9     8

^{Created on 2021-09-06 by the reprex package (v2.0.1)}

Inuraghe · September 7, 2021, 6:27am

ok, but if I want to keep only the rows where in column v4 are present values > 20?

Inuraghe · September 7, 2021, 8:35am

Using this code I have this result:
Schermata 2021-09-07 alle 10.32.24
The columns are swapped, the column with the maximum is put first, then the next three and then the three previous ones. How can this order be changed?

StatSteph · September 7, 2021, 12:11pm

Here's some code to 1) Change order of columns and 2) filter to just rows where V4> 20

vals2 %>%
  select(B, V1:V7) %>% # changes order
  filter(V4 > 20) # filter to rows greater than 20
#> # A tibble: 1 x 8
#>       B    V1    V2    V3    V4    V5    V6    V7
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     7     7     1    23    NA    NA    NA

system · September 14, 2021, 12:12pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.