Count columns values across many filters

M_AcostaCH · January 9, 2023, 7:54am

Hi comunity

Im have many NUMERO_INT. I need get the totals values, like this:

 INUM <- data.frame(structure(list(NU_STOCK = c("S  517", "S  482", "S  470", "S91450", 
                                                "S92313A", "S91634", "S90243", "S97158", "S96737", "S96713", 
                                                "S96712", "S96711", "S96710", "S34856", "S34855", "S34854", "S34859", 
                                                "S34852", "S34851", "S34850", "S34849", "S34848", "S34847"), 
                                   GNUMBER = c("G51390", "G51389", NA, NA, "G 2313A", "G 1634", 
                                               "G  243", "G 7158", NA, "G 6713", NA, "G 6711", "G 6710", 
                                               "G52466", "G52465", NA, "G52463", NA, "G52461", "G52460", 
                                               "G52459", "G52458", "G52457"), 
                                   NUMERO_INT = c("I-001", "I-001", "I-001", "I-001", "I-001", "I-001", "I-001", "I-010", "I-010", 
                                                  "I-010", "I-010", "I-010", "I-010", "I-658", "I-658", "I-658", 
                                                  "I-658", "I-658", "I-658", "I-658", "I-658", "I-658", "I-658"
                                               ), FAO = c("2005", NA, "2005", NA, NA, NA, NA, "2001", NA, 
                                                          "1999", "2001", "1999", NA, "2014", "2014", "2014", "2014", 
                                                          "2014", NA, NA, "2014", "2014", "2014"),
                                   LOST = c(NA, NA, NA, NA, "2000", "2000", "2012", NA, "2001", NA, NA, NA, "2001", 
                                          NA, NA, NA, NA, "2022", NA, "2001", NA, NA, NA), 
                                   PROCED1 = c("USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", 
                                               "USA", "USA", "USA", "COL", "COL", "COL", "COL", "COL", "COL", 
                                               "COL", "COL", "COL", "COL")), class = c("tbl_df", "tbl", "data.frame"), 
                              row.names = c(NA, -23L)))


I_001 <- INUM |> 
   select(NU_STOCK,GNUMBER, NUMERO_INT, FAO, LOST) |> 
   filter(NUMERO_INT=='I-001') |>   # other NUMERO_INT has the same format 
   group_by(NUMERO_INT) %>%
   summarise(across(.fns = ~sum(!is.na(.))))

 # # A tibble: 1 × 5
 # NUMERO_INT        NU_STOCK GNUMBER  FAO   LOST
 #      <chr>         <int>   <int>   <int> <int>
 #   1 I-001             7       5     2     3

Im want to obtaind this for each NUMERO_INT.

 # # A tibble: 3 × 8
 #        NU_STOCK GNUMBER  NUMERO_INT    FAO  FAO_NA  LOST LOST_NA PROCED1
 #          <dbl>  <dbl>      <chr>         <dbl>  <dbl> <dbl> <dbl> <chr>  
 # 1        7       5         I-001          2      5     3       4   USA    
 # 2        6       4         I-010          4      2     2       4   USA    
 # 3       10       8         I-658          8      2     2      10   COL

I have 500 NUMERO_INT with the same columns. The idea is get this summaries table for all them.

Tnks!

nirgrahamuk · January 9, 2023, 12:07pm

simply remove the filter line; the group_by will do the work

M_AcostaCH · January 9, 2023, 7:57pm

Hi, Im remove this, see the output. Is close for final output.

 # NUMERO_INT NU_STOCK GNUMBER   FAO  LOST
 # <chr>         <int>   <int> <int> <int>
 # 1 I-001             7       5     2     3
 # 2 I-010             6       4     4     2
 # 3 I-658            10       8     8     2

I need add this colums: FAO_NA ,LOST_NA, PROCED1
Some ideas?

nirgrahamuk · January 10, 2023, 11:58am

INUM |> 
  select(PROCED1,NU_STOCK,GNUMBER, NUMERO_INT, FAO, LOST) |> 
  group_by(PROCED1,NUMERO_INT) %>%
  summarise(across(.fns = ~sum(!is.na(.)),
                   .names = "{col}_NA")) |> ungroup()

M_AcostaCH · January 10, 2023, 4:27pm

M_AcostaCH:

INUM |> 
   select(NU_STOCK,GNUMBER, NUMERO_INT, FAO, LOST) |> 
   filter(NUMERO_INT=='I-001') |>   # other NUMERO_INT has the same format 
   group_by(NUMERO_INT) %>%
   summarise(across(.fns = ~sum(!is.na(.))))

I'm really appreciate your help.
I'm commet you the conflicts problem with summarise() and acroos(), but the solution was use this in each case:

dplyr::summarise()
dplyr::across()

Tnks!

M_AcostaCH · January 10, 2023, 4:51pm

Im check the script but I see that the only show the data !NA, is very well. But in FAO and LOST, I have NA values that I need count, like this table:

 # # A tibble: 3 × 8
 #        NU_STOCK GNUMBER  NUMERO_INT    FAO    FAO_NA  LOST LOST_NA PROCED1
 #        <dbl>  <dbl>      <chr>         <dbl>  <dbl> <dbl> <dbl> <chr>  
 # 1        7       5         I-001          2      5     3       4   USA    
 # 2        6       4         I-010          4      2     2       4   USA    
 # 3       10       8         I-658          8      2     2      10   COL

With your code Im try with this but get an error:

INUM |> 
  select(PROCED1,NU_STOCK,GNUMBER, NUMERO_INT, FAO, LOST) |> 
  group_by(PROCED1,NUMERO_INT) %>%
  dplyr::summarise(dplyr::across(.fns =list(~sum(!is.na(.)),~sum(is.na(.))) ,
                   .names = "{col}_NA")) |> ungroup()

nirgrahamuk · January 10, 2023, 4:59pm

you want to count how many NA values, and how many not NA's values are in FAO, and Lost by your groups ?

INUM |> 
  select(PROCED1,NU_STOCK,GNUMBER, NUMERO_INT, FAO, LOST) |> 
  group_by(PROCED1,NUMERO_INT) %>%
  summarise(across(.cols =c(FAO,LOST),
                   .fns = ~sum(!is.na(.)),
                   .names = "{col}_notNA"),
            across(.cols =c(FAO,LOST),
                   .fns = ~sum(is.na(.)),
                   .names = "{col}_NA")) |> ungroup()

M_AcostaCH · January 10, 2023, 6:35pm

Yeah, Im want this, run excellent.

Tnks! men.

system · January 17, 2023, 6:35pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.