Loop to sum an if/else result from specific columns

Hello!

I have a dataset, indices, where each observation contains a character rating for 4 different variables. I want to assign the character rating in each column a numeric value using an ifelse statement and then sum the result in a new column.

To clarify, each row has four indices with a "Good/Fair/Poor/Missing" rating, and I want to assign a 1 for each Good and a 0 for everything else. Then, I want to add up the result in a column called indices$Total.

This is what I have tried:

for(i in 1:ncol(indices[,5:8])){
  indices$Total[i] <- sum(ifelse(indices$i=="Good", 1,0)) 
}

But, of course, it didn't work because I'm sure I'm not writing the appropriate syntax, or I'm not defining i correctly or something. Could someone help me achieve this? Here is a subset of my data:

indices <- structure(list(UID = c("155230", "155231", "155232", "155233", 
"155233", "155233"), SITE_ID = c("LHLEC-012", "LHLEC-015", "LHLEC-011", 
"LHLEC-003", "LHLEC-003", "LHLEC-003"), LATITUDE = c(42.0855, 
42.1436, 42.1203, 42.1725, 42.1725, 42.1725), LONGITUDE = c(-83.1216, 
-83.1177, -83.1328, -83.1316, -83.1316, -83.1316), Water.Quality.Index = c("Fair", 
"Poor", "Fair", "Good", "Good", "Good"), SedChemindex = c("Fair", 
"Good", "Fair", "Good", "Good", "Good"), SEDTOX_INDEX = c("Fair", 
"Good", "Fair", "Good", "Good", "Fair"), OTI.Classification = c("Missing", 
"Fair", "Poor", "Fair", "Fair", "Fair")), row.names = c(NA, 6L
), class = "data.frame")

Thanks so much!

You typically do not need for loops in R for this kind of manipulation. Most functions are vectorized, so they work on entire vectors or columns of data at once. Here is a solution using the dplyr package. I made a new data frame named indices2, but you could just as well overwrite the original data. I also display only selected columns from indices2 to keep the display width reasonable.

indices <- structure(list(UID = c("155230", "155231", "155232", "155233", 
                                  "155233", "155233"), 
                          SITE_ID = c("LHLEC-012", "LHLEC-015", "LHLEC-011", 
                                      "LHLEC-003", "LHLEC-003", "LHLEC-003"), 
                          LATITUDE = c(42.0855, 42.1436, 42.1203, 42.1725, 42.1725, 42.1725), 
                          LONGITUDE = c(-83.1216, -83.1177, -83.1328, -83.1316, -83.1316, -83.1316), 
                          Water.Quality.Index = c("Fair", "Poor", "Fair", "Good", "Good", "Good"), 
                          SedChemindex = c("Fair", "Good", "Fair", "Good", "Good", "Good"), 
                          SEDTOX_INDEX = c("Fair", "Good", "Fair", "Good", "Good", "Fair"), 
                          OTI.Classification = c("Missing", "Fair", "Poor", "Fair", "Fair", "Fair")), 
                     row.names = c(NA, 6L
                     ), class = "data.frame")
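library(dplyr)

# Recode the four rating columns (5 to 8): 1 for "Good", 0 for anything else
indices2 <- indices |> mutate(across(.cols = 5:8, 
                                      .fns = ~ifelse(. == "Good", 1, 0))) |> 
  # Work row by row so that sum() adds across columns, not down them
  rowwise() |> 
  mutate(Total = sum(c_across(cols = 5:8)))
# Show only UID, the recoded columns, and the new Total
select(indices2, c(1,5:9))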
#> # A tibble: 6 x 6
#> # Rowwise: 
#>   UID    Water.Quality.Index SedChemindex SEDTOX_INDEX OTI.Classification Total
#>   <chr>                <dbl>        <dbl>        <dbl>              <dbl> <dbl>
#> 1 155230                   0            0            0                  0     0
#> 2 155231                   0            1            1                  0     2
#> 3 155232                   0            0            0                  0     0
#> 4 155233                   1            1            1                  0     3
#> 5 155233                   1            1            1                  0     3
#> 6 155233                   1            1            0                  0     2

Created on 2022-03-16 by the reprex package (v2.0.1)
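
For comparison, here is a minimal base R sketch of the same idea, in case you would rather not load a package. It is not the code from the answer above: it rebuilds only the UID and the four rating columns from the posted data, and the names rating_cols and total_loop are made up for illustration. It also shows one way the original loop could be repaired, since indices$i looks for a column literally named "i" rather than using the value of i.

```r
# Cut-down copy of the posted data: UID plus the four rating columns
indices <- data.frame(
  UID = c("155230", "155231", "155232", "155233", "155233", "155233"),
  Water.Quality.Index = c("Fair", "Poor", "Fair", "Good", "Good", "Good"),
  SedChemindex = c("Fair", "Good", "Fair", "Good", "Good", "Good"),
  SEDTOX_INDEX = c("Fair", "Good", "Fair", "Good", "Good", "Fair"),
  OTI.Classification = c("Missing", "Fair", "Poor", "Fair", "Fair", "Fair")
)

# Hypothetical helper vector naming the rating columns
rating_cols <- c("Water.Quality.Index", "SedChemindex",
                 "SEDTOX_INDEX", "OTI.Classification")

# Vectorized version: comparing the columns to "Good" gives a logical
# matrix, and rowSums() counts the TRUE values in each row
indices$Total <- rowSums(indices[, rating_cols] == "Good")

# Loop version: indices[[i]] uses the value of i, unlike indices$i
total_loop <- numeric(nrow(indices))
for (i in rating_cols) {
  total_loop <- total_loop + (indices[[i]] == "Good")
}

indices[, c("UID", "Total")]
```

Unlike the dplyr version, this leaves the original character ratings in place and only adds the Total column.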

I don't know who you are, you're like the Friendly Neighborhood Spiderman of all my data formatting questions, but THANK YOU endlessly. Where do you learn all of these functions? I'd love to expand my R knowledge of data formatting functions so I can do it without coming to RStudio all the time.

Here is a great resource:

Thank you so much! Does it matter if I use the pipe %>% or |>? Can I use them interchangeably?

For a chain like the one above, you can use them interchangeably. I use |> because that is what the Ctrl+Shift+M shortcut produces in my version of RStudio.
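
For simple chains the two pipes give the same result, but they are not identical in every case. As a rough sketch of the differences I am confident about: |> is built into R from version 4.1.0 and needs no package, while %>% comes from magrittr (dplyr re-exports it), and the two use different placeholders when the piped value should not be the first argument. The variable names below are made up for illustration.

```r
library(magrittr)  # provides %>%; the native |> needs no package (R >= 4.1.0)

x <- c(1, 4, 9)

# Simple chains: both pipes pass the left-hand side as the first argument
a <- x |> sqrt() |> sum()
b <- x %>% sqrt() %>% sum()

# Placeholders differ when the value should go somewhere else:
# magrittr uses `.`, the native pipe uses `_` with a named argument (R >= 4.2.0)
magrittr_rep <- 2 %>% rep("a", times = .)
native_rep   <- 2 |> rep(x = "a", times = _)

identical(a, b)
identical(magrittr_rep, native_rep)
```

If your code never uses a placeholder, switching between the two should not change the result.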
