How can I insert my loop into a function?

My goal: count the number of times "A" appears in df - it will always be in col1 or col2 (never on the same row).
Then, loop over and repeat for all other letters.

library(reshape2)

#Make a dataframe with just two columns of characters
df <- data.frame("col1"= c("A", "B", "C", "D", "E", "F", "G", "A"), "col2"=c("Q", "A", "S", "Z", "A", "C", "F", "X"))

df
  col1 col2
1    A    Q
2    B    A
3    C    S
4    D    Z
5    E    A
6    F    C
7    G    F
8    A    X

I made this loop, which seems to work for me. The process is: stack col1 and col2 on top of each other, get the unique characters, loop over df to see how many times each character is found, then store the results in newdf:

newdf = NULL

# For any unique characters between col1 and col2, count how many times they appear in df
for(i in unique(stack(df)$value)){
  newdf <- rbind(newdf,data.frame("unique_char"=i, "number_hits"=nrow(df[df[1]==i | df[2]==i,])))
}

Result looks good:

newdf
   unique_char number_hits
1            A           4
2            B           1
3            C           2
4            D           1
5            E           1
6            F           2
7            G           1
8            Q           1
9            S           1
10           Z           1
11           X           1

My question: How can I insert this loop into a function so that I can perform this on any dataframe? I tried the following...

myfun <- function(a){
  newdf = NULL

  # For any unique characters between col1 and col2, count how many times they appear in a
  for(i in unique(stack(a)$value)){
    newdf <- rbind(newdf,data.frame("unique_char"=i, "number_hits"=nrow(a[a[1]==i | a[2]==i,])))
  }
}

But it doesn't work. If I run myfun(df), nothing happens.

Your function does not return anything. Try

myfun <- function(a){
  newdf = NULL
  
  #For any unique characters between col1 and col2, count how many times they appear in df
  for(i in unique(stack(a)$value)){
    newdf <- rbind(newdf,data.frame("unique_char"=i, "number_hits"=nrow(a[a[1]==i | a[2]==i,])))
  }
  newdf
}

You could wrap that last newdf in the return() function but that is not required.
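
To illustrate why the original version appeared to do nothing, here is a minimal sketch. The function names no_return and with_return are hypothetical, made up for this example. A for loop evaluates to NULL invisibly, so a function whose last expression is a loop returns NULL and prints nothing.

```r
# Hypothetical helper names, used only for illustration
no_return <- function(x) {
  total <- 0
  for (v in x) {
    total <- total + v
  }
  # The last expression is the for loop, which evaluates to invisible NULL
}

with_return <- function(x) {
  total <- 0
  for (v in x) {
    total <- total + v
  }
  total  # the last evaluated expression becomes the return value
}

is.null(no_return(1:3))  # TRUE
with_return(1:3)         # 6
```

Writing return(total) on the last line of with_return would behave the same way here.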

suppressPackageStartupMessages({
  library(magrittr)
})
# Lowercase df is the name of a function (the F density, stats::df), as is
# data. It's a good idea to avoid both as variable names, because some
# operations treat "df" as the function rather than the data frame, and fail.

DF <- data.frame("col1"= c("A", "B", "C", "D", "E", "F", "G", "A"), "col2"=c("Q", "A", "S", "Z", "A", "C", "F", "X"))

DF
#>   col1 col2
#> 1    A    Q
#> 2    B    A
#> 3    C    S
#> 4    D    Z
#> 5    E    A
#> 6    F    C
#> 7    G    F
#> 8    A    X

count_unique <- function(x) c(x[1][[1]],x[2][[1]]) %>% table()

count_unique(DF)
#> .
#> A B C D E F G Q S X Z 
#> 4 1 2 1 1 2 1 1 1 1 1

DF <- data.frame(col1 = sample(LETTERS,8,replace = TRUE), 
                 col2 = sample(LETTERS,8,replace = TRUE))

DF
#>   col1 col2
#> 1    X    I
#> 2    Y    O
#> 3    K    F
#> 4    L    Q
#> 5    M    P
#> 6    Z    O
#> 7    Q    G
#> 8    M    S

count_unique(DF)
#> .
#> F G I K L M O P Q S X Y Z 
#> 1 1 1 1 1 2 2 1 2 1 1 1 1

Created on 2021-01-15 by the reprex package (v0.3.0.9001)
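
If the result needs to be a data frame with the same column names as newdf in the original question, the table can be converted. The following is a sketch in base R only. The name count_unique_df is hypothetical, and it assumes the characters sit in the first two columns, as in the question.

```r
# Hypothetical variant of count_unique() that returns a data frame.
# Assumes the characters live in the first two columns of x.
count_unique_df <- function(x) {
  counts <- table(c(as.character(x[[1]]), as.character(x[[2]])))
  data.frame(
    unique_char = names(counts),
    number_hits = as.vector(counts)
  )
}

DF <- data.frame(col1 = c("A", "B", "C", "D", "E", "F", "G", "A"),
                 col2 = c("Q", "A", "S", "Z", "A", "C", "F", "X"))
result <- count_unique_df(DF)
result$number_hits[result$unique_char == "A"]  # 4
```

Note that table() sorts the characters alphabetically, whereas the loop in the question lists them in order of first appearance.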


Do you mean this?

newdf <- myfun(df)

Note that the newdf inside the function is a different object from the one to which the output of myfun is assigned.
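
A small sketch of that point, using a hypothetical function make_local: the assignment inside the function creates a variable in the function's own environment, so a variable of the same name in the calling environment changes only when the function's result is assigned to it.

```r
newdf <- "untouched"

# Hypothetical function: creates its own local newdf and returns it
make_local <- function() {
  newdf <- data.frame(x = 1:2)  # local to this call
  newdf
}

out <- make_local()    # the call itself does not modify the global newdf
newdf                  # still "untouched"

newdf <- make_local()  # the global newdf changes only through this assignment
is.data.frame(newdf)   # TRUE
```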


Thanks, this worked. I had assumed that with nothing in my environment except df and the function, running myfun(df) would itself generate newdf.

I did not realize that what I needed to type was newdf <- myfun(df).

This solution works for me as well, thanks! I was also able to fix my loop by printing it to a data frame outside the loop, which turned out to be my error.

Also, I'll rename df. Thanks for that tip.



Thanks for your response! I tried this as well, but it just prints newdf.

How can I make it actually generate newdf, so that if I ran the function on a different data frame (let's call it mynewdf) using myfun(mynewdf), newdf would be replaced with the updated result?