I have a data frame df where two columns are character and a third is numeric. Example:
df <- data.frame("col1"= c("A", "B", "C", "D", "E", "F", "G", "A"), "col2"=c("Q", "A", "S", "Z", "A", "C", "F", "X"), "col3"=c(1,2,3,4,5,6,7,8))
df
  col1 col2 col3
1    A    Q    1
2    B    A    2
3    C    S    3
4    D    Z    4
5    E    A    5
6    F    C    6
7    G    F    7
8    A    X    8
My character of interest is 'A'. I want to know how many rows exist where 'A' is either in col1 or col2. The answer here would be 4.
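For a single value, a minimal sketch of the count looks like this (the name n_A is just for illustration, and stringsAsFactors = FALSE is set so the columns are plain characters in older R versions too):

```r
df <- data.frame(col1 = c("A", "B", "C", "D", "E", "F", "G", "A"),
                 col2 = c("Q", "A", "S", "Z", "A", "C", "F", "X"),
                 col3 = c(1, 2, 3, 4, 5, 6, 7, 8),
                 stringsAsFactors = FALSE)

# The comparison gives TRUE for each row where either column equals "A";
# sum() counts the TRUEs, i.e. the number of matching rows
n_A <- sum(df$col1 == "A" | df$col2 == "A")
n_A
```

Because the two conditions are combined with | per row, a row that had 'A' in both columns would still be counted once.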
It's easy enough when I'm interested in one value at a time, but how can I write a function to loop over all unique values and return the number of rows for each? My character of interest is always split, with some entries in col1 and others in col2.
I'm guessing that first I could stack col1 and col2, then reduce that to just the unique entries. Then I'd want to ask, "how many rows exist where col1 or col2 contains 'B', 'C', etc.?"
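The idea above might be sketched roughly like this, assuming the same df; vals and counts are hypothetical names, and sapply is just one of several ways to do the looping:

```r
df <- data.frame(col1 = c("A", "B", "C", "D", "E", "F", "G", "A"),
                 col2 = c("Q", "A", "S", "Z", "A", "C", "F", "X"),
                 col3 = c(1, 2, 3, 4, 5, 6, 7, 8),
                 stringsAsFactors = FALSE)

# "Stack" the two character columns and keep each distinct value once
vals <- unique(c(df$col1, df$col2))

# For each value, count the rows where it appears in col1 or col2
counts <- sapply(vals, function(v) sum(df$col1 == v | df$col2 == v))
counts
```

The result is a named vector, with one count per unique value, named by that value.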
UPDATE:
I learned more about for loops and came to this (imperfect but working) solution:
#stack() comes with base R (the utils package), so no extra library is needed
#Make a dataframe with just two columns of characters
df <- data.frame("col1"= c("A", "B", "C", "D", "E", "F", "G", "A"), "col2"=c("Q", "A", "S", "Z", "A", "C", "F", "X"))
#Start with an empty result; rows are appended inside the loop
newdf <- NULL
#For each unique character across col1 and col2, count the rows of df where it appears in either column
for(i in unique(stack(df)$values)){
newdf <- rbind(newdf, data.frame("col1"=i, "col2"=nrow(df[df$col1==i | df$col2==i,])))
}
newdf
   col1 col2
1     A    4
2     B    1
3     C    2
4     D    1
5     E    1
6     F    2
7     G    1
8     Q    1
9     S    1
10    Z    1
11    X    1