Count the number of times an observation appears in each row

Jack19 · October 11, 2020, 7:30pm

Hello, I have a data frame as follows:
Marker Allele_A Allele_B Line1 Line2 Line3 Line4
1 C G C C C G
2 A T A T T T
3 G C G G G C

I would like to count how many times Allele_A and Allele_B appear among Line1 to Line4 in each row, and get the following result:
Marker Allele_A Allele_B Line1 Line2 Line3 Line4 Allele_A# Allele_B#
1 C G C C C G 3 1
2 A T A T T T 1 3
3 G C G G G C 3 1

I tried several approaches, but none of them worked. Could anyone help me solve this problem? Thanks a lot!

kuriwaki · October 11, 2020, 8:27pm

I would make heavy use of dataset reshaping. Your data is in "wide" format, where one row is a marker, but it is easier to count if the dataset is reshaped to "long" format. After that, count the matches, then reshape back to wide. Here is a snippet using your data.

```r
library(tidyverse)

# create dataset
df_wide <- tribble(
  ~Marker, ~Allele_A, ~Allele_B, ~Line1, ~Line2, ~Line3, ~Line4,
  1, "C", "G", "C", "C", "C", "G",
  2, "A", "T", "A", "T", "T", "T",
  3, "G", "C", "G", "G", "G", "C")

# reshape to long -- line values
line_long <- df_wide %>% 
  select(Marker, matches("Line")) %>% 
  pivot_longer(-Marker, names_to = "Line", values_to = "Allele")

# reshape to long -- search values
search_long <- df_wide %>% 
  select(Marker, matches("Allele")) %>% 
  pivot_longer(-Marker, names_to = "search", values_to = "search_for")

# merge both by Marker
both_long <- left_join(line_long, search_long, by = "Marker")

# count hits
hits_long <- both_long %>% 
  group_by(Marker, search) %>% 
  summarize(count = sum(Allele == search_for), 
            .groups = "drop")

# reshape hits to wide
hits_wide <- hits_long %>% 
  pivot_wider(id_cols = Marker,
              names_from = search, 
              values_from = count, 
              names_prefix = "count_")

# rejoin
left_join(df_wide, hits_wide)
#> Joining, by = "Marker"
#> # A tibble: 3 x 9
#>   Marker Allele_A Allele_B Line1 Line2 Line3 Line4 count_Allele_A count_Allele_B
#>    <dbl> <chr>    <chr>    <chr> <chr> <chr> <chr>          <int>          <int>
#> 1      1 C        G        C     C     C     G                  3              1
#> 2      2 A        T        A     T     T     T                  1              3
#> 3      3 G        C        G     G     G     C                  3              1
```

Created on 2020-10-11 by the reprex package (v0.3.0)
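For comparison, the same counts can also be obtained without reshaping. The following is a minimal base-R sketch, assuming that every genotype column name starts with "Line" and that alleles are matched exactly (case-sensitive). The names `df_base`, `line_cols` and `geno` are illustrative only; the new column names simply mirror the `count_` columns from the tidyverse result above.

```r
# Rebuild the example data in base R (same values as the tribble above)
df_base <- data.frame(
  Marker   = 1:3,
  Allele_A = c("C", "A", "G"),
  Allele_B = c("G", "T", "C"),
  Line1    = c("C", "A", "G"),
  Line2    = c("C", "T", "G"),
  Line3    = c("C", "T", "G"),
  Line4    = c("G", "T", "C"),
  stringsAsFactors = FALSE
)

# Select the genotype columns by name prefix (assumption: they all start with "Line")
line_cols <- grep("^Line", names(df_base))
geno <- as.matrix(df_base[, line_cols])

# Comparing a matrix with a vector of length nrow() recycles the vector down
# each column, so row i is compared against that row's allele;
# rowSums() then counts the TRUE values in each row.
df_base$count_Allele_A <- rowSums(geno == df_base$Allele_A)
df_base$count_Allele_B <- rowSums(geno == df_base$Allele_B)

df_base
```

For this data the counts come out as 3, 1, 3 for Allele_A and 1, 3, 1 for Allele_B, matching the reshaping approach. The vectorized comparison avoids the joins, while the long-format version extends more easily if you later need per-line summaries.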

Jack19 · October 12, 2020, 4:33am

Great! It worked! Thank you for the quick response!
