Find common variants in multiple VCF files (need help please!)

I am looking for a tool that collects the variants common to multiple VCF files (more than 30 files). The only tool I could find so far is findCommonVariants from the Rsubread package, but it has two problems:

  1. It completely ignores the last input; e.g., if I pass (A.vcf, B.vcf, C.vcf), it compares only A and B:
ABC = findCommonVariants(c(A,B,C))
>> MA01vs03 = findCommonVariants(c(A,B,C))
Process file A
There are 106 variants found in this file.

Process file B
There are 91 variants found in this file.

Finished! There are 5 common variants from the 2 input files.
  2. Even when it reports 5 common variants, it shows only 3 (2 fewer than the reported number).

Does anybody know how to fix that, or is there a better tool for this comparison?

This question might be a better fit for the Bioconductor forum or Biostars.

That said, here's an option using the VariantAnnotation and GenomicRanges packages:


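library(purrr)              # map(), reduce(), and the %>% pipe
library(VariantAnnotation)  # readVcfAsVRanges()
library(GenomicRanges)      # subsetByOverlaps()

# Example VCF shipped with VariantAnnotation; replace with your own file paths
vcf2 <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

comparison_files <- c(vcf2, vcf2, vcf2)

# Read each file into a VRanges object
comparison_vcfs <-
  comparison_files %>%
  purrr::map(readVcfAsVRanges)

# Successively keep only the variants that overlap a variant in the next file
intersected_vranges <- purrr::reduce(comparison_vcfs, subsetByOverlaps)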
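
One caveat: as far as I know, subsetByOverlaps compares genomic positions only, so two files that have different alternate alleles at the same site would still count as sharing that variant. If you need an exact match, here is a minimal base-R sketch that intersects on CHROM, POS, REF and ALT instead. The helper names `variant_keys` and `common_variants` are made up for this example, and it assumes uncompressed, tab-delimited VCF text files; multi-allelic ALT fields are compared as whole strings.

```r
# Hypothetical helper (not from any package): one "CHROM:POS:REF:ALT" key
# per record of an uncompressed VCF text file.
variant_keys <- function(path) {
  lines <- readLines(path)
  # Drop empty lines and header lines ("##..." metadata and "#CHROM ...")
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  if (length(lines) == 0) return(character(0))
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # VCF columns: 1 = CHROM, 2 = POS, 4 = REF, 5 = ALT
  keys <- vapply(
    fields,
    function(f) paste(f[1], f[2], f[4], f[5], sep = ":"),
    character(1)
  )
  unique(keys)
}

# Hypothetical helper: keys present in every one of the given files
common_variants <- function(paths) {
  Reduce(intersect, lapply(paths, variant_keys))
}

# Usage (file names taken from the question):
# common <- common_variants(c("A.vcf", "B.vcf", "C.vcf"))
# length(common)
```

Because the intersection runs over plain character vectors, every input file is used, and the number of keys returned is the number of common variants.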

