Create new values between two values (start and stop)

Hi everyone, I'm new to this forum. I'm a biology student from Italy.

I have to create a plot using values obtained from an alignment of DNA sequences (from BLAST).

The results are in this format:

Where:
seq. id is the name of my sequence, coverage % is how much of the reference my DNA sequence covers, length is the length in nucleotides of the part of my sequence aligned to the reference, and start/stop are the positions of the reference bases where my sequence aligns.

My aim is to plot something like this:


To do this, I have to generate all the values from the start to the stop and give each of these numbers the corresponding % of coverage, then write the result to a new table. For example, with 90% coverage, a start at 190 bp and a stop at 800 bp: 190 90%, 191 90%, 192 90%, 193 90%, ..., 798 90%, 799 90%, 800 90%.

How can I do it? I import my txt table into R using read.table, so I think it is a data.frame.

I'm going to use ggplot2 to plot it.

Thank you all in advance, greetings from Italy.

Is this what you want?

library(dplyr)
library(tidyr)

DF <- data.frame(Seq = c("A", "B", "C"), 
                 Coverage = c(80, 75, 90), 
                 Start = c(1000, 53, 467), 
                 Stop = c(1004, 56, 470))
DF
#>   Seq Coverage Start Stop
#> 1   A       80  1000 1004
#> 2   B       75    53   56
#> 3   C       90   467  470
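# Reshape to long format: one row per endpoint (Start or Stop) of each sequence
DF2 <- DF %>% gather(key = "END", value = "Bases", Start, Stop)
# For each sequence, store every position from the smallest to the largest endpoint as a list column
DF2 <- DF2 %>% group_by(Seq) %>% mutate(AllBases = list(full_seq(Bases, 1)))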
DF2
#> # A tibble: 6 x 5
#> # Groups:   Seq [3]
#>   Seq   Coverage END   Bases AllBases 
#>   <fct>    <dbl> <chr> <dbl> <list>   
#> 1 A           80 Start  1000 <dbl [5]>
#> 2 B           75 Start    53 <dbl [4]>
#> 3 C           90 Start   467 <dbl [4]>
#> 4 A           80 Stop   1004 <dbl [5]>
#> 5 B           75 Stop     56 <dbl [4]>
#> 6 C           90 Stop    470 <dbl [4]>
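# Expand the list column to one row per base; the Start and Stop rows hold the same list, so keep only one of them
DF2 <- unnest(DF2, cols = c(Coverage, AllBases)) %>% filter(END == "Start")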
DF2  
#> # A tibble: 13 x 5
#> # Groups:   Seq [3]
#>    Seq   Coverage END   Bases AllBases
#>    <fct>    <dbl> <chr> <dbl>    <dbl>
#>  1 A           80 Start  1000     1000
#>  2 A           80 Start  1000     1001
#>  3 A           80 Start  1000     1002
#>  4 A           80 Start  1000     1003
#>  5 A           80 Start  1000     1004
#>  6 B           75 Start    53       53
#>  7 B           75 Start    53       54
#>  8 B           75 Start    53       55
#>  9 B           75 Start    53       56
#> 10 C           90 Start   467      467
#> 11 C           90 Start   467      468
#> 12 C           90 Start   467      469
#> 13 C           90 Start   467      470

Created on 2019-11-20 by the reprex package (v0.3.0.9000)
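
If you would rather not depend on dplyr/tidyr, the same expansion can be sketched in base R. This is only a minimal sketch on the same toy data as above; the `lo`, `hi`, `n_bases` and `expanded` names are my own, and the `pmin`/`pmax` step is an assumption on my part to cope with hits where Start is larger than Stop.

```r
# Same toy data as in the reply above
DF <- data.frame(Seq = c("A", "B", "C"),
                 Coverage = c(80, 75, 90),
                 Start = c(1000, 53, 467),
                 Stop = c(1004, 56, 470))

# Assumption: order the endpoints so that lo <= hi even if a hit is reported reversed
lo <- pmin(DF$Start, DF$Stop)
hi <- pmax(DF$Start, DF$Stop)

# Number of reference positions covered by each alignment (both ends included)
n_bases <- hi - lo + 1

# Repeat each row once per covered position and fill in the positions
expanded <- data.frame(
  Seq      = rep(DF$Seq, times = n_bases),
  Coverage = rep(DF$Coverage, times = n_bases),
  AllBases = unlist(Map(seq, lo, hi))
)

head(expanded)
```

With a real table you would replace the toy `DF` by the data.frame returned by read.table and adjust the column names to match your file.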

Thank you very much! Yes, something like this!
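
For the ggplot2 step mentioned in the question, here is a rough sketch of how a table with one row per reference base (the shape of DF2 in the reply) could be plotted. The layout is an assumption, since the target figure is not shown in the thread: position on the x axis, coverage % on the y axis, one line per sequence. The toy values are rebuilt inside the snippet so that it runs on its own.

```r
library(ggplot2)

# Toy table in the same shape as DF2 in the reply: one row per reference base
expanded <- data.frame(
  Seq      = rep(c("A", "B", "C"), times = c(5, 4, 4)),
  Coverage = rep(c(80, 75, 90), times = c(5, 4, 4)),
  AllBases = c(1000:1004, 53:56, 467:470)
)

# Assumed layout: reference position on x, coverage % on y, one line per sequence
p <- ggplot(expanded, aes(x = AllBases, y = Coverage, colour = Seq, group = Seq)) +
  geom_line() +
  geom_point() +
  labs(x = "Position on reference (bp)", y = "Coverage (%)")

p
```

Each sequence shows up as a horizontal line at its coverage value, spanning the bases it aligns to on the reference.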