Query: counting peaks within 1000 bp of coordinates from a second table

Hi! I would like to extract selected information by combining two different tables.

table1
structure(list(seqnames = structure(c(2L, 2L, 3L, 1L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 8L, 9L), .Label = c("ch2",
"chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8"
), class = "factor"), start = c(225L, 350L, 447L, 499L, 610L,
666L, 1412L, 1506L, 1671L, 1794L, 1850L, 2001L, 2190L, 2354L,
2417L, 2477L, 2557L), end = c(371L, 496L, 593L, 645L, 756L, 812L,
1558L, 1652L, 1817L, 1940L, 1996L, 2147L, 2336L, 2500L, 2563L,
2623L, 2703L), width = c(147L, 147L, 147L, 147L, 147L, 147L,
147L, 147L, 147L, 147L, 147L, 147L, 147L, 147L, 147L, 147L, 147L
), strand = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "*", class = "factor"),
score = c(0.744584618, 0.69715161, 0.65083881, 0.591950661,
0.68148466, 0.700261139, 0.731494662, 0.870811459, 0.717907731,
0.689581274, 0.729202366, 0.748913175, 0.855107416, 0.68530477,
0.501268947, 0.700319308, 0.69678428), score_w = c(0.532527595,
0.450347631, 0.38051378, 0.345649841, 0.385857893, 0.429376302,
0.480464849, 0.741622968, 0.46279857, 0.379162549, 0.458404731,
0.49907902, 0.710214833, 0.405306726, 0.223775737, 0.400686857,
0.393576014), score_h = c(0.956641641, 0.943955588, 0.921163841,
0.838251481, 0.977111426, 0.971145976, 0.982524474, 0.99999995,
0.973016893, 1, 1, 0.998747329, 1, 0.965302814, 0.778762157,
0.999951759, 0.999992546)), class = "data.frame", row.names = c(NA,
-17L))

table2
structure(list(seqname = structure(c(1L, 1L, 1L, 2L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("chr1",
"chr2", "chr3", "chr4", "chr5", "chr7", "chr8"), class = "factor"),
start = c(200L, 300L, 1000L, 500L, 300L, 700L, 1000L, 2000L,
2500L, 3000L, 1750L, 3000L, 1000L, 2200L, 1500L, 1750L, 2000L,
4000L), end = c(201L, 301L, 1001L, 501L, 301L, 701L, 1001L,
2001L, 2501L, 3001L, 1751L, 3001L, 1001L, 2201L, 1501L, 1751L,
2001L, 4001L)), class = "data.frame", row.names = c(NA, -18L
))

Table 1 contains the score values for the coordinates listed on each chromosome. It is a peak file, and the start and end coordinates in each row give the range that an individual peak spans.

Table 2 is a second coordinate file. For each position in Table 2, I want to find the number of peaks from Table 1 (and their coordinates) that fall within 1000 base pairs (bp) upstream and 1000 bp downstream of that position.
The final result should be in the form of a matrix/table.
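
One possible way to start, as a minimal sketch in base R rather than a definitive solution: pair every Table 2 position with the Table 1 peaks on the same chromosome, then keep the pairs where the peak overlaps the window around the position. The sketch uses a few of the rows posted above. The names `flank`, `site`, `win_start`, `win_end`, `hits` and `peak_summary` are made up for illustration, and the window is assumed to run from `start - 1000` to `start + 999` (2000 bp in total).

```r
# A few rows taken from the tables posted above (character columns here;
# the posted factor columns could be converted with as.character()).
table1 <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr4"),
  start    = c(225L, 350L, 447L, 1412L),
  end      = c(371L, 496L, 593L, 1558L),
  score    = c(0.744584618, 0.69715161, 0.65083881, 0.731494662)
)
table2 <- data.frame(
  seqname = c("chr1", "chr1", "chr2", "chr4"),
  start   = c(200L, 1000L, 500L, 3000L),
  end     = c(201L, 1001L, 501L, 3001L)
)

flank <- 1000L  # assumed half-width of the window, in bp

# Give each Table 2 position an id and a window [start - 1000, start + 999].
table2$site      <- seq_len(nrow(table2))
table2$win_start <- table2$start - flank
table2$win_end   <- table2$start + flank - 1L

# Pair each position with every peak on the same chromosome.
pairs <- merge(table2, table1,
               by.x = "seqname", by.y = "seqnames",
               suffixes = c(".site", ".peak"))

# Keep pairs where the peak overlaps the window (partial overlap counts).
hits <- pairs[pairs$start.peak <= pairs$win_end &
              pairs$end.peak   >= pairs$win_start, ]
hits <- hits[order(hits$site, hits$start.peak),
             c("seqname", "site", "start.site", "start.peak", "end.peak", "score")]

# Number of overlapping peaks per Table 2 position, including zeros.
peak_count <- tabulate(hits$site, nbins = nrow(table2))
peak_summary <- data.frame(seqname = table2$seqname,
                           start   = table2$start,
                           n_peaks = peak_count)
print(peak_summary)
```

Here `hits` lists the coordinates of the matching peaks and `peak_summary` gives the count per position. For large peak files, an interval-aware package would likely be faster than pairing all rows with `merge()`.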

Finally, the presence or absence of a peak should be converted to a binary file spanning the whole 2000 bp region, so that I can draw a final heatmap/histogram. Positions covered by a peak would be assigned 1, and positions without a peak would be assigned 0.
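
The binary step could look roughly like the following base R sketch, again using only a few of the posted rows. It builds one row per Table 2 position and one column per base in the window, and sets a cell to 1 where any peak on the same chromosome covers that base. The names `flank`, `rel_pos` and `binary` are hypothetical, and the window is assumed to cover relative positions -1000 to 999.

```r
# A few rows taken from the tables posted above (character columns here;
# the posted factor columns could be converted with as.character()).
table1 <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr4"),
  start    = c(225L, 350L, 447L, 1412L),
  end      = c(371L, 496L, 593L, 1558L)
)
table2 <- data.frame(
  seqname = c("chr1", "chr1", "chr2", "chr4"),
  start   = c(200L, 1000L, 500L, 3000L)
)

flank   <- 1000L                      # assumed half-width of the window, in bp
rel_pos <- seq(-flank, flank - 1L)    # 2000 positions relative to each site

# One row per Table 2 position, one column per base in the window, all 0.
binary <- matrix(0L, nrow = nrow(table2), ncol = length(rel_pos),
                 dimnames = list(paste0(table2$seqname, ":", table2$start),
                                 as.character(rel_pos)))

for (i in seq_len(nrow(table2))) {
  # Peaks on the same chromosome as this position.
  peaks <- table1[table1$seqnames == table2$seqname[i], ]
  for (j in seq_len(nrow(peaks))) {
    # Peak coordinates expressed relative to the Table 2 position.
    covered <- rel_pos >= peaks$start[j] - table2$start[i] &
               rel_pos <= peaks$end[j]   - table2$start[i]
    binary[i, covered] <- 1L
  }
}

# Number of covered bases per position.
print(rowSums(binary))
```

From there, something like `image(rel_pos, seq_len(nrow(binary)), t(binary))` should give a simple heatmap, and `colSums(binary)` gives the number of positions with a peak at each base, which can be plotted as a profile or histogram.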

I am not sure how to start this kind of analysis. Please let me know if anyone has experience with it.

Thanks in advance for your patience in reading the whole question!
