Dear community,
I need a method to analyze my data and would appreciate your help. The data look like this:
> glimpse(test)
Rows: 196
Columns: 8
$ Start.H <int> 1, 41, 81, 121, 121, 161, 401, 441, 721, 921, 1081, 1201, 1201, 1241, 1521, 1681, 1721, 2041, 2481, 2561, 2681…
$ End.H <int> 160, 160, 200, 240, 240, 280, 520, 680, 1040, 1120, 1280, 1320, 1320, 1360, 1800, 1800, 1840, 2440, 2680, 2680…
$ Start.I <int> 1, 41, 81, 121, 481, 681, 681, 841, 881, 1041, 1121, 1201, 1521, 1521, 1561, 1641, 1681, 1721, 1921, 2441, 248…
$ End.I <int> 120, 160, 200, 240, 600, 800, 800, 1000, 1040, 1200, 1240, 1360, 1640, 1640, 1680, 1760, 1840, 1840, 2400, 260…
$ Start.B <int> 1, 41, 41, 81, 121, 121, 161, 401, 721, 921, 1241, 1521, 2041, 2681, 2761, 2801, 2801, 2881, 2921, 2961, 2961,…
$ End.B <int> 120, 160, 160, 200, 240, 240, 280, 520, 1040, 1120, 1360, 1720, 2640, 2880, 2880, 2920, 2920, 3040, 3040, 3080…
$ Start.C <int> 1, 41, 81, 121, 121, 161, 401, 721, 921, 1121, 1201, 1241, 1521, 1721, 2041, 2481, 2681, 2761, 2881, 3681, 400…
$ End.C <int> 120, 160, 200, 240, 240, 280, 520, 1040, 1120, 1320, 1320, 1360, 1720, 1840, 2440, 2640, 2880, 2880, 3080, 396…
Start and End are genomic positions in the same organism, measured in four replicates (H, I, B and C). The replicate is indicated by the suffix of each Start and End column, as you can see in the data frame.
My goal is to add a new column called "Type". If the Start and End positions are the same in all replicates (H, I, B and C), the value should be "Conserved". If they match in some replicates but not in all of them, the value should be "Shared". If the Start and End position of each replicate is different from all the others, the value should be "Unique".
For this, I wrote the following script:
test <- data %>%
  mutate(Type = if_else(Start.H & End.H == Start.I & End.I == Start.B & End.B == Start.C & End.C) ~ "Conserved"
                        Start.H & End.H == Start.I & End.I != Start.B & End.B != Start.C & End.C) ~ "Shared"
                        Start.H & End.H == Start.I & End.I == Start.B & End.B != Start.C & End.C) ~ "Shared"
                        Start.H & End.H == Start.I & End.I == Start.B & End.B != Start.C & End.C) ~ "Shared"
                        Start.H & End.H != Start.I & End.I == Start.B & End.B != Start.C & End.C) ~ "Shared"
                        Start.H & End.H != Start.I & End.I != Start.B & End.B == Start.C & End.C) ~ "Shared"
                        Start.H & End.H != Start.I & End.I != Start.B & End.B != Start.C & End.C) ~ "Unique"
Unfortunately, this does not do what I need. If you know a workable way to perform this classification, I would greatly appreciate it.
Thanks.
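
One possible approach is sketched below. Two things stand out in the attempt above: the `condition ~ value` syntax belongs to `case_when()`, whereas `if_else()` takes a condition, a true value and a false value; and `Start.H & End.H == Start.I` is read by R as `Start.H & (End.H == Start.I)`, so it does not compare Start/End pairs. The sketch assumes, as the attempt does, that the comparison is made row by row, i.e. that each row lines up the same region across the four replicates. The toy values, the helper columns `key.H`, `key.I`, `key.B`, `key.C` and `n_keys` are hypothetical and only serve the illustration. The idea is to build one "start-end" key per replicate and count how many distinct keys each row contains.

```r
library(dplyr)

# Toy data with the same column layout as the question (values are made up)
data <- tibble(
  Start.H = c(1L, 41L, 121L),
  End.H   = c(120L, 160L, 240L),
  Start.I = c(1L, 41L, 481L),
  End.I   = c(120L, 160L, 600L),
  Start.B = c(1L, 81L, 721L),
  End.B   = c(120L, 200L, 1040L),
  Start.C = c(1L, 121L, 921L),
  End.C   = c(120L, 240L, 1120L)
)

test <- data %>%
  mutate(
    # One "start-end" key per replicate, so a pair is compared as a unit
    key.H = paste(Start.H, End.H, sep = "-"),
    key.I = paste(Start.I, End.I, sep = "-"),
    key.B = paste(Start.B, End.B, sep = "-"),
    key.C = paste(Start.C, End.C, sep = "-")
  ) %>%
  rowwise() %>%
  # Number of different Start/End pairs found in this row
  mutate(n_keys = n_distinct(c(key.H, key.I, key.B, key.C))) %>%
  ungroup() %>%
  mutate(
    Type = case_when(
      n_keys == 1 ~ "Conserved",  # all four replicates agree
      n_keys == 4 ~ "Unique",     # no two replicates agree
      TRUE        ~ "Shared"      # some, but not all, replicates agree
    )
  ) %>%
  # Drop the helper columns again
  select(-starts_with("key."), -n_keys)

print(test$Type)
```

Counting distinct keys avoids writing out every combination of `==` and `!=` by hand. If the rows are not aligned across replicates (the columns in the `glimpse()` output drift apart after the first few rows), a row-wise comparison would not be meaningful, and the data would first need to be reshaped so that each Start/End pair can be looked up in the other replicates.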