I’m the same guy who posted about using R to detect breeding patterns in cutting horses.
I've been trying to use R to help me find breeding patterns (matings) that repeat in a sample of successful individuals (I suspect there are several such patterns). The goal is to be able to say that 'this pattern is present in X individuals from a database of elite or highly successful horses'.
The data (ancestors) for each individual are laid out in rows as follows:
Column 1: Horse's name (current performer)
Column 2: Money earned
Column 3: Generation 1 Top (name of sire)
Column 4: Generation 1 Bottom (name of dam)
Column 5: Generation 2 Top (name of paternal grandsire)
Column 6: Generation 2 Top (name of paternal granddam)
Column 7: Generation 2 Bottom (name of maternal grandsire)
Column 8: Generation 2 Bottom (name of maternal granddam)
And so on until the 5th generation.
After giving it further thought, I decided that the best way to achieve what I’m after is to get R to detect the duplicates in a column, filter the database for each of these duplicates, find the duplicates in the next column, and so on. For each pattern that occurs more than once, R would then list the horses (Column 1) associated with that pattern.
During my research I haven’t been able to find examples of the programming I’m after. All the stuff on duplicates is about finding them to delete them. I need R to find them and use them to filter the database for every column, going from left to right.
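To make the idea concrete, here is a rough base-R sketch of the kind of thing I mean. It is only an illustration on made-up data: the horse and ancestor names, the column names (horse, earned, g1_sire, g1_dam, g2_ps) and the helper find_patterns are all my own inventions, and I am not sure this is the best way to go about it. Instead of literally filtering column by column, it groups the horses by the combination of ancestor columns seen so far, which I believe gives the same result.

```r
# Toy pedigree table: made-up names, first few ancestor columns only.
# Column names are hypothetical; the real data would run to generation 5.
ped <- data.frame(
  horse   = c("A", "B", "C", "D", "E"),
  earned  = c(100, 200, 150, 50, 75),
  g1_sire = c("S1", "S1", "S1", "S2", "S2"),
  g1_dam  = c("D1", "D1", "D2", "D3", "D3"),
  g2_ps   = c("P1", "P1", "P1", "P2", "P3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk the ancestor columns from left to right.
# At depth k, horses are grouped by the values in the first k ancestor
# columns, and only groups shared by more than one horse are kept.
# Caveat: missing ancestors (NA) would be pasted as the text "NA" and
# grouped together, so they should be dealt with beforehand.
find_patterns <- function(df, id_col, ancestor_cols) {
  out <- list()
  for (k in seq_along(ancestor_cols)) {
    cols <- ancestor_cols[seq_len(k)]
    # One text key per horse, e.g. "S1 | D1"
    key <- do.call(paste, c(unname(as.list(df[cols])), sep = " | "))
    groups <- split(df[[id_col]], key)
    repeated <- groups[lengths(groups) > 1]
    # Once nothing repeats, deeper columns cannot repeat either
    if (length(repeated) == 0) break
    out[[k]] <- data.frame(
      depth    = k,
      pattern  = names(repeated),
      n_horses = unname(lengths(repeated)),
      horses   = unname(vapply(repeated, paste, character(1), collapse = ", ")),
      stringsAsFactors = FALSE
    )
  }
  do.call(rbind, out)
}

res <- find_patterns(ped, "horse", c("g1_sire", "g1_dam", "g2_ps"))
print(res)
```

On this toy table it reports that A, B and C share a sire, that A and B also share the dam, and that A and B still match once the paternal grandsire is added. What I don't know is whether this scales sensibly to all five generations, or whether there is an established way of doing it.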
Can anyone point me to examples that were seeking something similar?
Thanks in advance for any advice.