Hi I have an excel file like the following that actually maps the row names (KO) with the column names (Pathways). There are two values, 0 and 1, 0 denoting absence and 1 denoting presence. For example here (3R)-linalool biosynthesis contains 6 KO values K00001, K00002, K00008, K00012, K00016, and K00019, other KO values are not within the (3R)-linalool biosynthesis pathway:
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| KO | (3R)-linalool biosynthesis | (3S)-linalool biosynthesis | (4Z,7Z,10Z,13Z,16Z)-docosa-4,7,10,13,16-pentaenoate biosynthesis (6-desaturase) | (5R)-carbapenem carboxylate biosynthesis | (5Z)-dodec-5-enoate biosynthesis | (8E,10E)-dodeca-8,10-dienol biosynthesis | (Kdo)2-lipid A biosynthesis I | (R)- and (S)-3-hydroxybutanoate biosynthesis | (R)-acetoin biosynthesis I | (R)-acetoin biosynthesis II | (R)-cysteate degradation |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00001 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00002 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00003 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00004 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00007 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00008 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00009 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00010 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00011 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00012 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00013 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00014 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00016 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00018 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00019 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
The second file is actually a copy number file which shows KO (column names) copy numbers present in each bacteria (row names). For example, s__Absiella_dolichum has 2 copies of K00007:
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Taxa | K00001 | K00002 | K00003 | K00004 | K00005 | K00007 | K00008 | K00009 | K00010 | K00011 | K00012 | K00013 | K00014 | K00016 | K00018 | K00019 | K00020 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Collinsella_massiliensis | 1 | 0 | 1 | 0 | 2 | 0 | 2 | 1 | 1 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 1 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Abiotrophia_defectiva | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Abiotrophia_sp_HMSC24B09 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Absiella_dolichum | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acholeplasma_oculi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 1 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidaminococcus_fermentans | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 3 | 1 | 0 | 0 | 0 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidaminococcus_intestini | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0 | 4 | 2 | 0 | 0 | 0 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidaminococcus_sp_CAG_542 | 1 | 0 | 1 | 0 | 0.2 | 0 | 0 | 0 | 0 | 0 | 1.2 | 0 | 3.2 | 1.8 | 0.6 | 0 | 0 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidipropionibacterium_acidipropionici | 0.2 | 0 | 1 | 0 | 0 | 0 | 4 | 0 | 3.8 | 0 | 0.8 | 0 | 1.6 | 2.6 | 0 | 0 | 0 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acinetobacter_lwoffii | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 4 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acinetobacter_ursingii | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0.6 | 1 | 1 | 3 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Actinobaculum_sp_oral_taxon_183 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Actinomyces_cardiffensis | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 2 | 0 | 0 | 0 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Now I want to make a third file which will be bacteria vs Pathway copy number file like:
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| Taxa | (3R)-linalool biosynthesis | (3S)-linalool biosynthesis | (4Z,7Z,10Z,13Z,16Z)-docosa-4,7,10,13,16-pentaenoate biosynthesis (6-desaturase) | (5R)-carbapenem carboxylate biosynthesis | (5Z)-dodec-5-enoate biosynthesis | (8E,10E)-dodeca-8,10-dienol biosynthesis | (Kdo)2-lipid A biosynthesis I | (R)- and (S)-3-hydroxybutanoate biosynthesis | (R)-acetoin biosynthesis I | (R)-acetoin biosynthesis II | (R)-cysteate degradation |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Collinsella_massiliensis | | | | | | | | | | | |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Abiotrophia_defectiva | | | | | | | | | | | |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Abiotrophia_sp_HMSC24B09 | | | | | | | | | | | |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Absiella_dolichum | | | | | | | | | | | |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
so, the condition to be followed to make the file should be like: the copy number of a pathway will be the average copy number across all KOs in that pathway. For example, the copy number of (3R)-linalool biosynthesis in s__Acidaminococcus_sp_CAG_542 should be the average of six KOs 1,0,0,1.2,1.8, and 0, that is, 4/6= 0.67. This way the pathway copy number of all the bacteria will be calculated to find the final table. Can anyone please help me?
Many thanks