How to make a third output file from two files and a relation between the?

Hi I have an excel file like the following that actually maps the row names (KO) with the column names (Pathways). There are two values, 0 and 1, 0 denoting absence and 1 denoting presence. For example here (3R)-linalool biosynthesis contains 6 KO values K00001, K00002, K00008, K00012, K00016, and K00019, other KO values are not within the (3R)-linalool biosynthesis pathway:

+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| KO     | (3R)-linalool biosynthesis | (3S)-linalool biosynthesis | (4Z,7Z,10Z,13Z,16Z)-docosa-4,7,10,13,16-pentaenoate biosynthesis (6-desaturase) | (5R)-carbapenem carboxylate biosynthesis | (5Z)-dodec-5-enoate biosynthesis | (8E,10E)-dodeca-8,10-dienol biosynthesis | (Kdo)2-lipid A biosynthesis I | (R)- and (S)-3-hydroxybutanoate biosynthesis | (R)-acetoin biosynthesis I | (R)-acetoin biosynthesis II | (R)-cysteate degradation |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00001 | 1                          | 0                          | 1                                                                               | 1                                        | 1                                | 1                                        | 1                             | 1                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00002 | 1                          | 0                          | 1                                                                               | 0                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 0                           | 1                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00003 | 0                          | 1                          | 0                                                                               | 0                                        | 0                                | 1                                        | 0                             | 0                                            | 0                          | 1                           | 1                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00004 | 0                          | 1                          | 0                                                                               | 0                                        | 0                                | 0                                        | 0                             | 1                                            | 1                          | 0                           | 1                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00005 | 0                          | 0                          | 0                                                                               | 0                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 1                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00007 | 0                          | 0                          | 0                                                                               | 0                                        | 1                                | 1                                        | 0                             | 0                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00008 | 1                          | 0                          | 1                                                                               | 1                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00009 | 0                          | 1                          | 0                                                                               | 0                                        | 0                                | 0                                        | 0                             | 1                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00010 | 0                          | 1                          | 0                                                                               | 0                                        | 0                                | 1                                        | 1                             | 0                                            | 1                          | 0                           | 1                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00011 | 0                          | 1                          | 0                                                                               | 0                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 1                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00012 | 1                          | 0                          | 1                                                                               | 1                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00013 | 0                          | 0                          | 0                                                                               | 0                                        | 1                                | 1                                        | 0                             | 1                                            | 1                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00014 | 0                          | 1                          | 0                                                                               | 0                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 1                           | 1                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00016 | 1                          | 0                          | 1                                                                               | 1                                        | 0                                | 0                                        | 0                             | 0                                            | 1                          | 1                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00018 | 0                          | 0                          | 0                                                                               | 0                                        | 0                                | 0                                        | 0                             | 0                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| K00019 | 1                          | 1                          | 1                                                                               | 0                                        | 0                                | 1                                        | 1                             | 0                                            | 0                          | 0                           | 0                        |
+--------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+

The second file is actually a copy number file which shows KO (column names) copy numbers present in each bacteria (row names). For example, s__Absiella_dolichum has 2 copies of K00007:

+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Taxa                                      | K00001 | K00002 | K00003 | K00004 | K00005 | K00007 | K00008 | K00009 | K00010 | K00011 | K00012 | K00013 | K00014 | K00016 | K00018 | K00019 | K00020 |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Collinsella_massiliensis               | 1      | 0      | 1      | 0      | 2      | 0      | 2      | 1      | 1      | 0      | 1      | 0      | 2      | 1      | 0      | 0      | 1      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Abiotrophia_defectiva                  | 0      | 0      | 1      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 0      | 1      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Abiotrophia_sp_HMSC24B09               | 0      | 0      | 1      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 0      | 1      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Absiella_dolichum                      | 0      | 0      | 1      | 0      | 0      | 0      | 0      | 2      | 0      | 0      | 1      | 0      | 1      | 1      | 0      | 0      | 0      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acholeplasma_oculi                     | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 2      | 1      | 0      | 0      | 1      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidaminococcus_fermentans             | 2      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 2      | 0      | 3      | 1      | 0      | 0      | 0      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidaminococcus_intestini              | 0      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0.5    | 0      | 4      | 2      | 0      | 0      | 0      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidaminococcus_sp_CAG_542             | 1      | 0      | 1      | 0      | 0.2    | 0      | 0      | 0      | 0      | 0      | 1.2    | 0      | 3.2    | 1.8    | 0.6    | 0      | 0      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acidipropionibacterium_acidipropionici | 0.2    | 0      | 1      | 0      | 0      | 0      | 4      | 0      | 3.8    | 0      | 0.8    | 0      | 1.6    | 2.6    | 0      | 0      | 0      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acinetobacter_lwoffii                  | 3      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 2      | 0      | 1      | 0      | 4      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Acinetobacter_ursingii                 | 3      | 0      | 1      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 1      | 0      | 2      | 0.6    | 1      | 1      | 3      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Actinobaculum_sp_oral_taxon_183        | 0      | 0      | 1      | 0      | 0      | 0      | 1      | 0      | 1      | 0      | 1      | 0      | 1      | 1      | 0      | 0      | 2      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| s__Actinomyces_cardiffensis               | 0      | 0      | 2      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 4      | 2      | 0      | 0      | 0      |
+-------------------------------------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+

Now I want to make a third file which will be bacteria vs Pathway copy number file like:

+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| Taxa                        | (3R)-linalool biosynthesis | (3S)-linalool biosynthesis | (4Z,7Z,10Z,13Z,16Z)-docosa-4,7,10,13,16-pentaenoate biosynthesis (6-desaturase) | (5R)-carbapenem carboxylate biosynthesis | (5Z)-dodec-5-enoate biosynthesis | (8E,10E)-dodeca-8,10-dienol biosynthesis | (Kdo)2-lipid A biosynthesis I | (R)- and (S)-3-hydroxybutanoate biosynthesis | (R)-acetoin biosynthesis I | (R)-acetoin biosynthesis II | (R)-cysteate degradation |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Collinsella_massiliensis |                            |                            |                                                                                 |                                          |                                  |                                          |                               |                                              |                            |                             |                          |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Abiotrophia_defectiva    |                            |                            |                                                                                 |                                          |                                  |                                          |                               |                                              |                            |                             |                          |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Abiotrophia_sp_HMSC24B09 |                            |                            |                                                                                 |                                          |                                  |                                          |                               |                                              |                            |                             |                          |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+
| s__Absiella_dolichum        |                            |                            |                                                                                 |                                          |                                  |                                          |                               |                                              |                            |                             |                          |
+-----------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------+------------------------------------------+----------------------------------+------------------------------------------+-------------------------------+----------------------------------------------+----------------------------+-----------------------------+--------------------------+

so, the condition to be followed to make the file should be like: the copy number of a pathway will be the average copy number across all KOs in that pathway. For example, the copy number of (3R)-linalool biosynthesis in s__Acidaminococcus_sp_CAG_542 should be the average of six KOs 1,0,0,1.2,1.8, and 0, that is, 4/6= 0.67. This way the pathway copy number of all the bacteria will be calculated to find the final table. Can anyone please help me?

Many thanks

Hi @CHAND,
Assuming you can read-in your two Excel files, code like this should do the job.
I shortened the column names for the first dataset. Those left with a leading number get "X" added at read-in.

suppressPackageStartupMessages(library(tidyverse))

in1 <- read.table(header=TRUE, sep="", text=c("
KO      3R_linalool 3S_linalool docosa  5R_carbapenem  5Z_dodec  dodeca_dienol Kdo_lipid_A_I  R_S_hydroxybutanoate  R_acetoin_I  R_acetoin_II  R_cysteate_deg
K00001  1           0           1       1               1         1             1              1                     0              0            0                       
K00002  1           0           1       0               0         0             0              0                     0              0            1                       
K00003  0           1           0       0               0         1             0              0                     0              1            1                       
K00004  0           1           0       0               0         0             0              1                     1              0            1                       
K00005  0           0           0       0               0         0             0              0                     0              1            0                       
K00007  0           0           0       0               1         1             0              0                     0              0            0                       
K00008  1           0           1       1               0         0             0              0                     0              0            0                       
K00009  0           1           0       0               0         0             0              1                     0              0            0                       
K00010  0           1           0       0               0         1             1              0                     1              0            1                       
K00011  0           1           0       0               0         0             0              0                     0              1            0                       
K00012  1           0           1       1               0         0             0              0                     0              0            0                       
K00013  0           0           0       0               1         1             0              1                     1              0            0                       
K00014  0           1           0       0               0         0             0              0                     0              1            1                       
K00016  1           0           1       1               0         0             0              0                     1              1            0
K00018  0           0           0       0               0         0             0              0                     0              0            0                       
K00019  1           1           1       0               0         1             1              0                     0              0            0                       
"))

## Note: K00020 is in this second dataset but not the first?

in2 <- read.table(header=TRUE, sep="", text=c("
Taxa                                      K00001  K00002  K00003  K00004  K00005 K00007  K00008  K00009  K00010  K00011  K00012  K00013  K00014  K00016  K00018  K00019  K00020 
s__Collinsella_massiliensis               1       0       1       0       2      0       2       1       1       0       1       0       2       1       0       0       1      
s__Abiotrophia_defectiva                  0       0       1       0       1      0       0       0       0       0       0       0       0       1       0       0       1      
s__Abiotrophia_sp_HMSC24B09               0       0       1       0       1      0       0       0       0       0       0       0       0       1       0       0       1      
s__Absiella_dolichum                      0       0       1       0       0      0       0       2       0       0       1       0       1       1       0       0       0      
s__Acholeplasma_oculi                     0       0       0       0       0      0       0       0       0       0       0       0       2       1       0       0       1      
s__Acidaminococcus_fermentans             2       0       1       0       0      0       0       0       0       0       2       0       3       1       0       0       0      
s__Acidaminococcus_intestini              0       0       1       0       0      0       0       0       0       0       0.5     0       4       2       0       0       0      
s__Acidaminococcus_sp_CAG_542             1       0       1       0       0.2    0       0       0       0       0       1.2     0       3.2     1.8     0.6     0       0      
s__Acidipropionibacterium_acidipropionici 0.2     0       1       0       0      0       4       0       3.8     0       0.8     0       1.6     2.6     0       0       0      
s__Acinetobacter_lwoffii                  3       0       1       0       0      0       0       0       0       0       1       0       2       0       1       0       4      
s__Acinetobacter_ursingii                 3       0       1       0       0      0       0       0       0       0       1       0       2       0.6     1       1       3      
s__Actinobaculum_sp_oral_taxon_183        0       0       1       0       0      0       1       0       1       0       1       0       1       1       0       0       2      
s__Actinomyces_cardiffensis               0       0       2       0       0      0       0       0       0       0       0       0       4       2       0       0       0    
"))

in1 %>% 
  pivot_longer(cols=2:12, names_to="pathway", values_to="present") -> in1_long
head(in1_long, n=20)
#> # A tibble: 20 x 3
#>    KO     pathway              present
#>    <chr>  <chr>                  <int>
#>  1 K00001 X3R_linalool               1
#>  2 K00001 X3S_linalool               0
#>  3 K00001 docosa                     1
#>  4 K00001 X5R_carbapenem             1
#>  5 K00001 X5Z_dodec                  1
#>  6 K00001 dodeca_dienol              1
#>  7 K00001 Kdo_lipid_A_I              1
#>  8 K00001 R_S_hydroxybutanoate       1
#>  9 K00001 R_acetoin_I                0
#> 10 K00001 R_acetoin_II               0
#> 11 K00001 R_cysteate_deg             0
#> 12 K00002 X3R_linalool               1
#> 13 K00002 X3S_linalool               0
#> 14 K00002 docosa                     1
#> 15 K00002 X5R_carbapenem             0
#> 16 K00002 X5Z_dodec                  0
#> 17 K00002 dodeca_dienol              0
#> 18 K00002 Kdo_lipid_A_I              0
#> 19 K00002 R_S_hydroxybutanoate       0
#> 20 K00002 R_acetoin_I                0

in2 %>% 
  pivot_longer(cols=2:18, names_to="KO", values_to="val2") -> in2_long
head(in2_long, n=20)
#> # A tibble: 20 x 3
#>    Taxa                        KO      val2
#>    <chr>                       <chr>  <dbl>
#>  1 s__Collinsella_massiliensis K00001     1
#>  2 s__Collinsella_massiliensis K00002     0
#>  3 s__Collinsella_massiliensis K00003     1
#>  4 s__Collinsella_massiliensis K00004     0
#>  5 s__Collinsella_massiliensis K00005     2
#>  6 s__Collinsella_massiliensis K00007     0
#>  7 s__Collinsella_massiliensis K00008     2
#>  8 s__Collinsella_massiliensis K00009     1
#>  9 s__Collinsella_massiliensis K00010     1
#> 10 s__Collinsella_massiliensis K00011     0
#> 11 s__Collinsella_massiliensis K00012     1
#> 12 s__Collinsella_massiliensis K00013     0
#> 13 s__Collinsella_massiliensis K00014     2
#> 14 s__Collinsella_massiliensis K00016     1
#> 15 s__Collinsella_massiliensis K00018     0
#> 16 s__Collinsella_massiliensis K00019     0
#> 17 s__Collinsella_massiliensis K00020     1
#> 18 s__Abiotrophia_defectiva    K00001     0
#> 19 s__Abiotrophia_defectiva    K00002     0
#> 20 s__Abiotrophia_defectiva    K00003     1

full_join(in2_long, in1_long, by="KO") %>% 
  arrange(Taxa, pathway, KO) %>% 
  as.data.frame(.) -> full.df

head(full.df, n=20)
#>                        Taxa     KO val2       pathway present
#> 1  s__Abiotrophia_defectiva K00001    0        docosa       1
#> 2  s__Abiotrophia_defectiva K00002    0        docosa       1
#> 3  s__Abiotrophia_defectiva K00003    1        docosa       0
#> 4  s__Abiotrophia_defectiva K00004    0        docosa       0
#> 5  s__Abiotrophia_defectiva K00005    1        docosa       0
#> 6  s__Abiotrophia_defectiva K00007    0        docosa       0
#> 7  s__Abiotrophia_defectiva K00008    0        docosa       1
#> 8  s__Abiotrophia_defectiva K00009    0        docosa       0
#> 9  s__Abiotrophia_defectiva K00010    0        docosa       0
#> 10 s__Abiotrophia_defectiva K00011    0        docosa       0
#> 11 s__Abiotrophia_defectiva K00012    0        docosa       1
#> 12 s__Abiotrophia_defectiva K00013    0        docosa       0
#> 13 s__Abiotrophia_defectiva K00014    0        docosa       0
#> 14 s__Abiotrophia_defectiva K00016    1        docosa       1
#> 15 s__Abiotrophia_defectiva K00018    0        docosa       0
#> 16 s__Abiotrophia_defectiva K00019    0        docosa       1
#> 17 s__Abiotrophia_defectiva K00001    0 dodeca_dienol       1
#> 18 s__Abiotrophia_defectiva K00002    0 dodeca_dienol       0
#> 19 s__Abiotrophia_defectiva K00003    1 dodeca_dienol       1
#> 20 s__Abiotrophia_defectiva K00004    0 dodeca_dienol       0

# Make a combined dataframe, filter on present, calculate group means
full.df %>% 
  filter(present > 0) %>% 
  group_by(Taxa, pathway) %>% 
  summarise(mean_val = round(mean(val2), digits=3)) %>% 
  as.data.frame(.) -> out.df
#> `summarise()` has grouped output by 'Taxa'. You can override using the `.groups` argument.

head(out.df)
#>                       Taxa        pathway mean_val
#> 1 s__Abiotrophia_defectiva         docosa    0.167
#> 2 s__Abiotrophia_defectiva  dodeca_dienol    0.167
#> 3 s__Abiotrophia_defectiva  Kdo_lipid_A_I    0.000
#> 4 s__Abiotrophia_defectiva    R_acetoin_I    0.250
#> 5 s__Abiotrophia_defectiva   R_acetoin_II    0.600
#> 6 s__Abiotrophia_defectiva R_cysteate_deg    0.200

# Make 2-way output dataframe
out.df %>%
  filter(!is.na(pathway)) %>% 
  pivot_wider(id_cols=Taxa, names_from=pathway, values_from=mean_val) %>% 
  as.data.frame(.)
#>                                         Taxa docosa dodeca_dienol Kdo_lipid_A_I
#> 1                   s__Abiotrophia_defectiva  0.167         0.167         0.000
#> 2                s__Abiotrophia_sp_HMSC24B09  0.167         0.167         0.000
#> 3                       s__Absiella_dolichum  0.333         0.167         0.000
#> 4                      s__Acholeplasma_oculi  0.167         0.000         0.000
#> 5              s__Acidaminococcus_fermentans  0.833         0.500         0.667
#> 6               s__Acidaminococcus_intestini  0.417         0.167         0.000
#> 7              s__Acidaminococcus_sp_CAG_542  0.667         0.333         0.333
#> 8  s__Acidipropionibacterium_acidipropionici  1.267         0.833         1.333
#> 9                   s__Acinetobacter_lwoffii  0.667         0.667         1.000
#> 10                 s__Acinetobacter_ursingii  0.933         0.833         1.333
#> 11        s__Actinobaculum_sp_oral_taxon_183  0.500         0.333         0.333
#> 12               s__Actinomyces_cardiffensis  0.333         0.333         0.000
#> 13               s__Collinsella_massiliensis  0.833         0.500         0.667
#>    R_acetoin_I R_acetoin_II R_cysteate_deg R_S_hydroxybutanoate X3R_linalool
#> 1         0.25         0.60           0.20                 0.00        0.167
#> 2         0.25         0.60           0.20                 0.00        0.167
#> 3         0.25         0.60           0.40                 0.50        0.333
#> 4         0.25         0.60           0.40                 0.00        0.167
#> 5         0.25         1.00           0.80                 0.50        0.833
#> 6         0.50         1.40           1.00                 0.00        0.417
#> 7         0.45         1.24           0.84                 0.25        0.667
#> 8         1.60         1.04           1.28                 0.05        1.267
#> 9         0.00         0.60           0.60                 0.75        0.667
#> 10        0.15         0.72           0.60                 0.75        0.933
#> 11        0.50         0.60           0.60                 0.00        0.500
#> 12        0.50         1.60           1.20                 0.00        0.333
#> 13        0.50         1.20           0.80                 0.50        0.833
#>    X3S_linalool X5R_carbapenem X5Z_dodec
#> 1         0.143          0.250     0.000
#> 2         0.143          0.250     0.000
#> 3         0.571          0.500     0.000
#> 4         0.286          0.250     0.000
#> 5         0.571          1.250     0.667
#> 6         0.714          0.625     0.000
#> 7         0.600          1.000     0.333
#> 8         0.914          1.900     0.067
#> 9         0.429          1.000     1.000
#> 10        0.571          1.150     1.000
#> 11        0.429          0.750     0.000
#> 12        0.857          0.500     0.000
#> 13        0.714          1.250     0.333

Created on 2021-08-25 by the reprex package (v2.0.1)

Before your next post please check the posting guide which tells you about how to make a "reproducible example". This enables potential helpers can work on the problem with minimal effort.

1 Like

Thanks for helpiong me out. But I am getting the following ERROR message for this step:


full_join(in2_long, in1_long, by="KO") %>% 
  arrange(Taxa, pathway, KO) %>% 
  as.data.frame(.) -> full.df

ERROR MESSAGE:


Error in vec_slice(x_out, x_slicer) : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:537

So, I am not getting any results. What to do?

Hi @CHAND,
Not sure what that error message means. Try upgrading R to the latest version (4.1.1), and then updating all the installed packages (especially {tidyverse}). I am using 64-bit R and RStudio under Windows 10.

What I have researched that it means system is running out of memory while doing such a large task (as the tables are really very big).

Remove the intermediate objects (in1, in2, in1_long, in2_long) from memory as soon as they are no longer needed e.g rm(in1) and then do gc() to immediately free-up memory (although this should happen automatically).
Hope this helps.

1 Like

HI @DavoWW - Tried... But getting the same error!!! I am thinking of it in a bit different way with loop. But due to lack of knowledge and skill can not. So, I have posted the question in a different way here. Will you please take a look?

Thanks

Thanks, @DavoWW - I have modified your code in the following way and it is running perfectly with my data. Will please check the script from your end and tell me if there is any mistake that may give a wrong output or everything is fine?

library(readxl)
library(dplyr)
library(tidyverse)
in_path <- read_excel(path = "c:/Users/deepc/Downloads/KO_pathway_map.xlsx", col_names = TRUE)

in_sp <- read_excel(path = "C:/Users/deepc/Downloads/KO_copy.xlsx", col_names = TRUE)


in_path_long <- in_path %>% 
  pivot_longer(cols=2:1598, names_to="pathway", values_to="present") %>% subset(present==1) %>% select(-"present")


in_sp_long <- in_sp %>% 
  pivot_longer(cols = 2:1933, names_to = "KO", values_to = "copy")



path_sp_merge <- full_join(in_path_long, in_sp_long, by="KO") %>% arrange(Taxa, pathway, KO) %>% group_by(Taxa, pathway) %>% 
  summarise(mean_val = round(mean(copy), digits=3)) %>% drop_na


sp_path_map <- path_sp_merge %>% filter(!is.na(pathway)) %>% 
  pivot_wider(id_cols=Taxa, names_from=pathway, values_from=mean_val)

Many thanks for helping me out.

1 Like

Looks like you've nailed it!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.