How to make a 1,0 presence absence file from a detailed file?

Hi, I have an excel table like the following:

KO	Pathway
K02372	(5Z)-dodec-5-enoate biosynthesis
K00059	(5Z)-dodec-5-enoate biosynthesis
K07535	(5Z)-dodec-5-enoate biosynthesis
K02371	(5Z)-dodec-5-enoate biosynthesis
K00004	(R)-acetoin biosynthesis I
K16845	(R)-cysteate degradation
K16846	(R)-cysteate degradation
K00004	(R,R)-butanediol biosynthesis
K00004	(R,R)-butanediol degradation
K00882	1,3-propanediol biosynthesis (engineered)
K11645	1,3-propanediol biosynthesis (engineered)
K19222	1,4-dihydroxy-2-naphthoate biosynthesis
K01661	1,4-dihydroxy-2-naphthoate biosynthesis
K19181	1,5-anhydrofructose degradation
K00219	10-cis-heptadecenoyl-CoA degradation (yeast)
K00219	10-trans-heptadecenoyl-CoA degradation (reductase-dependent, yeast)
K01834	1-butanol autotrophic biosynthesis
K15633	1-butanol autotrophic biosynthesis
K15635	1-butanol autotrophic biosynthesis
K01858	1D-myo-inositol hexakisphosphate biosynthesis III (Spirodela polyrrhiza)
K01858	1D-myo-inositol hexakisphosphate biosynthesis IV (Dictyostelium)
K00216	2,3-dihydroxybenzoate biosynthesis
K01252	2,3-dihydroxybenzoate biosynthesis
K10621	2,3-dihydroxybenzoate degradation
K01617	2,3-dihydroxybenzoate degradation
K02554	2,3-dihydroxybenzoate degradation
K10676	2,4-dichlorophenoxyacetate degradation
K01703	2,5-xylenol and 3,5-xylenol degradation
K01704	2,5-xylenol and 3,5-xylenol degradation
K15067	2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate
K10217	2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate
K01617	2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate
K02554	2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate
K10217	2-amino-3-carboxymuconate semialdehyde degradation to glutaryl-CoA

Each element in the first column is a KO name that is present within a pathway (the second column). From this, I want to make a presence (1)-absence(0) table which will represent if a KO name is present within a pathway or not. The final table should look like this (example for only first column is shown):

KO	(5Z)-dodec-5-enoate biosynthesis	(R)-acetoin biosynthesis I	(R)-cysteate degradation	(R,R)-butanediol biosynthesis	(R,R)-butanediol degradation	1,3-propanediol biosynthesis (engineered)	1,4-dihydroxy-2-naphthoate biosynthesis	1,5-anhydrofructose degradation	10-cis-heptadecenoyl-CoA degradation (yeast)	10-trans-heptadecenoyl-CoA degradation (reductase-dependent, yeast)	1-butanol autotrophic biosynthesis	1D-myo-inositol hexakisphosphate biosynthesis III (Spirodela polyrrhiza)	1D-myo-inositol hexakisphosphate biosynthesis IV (Dictyostelium)	2,3-dihydroxybenzoate biosynthesis	2,3-dihydroxybenzoate degradation	2,4-dichlorophenoxyacetate degradation	2,5-xylenol and 3,5-xylenol degradation	2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate	2-amino-3-carboxymuconate semialdehyde degradation to glutaryl-CoA
K16845	1																		
K15635	0																		
K15633	0																		
K15067	0																		
K11645	0																		
K10676	0																		
K16846	1																		
K10621	0																		
K10217	0																		
K19181	1																		
K07535	0																		
K02554	0																		
K02372	0																		
K02371	0																		
K19222	1																		
K01858	0																		
K01834	0																		
K01704	0																		
K01703	0																		
K01661	0																		
K01617	0																		
K01252	0																		
K00882	0																		
K00219	0																		
K00216	0																		
K00059	0																		
K00004	0

Can anyone please help me out?

Many thanks

Hi,

This can be done with the pivot_wider function from tidyr

library(tidyverse)

myData = data.frame(
  stringsAsFactors = FALSE,
                KO = c("K02372","K00059","K07535",
                       "K02371","K00004","K16845","K16846","K00004","K00004",
                       "K00882","K11645","K19222","K01661","K19181",
                       "K00219","K00219","K01834","K15633","K15635","K01858",
                       "K01858","K00216","K01252","K10621","K01617","K02554",
                       "K10676","K01703","K01704","K15067","K10217",
                       "K01617","K02554","K10217"),
           Pathway = c("(5Z)-dodec-5-enoate biosynthesis","(5Z)-dodec-5-enoate biosynthesis",
                       "(5Z)-dodec-5-enoate biosynthesis","(5Z)-dodec-5-enoate biosynthesis",
                       "(R)-acetoin biosynthesis I","(R)-cysteate degradation",
                       "(R)-cysteate degradation",
                       "(R,R)-butanediol biosynthesis","(R,R)-butanediol degradation",
                       "1,3-propanediol biosynthesis (engineered)",
                       "1,3-propanediol biosynthesis (engineered)","1,4-dihydroxy-2-naphthoate biosynthesis",
                       "1,4-dihydroxy-2-naphthoate biosynthesis",
                       "1,5-anhydrofructose degradation",
                       "10-cis-heptadecenoyl-CoA degradation (yeast)",
                       "10-trans-heptadecenoyl-CoA degradation (reductase-dependent, yeast)",
                       "1-butanol autotrophic biosynthesis","1-butanol autotrophic biosynthesis",
                       "1-butanol autotrophic biosynthesis",
                       "1D-myo-inositol hexakisphosphate biosynthesis III (Spirodela polyrrhiza)",
                       "1D-myo-inositol hexakisphosphate biosynthesis IV (Dictyostelium)",
                       "2,3-dihydroxybenzoate biosynthesis",
                       "2,3-dihydroxybenzoate biosynthesis","2,3-dihydroxybenzoate degradation",
                       "2,3-dihydroxybenzoate degradation",
                       "2,3-dihydroxybenzoate degradation","2,4-dichlorophenoxyacetate degradation",
                       "2,5-xylenol and 3,5-xylenol degradation",
                       "2,5-xylenol and 3,5-xylenol degradation",
                       "2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate",
                       "2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate",
                       "2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate",
                       "2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate",
                       "2-amino-3-carboxymuconate semialdehyde degradation to glutaryl-CoA")
)


myData = myData %>%
  mutate(present = 1) %>% 
  pivot_wider(KO, names_from = Pathway, values_from = present, values_fill = 0)

myData
#> # A tibble: 27 x 20
#>    KO     `(5Z)-dodec-5-eno~ `(R)-acetoin bi~ `(R)-cysteate d~ `(R,R)-butanedio~
#>    <chr>               <dbl>            <dbl>            <dbl>             <dbl>
#>  1 K02372                  1                0                0                 0
#>  2 K00059                  1                0                0                 0
#>  3 K07535                  1                0                0                 0
#>  4 K02371                  1                0                0                 0
#>  5 K00004                  0                1                0                 1
#>  6 K16845                  0                0                1                 0
#>  7 K16846                  0                0                1                 0
#>  8 K00882                  0                0                0                 0
#>  9 K11645                  0                0                0                 0
#> 10 K19222                  0                0                0                 0
#> # ... with 17 more rows, and 15 more variables:
#> #   (R,R)-butanediol degradation <dbl>,
#> #   1,3-propanediol biosynthesis (engineered) <dbl>,
#> #   1,4-dihydroxy-2-naphthoate biosynthesis <dbl>,
#> #   1,5-anhydrofructose degradation <dbl>,
#> #   10-cis-heptadecenoyl-CoA degradation (yeast) <dbl>,
#> #   10-trans-heptadecenoyl-CoA degradation (reductase-dependent, yeast) <dbl>,
#> #   1-butanol autotrophic biosynthesis <dbl>,
#> #   1D-myo-inositol hexakisphosphate biosynthesis III (Spirodela polyrrhiza) <dbl>,
#> #   1D-myo-inositol hexakisphosphate biosynthesis IV (Dictyostelium) <dbl>,
#> #   2,3-dihydroxybenzoate biosynthesis <dbl>,
#> #   2,3-dihydroxybenzoate degradation <dbl>,
#> #   2,4-dichlorophenoxyacetate degradation <dbl>,
#> #   2,5-xylenol and 3,5-xylenol degradation <dbl>,
#> #   2-amino-3-carboxymuconate semialdehyde degradation to 2-oxopentenoate <dbl>,
#> #   2-amino-3-carboxymuconate semialdehyde degradation to glutaryl-CoA <dbl>

Created on 2021-08-27 by the reprex package (v2.0.1)

Hope this helps,
PJ

1 Like

Thanks a lot, @pieterjanvc for helping me out. It works perfectly.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.