I am trying to forge a BSgenome data package by following https://www.bioconductor.org/packages/devel/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf. However, I am having trouble preparing the seed file.
Do you have any suggestions on preparing a seed file? It is in DCF (Debian Control File) format, which is also the format used for the DESCRIPTION file of any R package, and it contains all the information needed by the forgeBSgenomeDataPkg function to forge the target package. I watched a few tutorials about generating the DESCRIPTION file of an R package, but DESCRIPTION files normally seem to be created as part of writing an R package, so I am not clear on how to generate one separately and save it in the folder I need.
The BSgenomeForge manual linked above says to prepare a seed file first, but I could not find much information about how to generate one. I can see the sample seed files and I know what information I want to put in mine (as attached), but I do not know how to generate a DCF file of my own.
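One way to produce a DCF file from R, sketched here under some assumptions: base R's write.dcf() writes one record per row, so a one-row character matrix gives a single-record file. The field values are a subset of the seed file quoted later in this thread, and the output location (a temporary directory) is only a placeholder.

```r
# Subset of the seed fields quoted later in this thread; the remaining
# fields would be added to this vector in the same way.
fields <- c(
  Package          = "BSgenome.Rhinella.marina.UNSW.RM170330",
  organism         = "Rhinella marina",
  common_name      = "Cane toad",
  provider         = "UNSW",
  provider_version = "RM170330"
)

# write.dcf() writes one record per row, so build a one-row matrix
# whose column names are the DCF field names.
seed <- matrix(fields, nrow = 1, dimnames = list(NULL, names(fields)))

# Placeholder output location (assumption): a file in the temp directory.
seed_path <- file.path(tempdir(), "caneToad_seed")
write.dcf(seed, file = seed_path)

# Read the file back to confirm it parses as a single record.
check <- read.dcf(seed_path)
print(nrow(check))
print(check[1, "provider"])
```

A DCF file is plain text with one "field: value" pair per line, so writing it by hand in a text editor is an equally valid route; write.dcf() is just a convenient way to get the layout right.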
I tried writing the seed file as a plain text file, but I got the following error message. It seems that the command did not read my seed file at all, because my provider should be UNSW. Do you have any thoughts on this?
Error in if (provider == "UCSC") { : argument is of length zero
In addition: Warning message:
In readLines(infile, n = 25000L) :
incomplete final line found on '/Users/jiazhou/Box/methylation_analysis/msgbsR/BSgenome.Rhinella.marina/caneToad_seed'
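In R, an if() condition fails with "argument is of length zero" when the value being tested is empty, which would be consistent with the provider field not being found when the seed file was parsed (an assumption about what happens inside BSgenome, not something confirmed here). The warning only says that the last line of the file does not end with a newline. A minimal sketch for checking what read.dcf() actually sees in a seed file follows; inspect_seed() is a hypothetical helper, not part of BSgenome, and the fields it checks are just a few of those named in this thread.

```r
# Hypothetical helper (not part of BSgenome): report how many records
# read.dcf() finds and which of the expected fields are absent.
inspect_seed <- function(path,
                         expected = c("Package", "provider", "seqs_srcdir")) {
  rec <- read.dcf(path)
  list(
    n_records = nrow(rec),
    missing   = setdiff(expected, colnames(rec))
  )
}

# Demo on a throwaway file. writeLines() ends the last line with a newline,
# which avoids the "incomplete final line" warning from readLines().
demo <- tempfile("seed_")
writeLines(c("Package: BSgenome.Rhinella.marina.UNSW.RM170330",
             "provider: UNSW",
             "seqs_srcdir: /path/to/fasta"),   # placeholder path
           demo)

res <- inspect_seed(demo)
print(res)
```

Running the helper on the real seed file path would show whether the file parses as one record and whether provider is among the fields.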
Following the example code for generating a DCF file, I tried to write a DCF file "y", but I could not read it back in.
# An online DCF file with multiple records
con <- url("https://cran.r-project.org/src/contrib/PACKAGES")
y <- read.dcf(con, all = TRUE)
close(con)
utils::str(y)
write.dcf(y, file = "y")
dcf <- read.dcf(y, all = TRUE)
Error in read.dcf(y, all = TRUE) :
'file' must be a character string or connection
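The error message points at the argument type: in the call above, y is the object returned by the first read.dcf(), while the file argument needs the name of the file on disk (here the string "y") or a connection. A small sketch of the round trip, using a local two-record file with made-up contents instead of the CRAN PACKAGES file so that it runs offline:

```r
# Build a small DCF file with two records (a blank line separates records).
tmp <- tempfile("dcf_")
writeLines(c("Package: A", "Version: 1.0",
             "",
             "Package: B", "Version: 2.0"), tmp)

# With all = TRUE, read.dcf() returns a data frame, one row per record.
y <- read.dcf(tmp, all = TRUE)

# Write the records out again, then read them back by file path.
out <- file.path(tempdir(), "y")
write.dcf(y, file = out)
dcf <- read.dcf(out, all = TRUE)   # pass the path, not the object y

print(dcf)
```

In the code from the post, the equivalent change would be read.dcf("y", all = TRUE).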
Thanks very much for the quick reply. I am pretty new to this type of analysis. I prepared my seed file following the examples in the package, but I am not sure whether there is a problem with the format of the file.
Seed file starts here:
Package: BSgenome.Rhinella.marina.UNSW.RM170330
Title: Full genome sequences for Rhinella marina (UNSW version RM170330)
Description: Full genome sequences for Rhinella marina (cane toad) as provided by UNSW (RM170330)
organism: Rhinella marina
common_name: Cane toad
provider: UNSW
provider_version: RM170330
release_date: Mar. 2018
release_name: Rhinella marina (marine toad)
source_url: https://www.ncbi.nlm.nih.gov/assembly/GCA_900303285.1/
organism_biocview: Rhinella marina
BSgenomeObjname: Rhinella marina
SrcDataFiles: .fna from https://www.ncbi.nlm.nih.gov/assembly/GCA_900303285.1/
seqs_srcdir: /Users/jiazhou/Box/methylation_analysis/CaneToadRef/ncbi-genomes-2020-03-16/
seqfile_name: GCA_900303285.1_RM170330_genomic.fna
If we have to use the DCF format, do you have any suggestions on preparing it? Following the example code for generating a DCF file, I used the write.dcf function to generate a DCF file "y" from the sample file in that example. The generated file "y" looks like a plain text file, but I could not read it back in with the read.dcf function. Do you have any thoughts on this?
Thanks for the help. However, I still get an error from the forgeBSgenomeDataPkg command:
> forgeBSgenomeDataPkg('/Users/jiazhou/Box/methylation_analysis/msgbsR/BSgenome.Rhinella.marina/caneToad_seed')
Error in .readSeedFile(x, verbose = verbose) : seed file '/Users/jiazhou/Box/methylation_analysis/msgbsR/BSgenome.Rhinella.marina/caneToad_seed' must have exactly 1 record
I wrote the information for the seed file in a plain text file and then used the write.dcf function to convert it to DCF format.
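One possible cause of the "exactly 1 record" message, offered as a guess and not a diagnosis: in DCF, a blank line ends a record, so a file with blank lines between fields (or one written by write.dcf() from an object with several rows) parses as several records. The sketch below shows the effect with two throwaway files whose contents are made up for illustration.

```r
one <- tempfile("seed_one_")
two <- tempfile("seed_two_")

# All fields on consecutive lines: a single record.
writeLines(c("Package: X", "provider: UNSW"), one)

# Same fields with a blank line in between: two records.
writeLines(c("Package: X", "", "provider: UNSW"), two)

n_one <- nrow(read.dcf(one))
n_two <- nrow(read.dcf(two))
print(c(n_one, n_two))
```

Checking nrow(read.dcf(...)) on the real seed file would show how many records it contains.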