[BioC] NCBI gff3 annotation file and read.gff()
Marc Carlson
mcarlson at fhcrc.org
Wed Jul 16 19:58:29 CEST 2014
Yes, you can definitely use makeTranscriptDbFromGFF() if you want a
TranscriptDb object. For example, the following works:
library("GenomicFeatures")
## Build a TranscriptDb object from the gzipped NCBI GFF3 file
txdb <- makeTranscriptDbFromGFF(
    file="ref_Macaca_fascicularis_5.0_top_level.gff3.gz",
    format="gff3",
    exonRankAttributeName=NA,
    gffGeneIdAttributeName=NA,
    chrominfo=NA,
    dataSource=NA,
    species=NA,
    circ_seqs=DEFAULT_CIRC_SEQS,
    miRBaseBuild=NA,
    useGenesAsTranscripts=FALSE)
But is massaging this into a transcriptome really what we want here? Ugo
hasn't told us what he wants to do with this data. Also, I didn't look
closely at the data itself; it may be possible to specify a value for
exonRankAttributeName (which you should always do if you can manage it).
Marc
On 07/16/2014 09:10 AM, Michael Lawrence wrote:
> Is there anything makeTranscriptDbFromGFF could do to help with this?
> It sounds like you typically want something like a TxDb, except perhaps
> with some special considerations. Following the NCBI conventions is
> probably worth it.
>
>
> On Wed, Jul 16, 2014 at 8:58 AM, Chris Stubben <stubben at lanl.gov> wrote:
>
>> I would also suggest using rtracklayer's import() or creating a genome
>> data package. At least for microbial genomes, you often just need to return
>> features (CDS, pseudogenes, tRNAs, etc.) that have a parent with a locus_tag
>> key and assign that locus tag to the child (the read.gff default), so
>> that's the step that is going wrong with your large file.
>> In future versions I'll probably use the object returned by rtracklayer's
>> import() instead, and then join features whose locus_tag is NA, via their
>> Parent, to the ID of the feature whose locus_tag is not NA.
>> Chris
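The Parent-to-ID join described above can be sketched in base R. This is a minimal, hypothetical illustration, not code from read.gff() or rtracklayer: it assumes a data.frame named feats standing in for the metadata columns of an rtracklayer import() result, with at most one Parent per feature (in the real object Parent is a CharacterList and can hold several parents). The helper name fill_locus_tag() and the toy IDs and tags are made up for this example.

```r
# Hypothetical toy table standing in for mcols() of an rtracklayer import()
# result; this sketch assumes a single Parent per feature.
feats <- data.frame(
  ID        = c("gene0", "cds0", "gene1", "rna1", "exon1"),
  type      = c("gene", "CDS", "gene", "tRNA", "exon"),
  Parent    = c(NA, "gene0", NA, "gene1", "rna1"),
  locus_tag = c("b0001", NA, "b0002", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for rows whose locus_tag is NA, look up the row whose
# ID equals their Parent and copy that row's locus_tag (one level only).
fill_locus_tag <- function(feats) {
  no_tag <- is.na(feats$locus_tag)
  tagged <- feats[!no_tag, c("ID", "locus_tag")]
  # incomparables = NA stops a missing Parent from matching a missing ID
  idx <- match(feats$Parent[no_tag], tagged$ID, incomparables = NA)
  feats$locus_tag[no_tag] <- tagged$locus_tag[idx]
  feats
}

print(fill_locus_tag(feats))
```

With this toy data, cds0 and rna1 pick up their parents' tags, while exon1 stays NA because its parent rna1 had no locus_tag of its own in the input; a single join only looks one level up the Parent chain.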
>>
>>
>>
>>> I cc'd the packageMaintainer(), so that they are more likely to see this
>>> post.
>>>
>>> I don't know whether this helps in this particular case, but packages
>>> should be using rtracklayer::import rather than creating their own readers.
>>> Then at least whatever deficiencies are identified and corrected benefit
>>> the entire project.
>>>
>>
>>
>>
>> --
>>
>> Chris Stubben
>>
>> Los Alamos National Lab
>> Bioscience Division
>> MS M888
>> Los Alamos, NM 87545
>>
>> Phone: (505) 667-3295
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>