[BioC] Reading GFF files into R and GenomeGraphs

Michael Dondrup Michael.Dondrup at bccs.uib.no
Mon Aug 10 17:41:15 CEST 2009


Hi,
it took me a while to figure out how to read genome annotation files with
and without rtracklayer (thanks, Michael), and also how to plot
bacterial chromosomes using the GenomeGraphs package. I think this
information could be useful to others, so I wrote a tiny howto. Some
of it could also be added to the documentation examples of the
GenomeGraphs package. I would be glad to hear whether that
fits, or any other comments. Thank you again for this package.

1) Howto plot genomic annotation from a GFF file using rtracklayer:

require("GenomeGraphs")
require("rtracklayer")
# read in the gff, example from import.gff:
# import a GFF V2 file
gff <- import.gff(system.file("tests", "v2.gff", package =  
"rtracklayer"), version = "2")
# the gff object contains a RangesList holding the intervals;
# we need a function to convert an IRanges object into an AnnotationTrack:
makeAnnotationTrackFromIRanges = function (iranges,
		dp=DisplayPars(ranges = "yellow", plotId=TRUE)) {
	# make some IDs if the ranges are unnamed
	iranges.names = if (is.null(names(iranges))) 1:length(iranges) else names(iranges)
	annotation = data.frame(start=iranges@start,
			end=iranges@start+iranges@width-1,
			feature="ranges",         # there is no more feature information here
			group=1:length(iranges),  # put every region in a different group
			ID=iranges.names)
	makeAnnotationTrack(regions=annotation, dp=dp)
}

# there is a RangesList inside the gff
aTrack = makeAnnotationTrackFromIRanges(gff@ranges[[1]])

gdPlot(aTrack, minBase=1, maxBase=11000)
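
To check the coordinate arithmetic without Bioconductor installed, the regions table that the helper above hands to makeAnnotationTrack can be reproduced in base R. This is only a sketch: the function name rangesToRegions and the toy start/width values are made up for illustration; the column names follow the data.frame used above.

```r
# Hypothetical helper (not part of GenomeGraphs): build the regions table
# from plain start/width vectors, mirroring the data.frame used above.
rangesToRegions <- function(start, width, ids = NULL) {
  if (is.null(ids)) ids <- seq_along(start)  # fall back to running numbers
  data.frame(start   = start,
             end     = start + width - 1,     # closed interval: end is inclusive
             feature = "ranges",
             group   = seq_along(start),      # one group per region
             ID      = ids)
}

# toy values, for illustration only
regions <- rangesToRegions(start = c(1, 50, 200), width = c(10, 25, 100))
print(regions)
```

Printing the table before plotting makes off-by-one errors in the end column easy to spot.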

2) Howto plot arbitrary chromosome data contained in a data.frame:

I wasn't fully content with the output of the import.gff function: I
have additional information, such as gene name and reading frame, which I
didn't get this way. Assume there is a data.frame "cds" containing the
annotation, with at least the columns
"GeneID", "Start", "Stop".
Then the following function can make an AnnotationTrack from it:

makeSingleAnnotationTrack = function (cds, dp=DisplayPars(orf="blue",
		plotId=TRUE)) {
	annotation = data.frame(start=cds$Start, end=cds$Stop,
			# the feature can also be taken from some annotation field
			feature=rep.int("orf", nrow(cds)),
			group=1:nrow(cds), ID=cds$GeneID)
	makeAnnotationTrack(annotation, dp=dp)
}
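
One way to obtain such a "cds" data.frame without rtracklayer is to read the tab-separated GFF columns directly. The following is a minimal base-R sketch, not a full GFF parser: the function names, the toy file, and the assumption that the gene identifier is stored as an ID=... entry in the attribute column (GFF3 style) are mine and need adjusting to the file at hand.

```r
# Hypothetical helper: pull one key=value entry out of GFF3 attribute strings.
extractAttribute <- function(attributes, key) {
  fields <- strsplit(attributes, ";", fixed = TRUE)
  vapply(fields, function(f) {
    f <- trimws(f)
    hit <- f[startsWith(f, paste0(key, "="))]
    if (length(hit) == 0) NA_character_ else sub("^[^=]*=", "", hit[1])
  }, character(1))
}

# Hypothetical helper: read the nine standard GFF columns and keep one feature type.
readCdsFromGff <- function(file, feature = "CDS", key = "ID") {
  gff <- read.delim(file, header = FALSE, comment.char = "#", quote = "",
                    stringsAsFactors = FALSE,
                    col.names = c("seqid", "source", "type", "start", "end",
                                  "score", "strand", "phase", "attributes"))
  gff <- gff[gff$type == feature, ]
  data.frame(GeneID = extractAttribute(gff$attributes, key),
             Start = gff$start, Stop = gff$end,
             Strand = gff$strand, Phase = gff$phase,
             stringsAsFactors = FALSE)
}

# toy file, for illustration only
tmp <- tempfile(fileext = ".gff")
writeLines(c("##gff-version 3",
             "chr1\tdemo\tgene\t100\t400\t.\t+\t.\tID=gene1",
             "chr1\tdemo\tCDS\t100\t400\t.\t+\t0\tID=cds1;Name=abcA",
             "chr1\tdemo\tCDS\t500\t900\t.\t-\t0\tID=cds2;Name=abcB"), tmp)
cds <- readCdsFromGff(tmp)
print(cds)
```

The resulting data.frame has the "GeneID", "Start" and "Stop" columns that makeSingleAnnotationTrack expects, plus strand and phase for later use.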

3) Add coverage data to the annotation data:
# convert an IRanges object into a coverage track
# I think this is a bit inefficient for large IRanges objects;
# is there a better way?
makeBaseTrackCoverageFromIRanges = function(iranges, ...) {
	baseCoverage = as.numeric(coverage(iranges))
	makeBaseTrack(base=1:length(baseCoverage), value=baseCoverage, ...)
}
# and then plot this
x = IRanges(start=c(2, 0, NA), end=c(NA, NA, 14), width=11:0)
plotlist = list(makeAnnotationTrackFromIRanges(x),
		makeBaseTrackCoverageFromIRanges(x))
gdPlot(plotlist, minBase=1, maxBase=20)
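
On the efficiency question above: the quantity in the coverage track is simply, for each base, the number of ranges overlapping it. Below is a base-R sketch of that calculation using a difference array and a cumulative sum, so the work grows with the number of ranges plus the region length. The function name and the toy ranges are hypothetical, and this is not a statement about how coverage() is implemented in IRanges; it only illustrates what is being plotted.

```r
# Hypothetical helper: per-base coverage for closed intervals [start, end],
# assuming all positions lie within 1..len.
baseCoverage <- function(start, end, len) {
  # tabulate() counts how often each position occurs, so ranges sharing
  # a start or an end are handled correctly
  opens  <- tabulate(start,   nbins = len + 1)  # +1 where a range opens
  closes <- tabulate(end + 1, nbins = len + 1)  # -1 just past where it ends
  cumsum(opens - closes)[seq_len(len)]
}

# toy ranges, for illustration only: [1,4], [3,5], [3,8]
depth <- baseCoverage(start = c(1, 3, 3), end = c(4, 5, 8), len = 10)
print(depth)
```

Such a vector could be passed as the value argument of makeBaseTrack in the same way as the output of coverage() above.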

This may not be very elegant, but I hope it helps.

best
Michael


On 30.07.2009 at 13:24, Michael Lawrence wrote:

> On Thu, Jul 30, 2009 at 12:52 AM, Michael Dondrup <
> Michael.Dondrup at bccs.uib.no> wrote:
>
>> Dear BioC list,
>>
>> I'm trying to use the package GenomeGraphs to visualise custom genome data
>> (genome not in public databases). In the publication accompanying
>> GenomeGraphs (Durinck et al., BMC Bioinformatics, 2009) I found a partial
>> clue: "... genomic annotation encoded in GFF files can be easily used to
>> create a custom AnnotationTrack object for visualization... region start,
>> and end positions need to be given, as well as how these regions are to be
>> grouped."
>> This leaves me with two questions:
>> - I have no idea how to parse a GFF file, is there a GFF parser in
>> Bioconductor?
>>
>
> Yes, rtracklayer has a parser. Please see the vignette.
>
>
>>
>> - If I have such GFF file for a genome, how can I create such an
>> AnnotationTrack with all CDS?
>>
>> I hope somebody can help me with one of these.
>>
>> Best
>> Michael
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>