[BioC] Reading gff files in R

Kasper Daniel Hansen khansen at stat.berkeley.edu
Tue Oct 21 19:13:56 CEST 2008


There used to be a package for this, by Oleg Skylar, but it was  
deprecated. There is a nice function in the davidTiling experimental  
data package for extracting attributes from GFF files (probably taken  
from the gff code base?), I'll do a copy and paste here, since  
davidTiling is a large download.

getAttributeField <- function (x, field, attrsep = ";") {
     s = strsplit(x, split = attrsep, fixed = TRUE)
     sapply(s, function(atts) {
         a = strsplit(atts, split = "=", fixed = TRUE)
         m = match(field, sapply(a, "[", 1))
         if (!is.na(m)) {
             rv = a[[m]][2]
         }
         else {
             rv = as.character(NA)
         }
         return(rv)
     })
}

and here is quick parser

gffRead <- function(gffFile, nrows = -1) {
     cat("Reading ", gffFile, ": ", sep="")
     gff = read.table(gffFile, sep="\t", as.is=TRUE, quote="",
     header=FALSE, comment.char="#", nrows = nrows,
     colClasses=c("character", "character", "character", "integer",  
"integer",
     "character", "character", "character", "character"))
     colnames(gff) = c("seqname", "source", "feature", "start", "end",
             "score", "strand", "frame", "attributes")
     cat("found", nrow(gff), "rows with classes:",
         paste(sapply(gff, class), collapse=", "), "\n")
     stopifnot(!any(is.na(gff$start)), !any(is.na(gff$end)))
     return(gff)
}

Now you can do stuff like

gff <- gffRead(gfffile)
gff$Name <- getAttributeField(gff$attributes, "Name")
gff$ID <- getAttributeField(gff$attributes, "ID")

gfffile is just an object holding the file name.

Kasper



On Oct 21, 2008, at 4:56 , Tobias Straub wrote:

> hi Naira
> just parse the gff with
>
> read.delim("gff_file.gff", header=F, comment.char="#") -> gff
>
> and you get a table that you can filter for gene entries, sth like
> gff.genes <- gff[gff[,2]=="gene",]
> depending on  where and how the gene is specified as a gene
>
> T.
>
> On Oct 21, 2008, at 11:00 AM, Naira Naouar wrote:
>
>> Dear all,
>>
>> I was wondering if there were already tools to read gff files in R?
>> I am looking for a fast way to extract gene coordinates from a gff  
>> file.
>>
>> Regards,
>> Naira
>>
>> -- 
>> ==================================================================
>> Naïra Naouar
>> Tel:+32 (0)9 331 38 63
>> VIB Department of Plant Systems Biology, Ghent University
>> Technologiepark 927, 9052 Gent, BELGIUM
>> nanao at psb.ugent.be                         http://www.psb.ugent.be
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ----------------------------------------------------------------------
> Tobias Straub   ++4989218075439   Adolf-Butenandt-Institute, München D
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list