[BioC] GRanges - coercion from dataframe

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Mon Nov 28 16:29:17 CET 2011


For completion, here is my data.frame2GRanges function.  I only ever
translate from dataframes to GRanges and I almost never bother to keep
anything but the location (no metadata).  I also have an option for
taking a stranded data.frame and turning it into an unstranded
GRanges.

I discussed conversion functions like this on the list with Martin
Morgan a long time ago and he thought it would be better to leave them
out from GRanges.

Having said that, the data.frame2GRanges function below (with its
default options), is flat out one of the most used functions in my R
scripts.  It is amazing how much I use this simple function.

keepColumns indicate whether additional data frame columns should be
put into the GRanges.  ignoreStrand makes the strand of the
constructed GRanges equal to *.  It assumes the input data.frame has
columns chr/seqnames, start, end.

data.frame2GRanges <- function(df, keepColumns = FALSE, ignoreStrand = FALSE) {
    stopifnot(class(df) == "data.frame")
    stopifnot(all(c("start", "end") %in% names(df)))
    stopifnot(any(c("chr", "seqnames") %in% names(df)))
    if("seqnames" %in% names(df))
        names(df)[names(df) == "seqnames"] <- "chr"
    if(!ignoreStrand && "strand" %in% names(df)) {
        if(is.numeric(df$strand)) {
            strand <- ifelse(df$strand == 1, "+", "*")
            strand[df$strand == -1] <- "-"
            df$strand <- strand
        }
        gr <- GRanges(seqnames = df$chr,
                      ranges = IRanges(start = df$start, end = df$end),
                      strand = df$strand)
    } else {
        gr <- GRanges(seqnames = df$chr,
                      ranges = IRanges(start = df$start, end = df$end))
    }
    if(keepColumns) {
        dt <- as(df[, setdiff(names(df), c("chr", "start", "end", "strand"))],
                     "DataFrame")
        elementMetadata(gr) <- dt
    }
    names(gr) <- rownames(df)
    gr
}

Kasper

On Mon, Nov 28, 2011 at 1:10 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Sat, Nov 26, 2011 at 6:56 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> It's difficult in general to have a coercion method from a less structured
>> data type (like a data frame) to one more structured/constrained (like a
>> GRanges). There's just no conventions as to how data is stored in the data
>> frame. The RangedData coercion is not biology-aware, so it is not going to
>> interact well with something output from GenomicRanges. I would recommend
>> the approach you took. It can be made a little cleaner via with().
>
> While all of what Michael says is true, sometimes you just want to
> shoot from GRanges <--> data.frame and back again rather easily.
>
> I've defined my own setAs methods for this purpose which you can use here:
>
> https://github.com/lianos/seqtools/blob/master/R/pkg/R/conversions.R
>
> There is really no error checking going on between the conversions,
> but if you have columns of "the right" names in a data.frame, it'll
> convert your data.frame to a GRanges object with no fuss.
>
> Note that if your data.frame has a start, end, and width column it
> will ignore the width and use start/end. All other "non-GRanges"
> columns will be converted into a DataFrame object and stuffed into the
> values() slot of the GRanges object returned.
>
> The usual "use at your own risk" disclaimer applies ...
>
> Enjoy,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list