[R] reshaping data

Fri May 21 18:10:57 CEST 2010

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Mia Bengtsson
> Sent: Friday, May 21, 2010 3:39 AM
> To: Dennis Murphy; Henrique Dallazuanna
> Cc: r-help at r-project.org
> Subject: Re: [R] reshaping data
> 
> Thank you Dennis and Henrique for your help!
> 
> Both solutions work! I just need to find a way of removing 
> the empty "cells" from the final "long" dataframe since they 
> are not NAs. 
> 
> Maybe there is an easier way of doing this if the data is not 
> treated as a dataframe? The original data file that is 
> derived from another program (mothur) is a textfile with the 
> following format:
> 
> red \t A,B,C
> green \t D
> blue \t E,F
> 
> The first column "species" is separated from the 
> "sequences"(A, B, C...) with tab, and then the "sequences" 
> are separated from each other with commas.
> 
> I imported into R as what I thought was a dataframe using:
> 
> test1<-readLines("path/test")
> test2<-gsub(pattern="\t", replacement=",", x=test1)
> test3<-textConnection(test2)
> test.df<-read.csv(test3, header=FALSE)
> 
> Should I rather have imported it as something else if I want 
> to reshape it into a list as described previously?

Does the following do what you want, where my "txt" should
resemble the output of your test1, the output of
readLines("path/test")?

> txt <- c("red \t A,B,C", "green \t D", "blue \t E,F")
> f <- function (textLines) {
    tmp <- strsplit(textLines, " *\t *")
    letters <- strsplit(vapply(tmp, FUN = `[`, 2, FUN.VALUE = ""), 
        ",")
    numLetters <- vapply(letters, FUN = length, FUN.VALUE = 0L)
    data.frame(Species = rep(vapply(tmp, FUN = `[`, 1, FUN.VALUE = ""), 
        numLetters), Letter = unlist(letters))
}
> f(txt)
  Species Letter
1     red      A
2     red      B
3     red      C
4   green      D
5    blue      E
6    blue      F
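
f() never creates empty cells, so nothing needs removing afterwards.
If you would rather keep the long data frame you already built, the
blank cells can be dropped by testing for empty strings instead of NA.
A rough sketch, assuming the padding is "" or spaces and that the
column is called "value" (both are my guesses; the longdf below is
made-up data standing in for your reshaped result):

```r
# Hypothetical long data frame with "" padding, standing in for the
# output of reshape() or melt() on the padded wide data.
longdf <- data.frame(species = rep(c("red", "green", "blue"), times = 3),
                     value   = c("A", "D", "E", "B", "", "F", "C", "", ""),
                     stringsAsFactors = FALSE)

# Strip leading/trailing spaces, then keep rows whose value is non-empty.
keep  <- nzchar(gsub("^ +| +$", "", longdf$value))
clean <- longdf[keep, ]
```

The gsub() call is only there in case the padding is " " rather
than ""; nzchar() alone is enough for truly empty strings.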

vapply() is new in R 2.11.? and is like sapply but lets
you specify what the return value of FUN is expected to
be.  Thus it gives you some error checking, saves some
time over sapply, and works nicely when the length of the
input is 0.  If you don't have 2.11, replace vapply with sapply
and remove the FUN.VALUE argument.
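
To make the difference concrete, here is a small sketch of the three
points above (zero-length input, error checking, and the sapply
fallback).  The variable names are just for illustration and the
"txt" vector is the same one used earlier:

```r
# Zero-length input: sapply() falls back to an empty list, while
# vapply() returns an empty vector of the declared type.
empty <- character(0)
s <- sapply(empty, nchar)
v <- vapply(empty, nchar, FUN.VALUE = 0L)

# Error checking: nchar() returns integers, so declaring a character
# result makes vapply() stop; the error is caught here for display.
bad <- tryCatch(vapply(c("a", "bb"), nchar, FUN.VALUE = ""),
                error = function(e) "error")

# Fallback without vapply(): same call, FUN.VALUE dropped.
txt <- c("red \t A,B,C", "green \t D", "blue \t E,F")
tmp <- strsplit(txt, " *\t *")
species <- sapply(tmp, `[`, 1)
```

With sapply() the type of the result depends on the data, which is
why the zero-length case can surprise code further down that expects
a plain vector.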

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> 
> Thanks a million!
> 
> / Mia Bengtsson
> 
> 
> On May 21, 2010, at 2:15 AM, Dennis Murphy wrote:
> 
> > Hi:
> > 
> > 
> > On Thu, May 20, 2010 at 10:13 AM, Mia Bengtsson 
> <mia.bengtsson at bio.uib.no> wrote:
> > Hello,
> > 
> > I am a relatively new R-user who has a lot to learn. I have 
> a large dataset that is in the following dataframe format:
> > 
> > red             A       B       C
> > green   D
> > blue    E       F
> > 
> > This isn't a data frame in R - if it were, it would have NA 
> (or at least ""/" " padding) at the end of each row.
> > Data frames are not ragged arrays. To have this type of 
> structure in R, the data would have to be in a list.
> > 
> > This matters because Henrique's solution with reshape() 
> assumes a data frame as input. A similar solution
> > would be to use melt() in the reshape package, something like
> > 
> > library(reshape)
> > longdf <- melt(yourdf, id.var = 'species')
> > longdf
> > 
> > If you have NA padding, the way to get rid of them in the 
> reshaped data frame is (with the above approach)
> > 
> > longdf[!is.na(longdf$value), names(longdf) != "variable"]
> > 
> > If the padding is with blanks, then Henrique's solution 
> works here, too.
> > 
> > HTH,
> > Dennis
> > 
> > 
> > Where red, green and blue are "species" names and A, B and 
> C are observations (corresponding to DNA sequences). Each 
> observation can only belong to one species. I would like to 
> list the observations in one column, with the species they 
> belong to in the next. Like this:
> > 
> > A       red
> > B       red
> > C       red
> > D       green
> > E       blue
> > F       blue
> > 
> > I have tried using reshape() and stack() but I cannot get 
> my head around it. Any help is highly appreciated!
> > 
> > Thanks in advance,
> > __________________________________
> > 
> > Mia Bengtsson, PhD-student
> > Department of Biology
> > University of Bergen
> > +47 55584715
> > +47 97413634
> > mia.bengtsson at bio.uib.no
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 