[BioC] Order in which ReadAffy() and read.affybatch()

Sat Mar 19 02:37:36 CET 2005

See comments below.

On Fri, 2005-03-18 at 11:18 -0800, Hrishikesh Deshmukh wrote:
> Hi All,
> 
> I am not that familiar with BioC terms, i know
> readaffy() and read.affybatch() makes it easy to read
> CEL file and two "different" kinds of objects are

You need help()! From the Details section of help("ReadAffy") or
help("read.affybatch"):

     'ReadAffy' is a wrapper for 'read.affybatch' that permits the user
     to read in phenoData, MIAME information, and CEL files using
     widgets. One can also define files where to read phenoData and
     MIAME information.

BTW, there is no such function as readaffy(). Remember that R is case
sensitive.

> created and typically some "kinds" of analyses for
> example boxplot() may work with readaffy() and not
> with read.affybatch() and hist() might work well with
> read.affybatch()!! But these are the kinds of

These are two different ways of reading in the data. Both read CEL
files into an AffyBatch object.

> questions for which docs are nonexistent! Vignettes

Try looking up the appropriate help pages, or search via help.search()
and/or the mailing list archives.
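
As an aside, a case-insensitive search is a quick way to recover the
exact spelling of a half-remembered function name. A minimal sketch
follows. It uses readLines() from base R as a stand-in so that it runs
without affy installed; the commented calls at the end are what I would
expect to use once affy is attached.

```r
# Stand-in example using base R only (no affy needed).
# exists() is case-sensitive, just as calling the function would be.
wrong <- exists("readlines")   # FALSE in a clean session: no such object
right <- exists("readLines")   # TRUE: the real base function

# apropos() ignores case by default, so it finds the correct spelling.
candidates <- apropos("readlines")
print(candidates)

# With the affy package attached, the analogous calls would be
#   apropos("readaffy")
#   help.search("affybatch")
```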

> help but they only whet your appetite but do not
> satisfy your hunger!! Sorry went in different
> direction.
> 
> I do work with multiple OS and thanks for the piece of
> very important information. One simple way to make
> sure no matter what order the files are read, doing
> simple hist() and/or boxplot() and then making sure
> that right labels(filenames) are given for the

Err, how does looking at histograms tell you which columns belong to
which file, especially when many thousands of points make the curves
look very similar? Often a simple head() and a look into the CEL files
would be sufficient.
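
To make this concrete, here is a small sketch of taking plot labels from
the column names instead of from an assumed reading order. The matrix is
made-up data standing in for exprs(raw), and the file names and the
"groups" lookup table are hypothetical. With a real AffyBatch you would
start from sampleNames(raw) or colnames(exprs(raw)) instead.

```r
# Made-up intensities standing in for exprs(raw); columns named by CEL file.
set.seed(1)
x <- matrix(rlnorm(400, meanlog = 6), ncol = 4,
            dimnames = list(NULL, c("a.CEL", "b.CEL", "c.CEL", "d.CEL")))

# Hypothetical lookup table, deliberately in a different order from x.
groups <- data.frame(file  = c("d.CEL", "b.CEL", "a.CEL", "c.CEL"),
                     label = c("tumour2", "normal2", "normal1", "tumour1"),
                     stringsAsFactors = FALSE)

# match() aligns the labels to the columns by name, not by position.
idx <- match(colnames(x), groups$file)
stopifnot(!anyNA(idx))          # every column must be found in the table
lab <- groups$label[idx]

# Strip the ".CEL" suffix if only the bare sample name is wanted.
bare <- sub("\\.CEL$", "", colnames(x))

# The labels come from the data itself, so the reading order is irrelevant.
boxplot(as.data.frame(log2(x)), names = lab, ylab = "log2 intensity")
```

The same "lab" vector can be passed to legend(), which keeps the legend
entries in the column order of the data.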

> lines/plots.....now to do this simple thing one has to
> go through lot of documentation! Ahhh!!!

This is the process of learning and it is not guaranteed to be easy.

> Is there a book on BioC specifically which will help
> people be conversant with terms and use it
> efficiently!!! 

a) this is a very dynamic field, and
b) IMHO, most BioC members are busy improving the techniques used for
design and analysis.

I am not sure whether a book on Bioconductor is available. If there is
one, it may grow outdated fairly rapidly. Your best bet is to look either
at the help() pages or under "Documentation" on the Bioconductor website.
I have benefited from the documents under Short Courses, Lab Materials,
Research Talks, ...

> But hats off to the mailing list members for answering
> my simple/naive questions.
> 
> Regards,
> Hrishi
> 
> 
> --- Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
> wrote:
> > See comments below.
> > 
> > On Fri, 2005-03-18 at 08:26 -0800, Hrishikesh Deshmukh wrote:
> > > Hello All,
> > > 
> > > I have questions about the order in which ReadAffy()
> > > and read.affybatch() reads in affy CEL files. I need
> > 
> > Alphabetically, but the behaviour may vary between Windows and
> > Linux due to case sensitivity.
> > 
> > > this piece of information because i want to label the
> > > arrays when i look at hist() and boxplot(). I want to
> > 
> > This is a dangerous practice as you will be assuming that filenames
> > are read alphabetically. If you work on multiple OS, this might be
> > a nightmare.
> > 
> > Besides, since ReadAffy uses the filenames as the column names,
> > you do not need to care about the order in which it reads the files.
> > 
> > raw <- ReadAffy()
> > head( exprs( raw ) )
> > 
> >        a.CEL    b.CEL    c.CEL   d.CEL
> > [1,]    253.8    335.8    176.5   238.3
> > [2,]  19607.3  19437.5  11239.5 20985.5
> > [3,]    218.0    275.3    169.5   263.5
> > [4,]  20284.5  19956.8  11324.8 21180.5
> > [5,]     87.5     94.8    100.3    78.5
> > [6,]    224.5    237.8    186.5   165.8
> > 
> > Then you can strsplit() the column names or match() them to
> > something else.
> > 
> > 
> > > make sure that the right labels (filenames) are displayed
> > > for the corresponding lines/boxplots.
> > > 
> > > Is there a book specifically on BioC? This would be a
> > > big help.
> > > 
> > > In general, on what basis does one accept/reject arrays
> > > from a pool of replicates? The hist() and boxplot()
> > > show clearly that not all the arrays (replicates)
> > > show the same "behaviour".
> > 
> > This is before preprocessing, right? There could be systematic
> > noise that preprocessing algorithms can handle. I think people
> > usually reject on the basis of biological evidence such as
> > housekeeping genes, RNA degradation plots or eye-balling the chip.
> > 
> > 
> > > Here are the code fragments:
> > > library(affy)
> > > library(hgu95av2cdf)
> > > library(hgu95av2probe)
> > > library(matchprobes)
> > > data(hgu95av2probe)
> > > summary(hgu95av2probe)
> > > file.names<-c("1.CEL",  "2.CEL",  "3.CEL",  "4.CEL",
> > > "5.CEL","6.CEL","7.CEL",  "8.CEL",  "9.CEL",
> > > "10.CEL",
> > > "11.CEL","12.CEL","13.CEL","14.CEL","15.CEL","16.CEL","17.CEL")
> > > M<-read.affybatch(filenames=file.names,
> > > description=NULL,notes="",compress=FALSE,
> > > rm.mask=FALSE,rm.outliers=FALSE,rm.extra=FALSE,verbose=TRUE)
> > 
> > Why not just do ReadAffy()? It will return the filenames as
> > column names.
> > 
> > > hist(M)
> > > legend(12,1.2,sampleNames(M),col=1:17,lty=1:17) 
> > 
> > Interesting. Why do I get a density plot when I call hist() on an
> > AffyBatch object?
> > 
> > > When i run the legend line i see hist() displays
> > > different "lines" and the legend does not match correctly!
> > > 
> > > Thanks in advance.
> > > Hrishi
> > > 
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at stat.math.ethz.ch
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > 
> > 
> > 
> 