[BioC] How to analyze my hg 1.1 st array

Thu Nov 7 17:14:23 CET 2013

Hi Jerry,

On Wednesday, November 06, 2013 8:22:12 PM, Jerry Cholo wrote:
> Hi everyone,
>
> I am new to using oligo in Bioconductor.  I have some basic questions about
> how to analyze my hg 1.1 st array.  I used the following commands:
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("pd.hugene.1.1.st.v1")
> biocLite("oligo")
>
> library(oligo)
> celFiles <- list.celfiles()
> Data <- read.celfiles(celFiles)
> ppData <- rma(Data)
> boxplot(ppData)
> expData <- exprs(ppData)
> boxplot(expData)
> write.csv(expData, file = "MyData.csv")
>
> When I looked at the boxplots of ppData and expData, I noticed
> that ppData was nicely normalized and showed a completely normal
> distribution, whereas expData had huge outliers.

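The only difference between the boxplots using ppData and expData is 
that in the first instance you were only using 10,000 rows of your 
expression data, whereas in the second instance you used all the data. 
The values are the same in both cases; plotting every row simply puts 
more points beyond the whiskers.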
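As a rough illustration of the row-count effect, here is a minimal 
sketch in base R. The matrix is simulated (it is not your data), and 
drawing the 10,000-row subset with sample() is only an assumption about 
how such a subset might be chosen:

```r
# Simulated stand-in for exprs(ppData): 30,000 probesets x 4 arrays.
# These are made-up values, not real expression data.
set.seed(1)
simData <- matrix(rnorm(30000 * 4, mean = 7, sd = 2), ncol = 4,
                  dimnames = list(NULL, paste0("array", 1:4)))

# Assumed subsetting step: keep 10,000 randomly chosen rows.
idx <- sample(nrow(simData), 10000)

# plot = FALSE returns the boxplot statistics without drawing anything.
full_stats <- boxplot(simData, plot = FALSE)
sub_stats  <- boxplot(simData[idx, ], plot = FALSE)

# Boxes and whiskers, one column per array.
full_stats$stats
sub_stats$stats

# Number of points that would be drawn beyond the whiskers.
length(full_stats$out)
length(sub_stats$out)
```

Both calls summarize the same distribution, so the boxes and whiskers 
should nearly coincide; the full matrix just has about three times as 
many rows, and so more points flagged as outliers.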

>
>
> I) Which one is the output data?  ppData, or expData?

I don't know what you mean by 'output data'. The ppData object is an 
ExpressionSet that contains your summarized expression values, along 
with other data describing the experiment, whereas expData is simply 
the matrix of expression values you got from the ExpressionSet.
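To make the distinction concrete, here is a small sketch that assumes 
the Biobase package is installed. The probe names, sample names and the 
'group' column are all made up for illustration:

```r
library(Biobase)

# Toy expression matrix: 5 probesets x 4 samples (simulated values).
set.seed(1)
mat <- matrix(rnorm(20, mean = 8), nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("sample", 1:4)))

# Hypothetical sample annotation; row names must match the matrix columns.
pd <- data.frame(group = c("a", "a", "b", "b"), row.names = colnames(mat))

# The ExpressionSet holds the expression values and the annotation together.
eset <- ExpressionSet(assayData = mat, phenoData = AnnotatedDataFrame(pd))

# exprs() pulls out just the matrix; pData() pulls out the annotation.
toyExp <- exprs(eset)
class(toyExp)
pData(eset)
```

Subsetting the ExpressionSet, for example eset[, 1:2], keeps the 
expression values and the sample annotation in step, which a bare 
matrix cannot do for you.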

>
> II) Should I apply limma on expData or ppData?

The limma package can use either. This is covered in detail in both the 
limma User's Guide and the help page for lmFit(). I would recommend 
using the ExpressionSet, as it is designed specifically to contain 
these sorts of data, whereas a matrix is, well, just a matrix.
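A minimal sketch of both routes, assuming limma and Biobase are 
installed. The data are simulated and the two-group 
control/treated design is hypothetical, so substitute your own sample 
information:

```r
library(Biobase)
library(limma)

# Simulated data: 100 probesets x 6 samples (not real measurements).
set.seed(1)
mat <- matrix(rnorm(100 * 6, mean = 8), nrow = 100,
              dimnames = list(paste0("probe", 1:100), paste0("sample", 1:6)))

# Hypothetical design: three control and three treated arrays.
pd <- data.frame(group = factor(rep(c("control", "treated"), each = 3)),
                 row.names = colnames(mat))
eset <- ExpressionSet(assayData = mat, phenoData = AnnotatedDataFrame(pd))

# The design matrix is built from the annotation stored in the ExpressionSet.
design <- model.matrix(~ group, data = pData(eset))

# lmFit() accepts the ExpressionSet directly ...
fit_eset <- eBayes(lmFit(eset, design))

# ... or the bare matrix, giving the same coefficients.
fit_mat <- eBayes(lmFit(exprs(eset), design))

# Top-ranked probesets for the treated vs control comparison.
tt <- topTable(fit_eset, coef = "grouptreated")
tt
```

With the ExpressionSet, the sample annotation used to build the design 
travels with the expression values, so there is less room for the two 
to get out of sync.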

>
> III) How could I prepare the data for limma?  May I use a .csv file to
> start limma analysis?

Again, covered in the limma User's Guide. All Bioconductor packages 
come with vignettes, which are intended to show general workflows, as 
well as help pages for every function you might need to use. I would 
recommend perusing both.

While you could hypothetically use a .csv file to start the limma 
analysis, I can't see why you would want to. There is no profit in 
reading a bunch of data into R, processing it, then writing it to disk 
only to read it back in again for the next step.

The underlying principle behind Bioconductor is to give people a 
coherent framework of data structures that are intended to both hold 
these sorts of data, and to seamlessly allow one to process those data 
without having to do a bunch of extra steps.

Best,

Jim

>
>
>
> Thanks,
> Jerry

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099