[BioC] Data analysis
smyth at wehi.edu.au
Sat Dec 6 01:47:01 MET 2003
At 09:14 AM 5/12/2003, Naomi Altman wrote:
>I also have a data set with differing numbers of spot replications. I
>used lme to analyze these data, gene by gene.
>Basically, I wrote a little function that pulls the spot information out
>of the array, removes the flagged spots and does other data cleaning, and
>then runs lme (using "try" in case it bombs). Then I use
>"split" to split the array data by geneID, and lapply to apply the
>function to every gene.
You can do this fine, but it is not equivalent to the limma treatment of
duplicate spots. Limma does a two-stage analysis, the first stage of which
is equivalent to lme().
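
For reference, the gene-by-gene approach described above might look roughly like the sketch below. This is only an illustration under assumptions: the toy data, the column names (gene, array, M, flagged), the helper fit_one_gene and the model M ~ 1 with a random array effect are all hypothetical, not the code actually used.

```r
library(nlme)  # provides lme() and fixef()

set.seed(1)

# Hypothetical toy data: 5 genes on 4 arrays, with 2 to 4 spots per gene
# per array, so the number of spot replications differs between genes.
n_dups <- c(g1 = 2, g2 = 4, g3 = 2, g4 = 3, g5 = 4)
spots <- do.call(rbind, lapply(names(n_dups), function(g) {
  n <- n_dups[[g]] * 4
  data.frame(
    gene    = g,
    array   = factor(rep(paste0("a", 1:4), each = n_dups[[g]])),
    M       = rnorm(n, mean = 1),   # simulated log-ratios
    flagged = runif(n) < 0.1,       # simulated quality flags
    stringsAsFactors = FALSE
  )
}))

# Fit one gene: drop flagged spots, then estimate the mean log-ratio with a
# random array effect (an assumed model). Returns NA if lme() fails.
fit_one_gene <- function(d) {
  d <- d[!d$flagged, , drop = FALSE]
  fit <- try(lme(M ~ 1, random = ~ 1 | array, data = d), silent = TRUE)
  if (inherits(fit, "try-error")) return(NA_real_)
  unname(fixef(fit)[1])
}

# Split the spot data by gene and apply the fit to every gene.
by_gene   <- split(spots, spots$gene)
estimates <- unlist(lapply(by_gene, fit_one_gene))
```

Wrapping lme() in try() means a gene whose fit fails yields NA instead of stopping the whole run.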
>Is this slow? Yes. But once it is tested I just get it started on Friday
>at 5, and by Monday at 9 I have my results.
>The major drawback is that I am doing a gene by gene ANOVA. The major
>advantage is that I can safely remove flagged spots, instead of trying to
>fudge in some values to maintain the balance.
If you start with the same number of spot replications, limma allows you to
remove flagged spots without any fudging by setting the corresponding spot
weights to zero.
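
The principle can be seen in a small sketch using base R lm() rather than limma itself: a spot with weight zero contributes nothing to the fit, so the estimate is the same as if the spot had been removed. The data and flags here are made up for illustration. In limma, the analogous step would be to pass a matrix of spot weights through the weights argument of lmFit.

```r
set.seed(2)

# Hypothetical single gene: 8 spots, two of them flagged.
M       <- rnorm(8, mean = 1)
flagged <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Weight 0 for flagged spots, weight 1 for all others.
w <- ifelse(flagged, 0, 1)

fit_weighted <- lm(M ~ 1, weights = w)        # keep all spots, zero weights
fit_dropped  <- lm(M ~ 1, subset = !flagged)  # physically remove flagged spots

# Both fits give the same estimate and the same residual degrees of freedom.
same_coef <- isTRUE(all.equal(unname(coef(fit_weighted)),
                              unname(coef(fit_dropped))))
same_df   <- df.residual(fit_weighted) == df.residual(fit_dropped)
```

The data keep their original, balanced layout, and only the weights record which spots were flagged.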
>At 11:40 PM 10/16/2003, Gordon Smyth wrote:
>>At 11:53 PM 16/10/2003, Jason Skelton wrote:
>>>Gordon Smyth wrote:
>>>>I would use the limma commands lmFit (or lm.series or gls.series)
>>>>followed by makeContrasts, eBayes and classifyTests. See the earlier posts:
>>>Thanks for this information, Gordon. I'll try this and see what results I
>>>get.
>>>On a different note
>>>The arrays I have tested LIMMA on have 2 duplicates and are spaced
>>>evenly throughout the array and so have no problems running your functions.
>>>Someone else at the Sanger Institute would like to be able to use LIMMA
>>>but the number of duplicates for each gene differs on their array, e.g.
>>>for some genes there are two copies and for others there would be four
>>>copies or more, which in turn obviously affects spacing etc. between replicates.
>>>I'm not sure why they would want differing numbers of copies of genes
>>>but they would like to be able to estimate the correlation between these
>>>genes anyway and obviously see the results as one data point per merged gene.
>>I haven't implemented this in limma because it seems to me that it might
>>invalidate the assumptions behind the duplicate correlation approach. See
>>the earlier post:
>>>I've tried to think of how this can be done but it seems overly complex
>>>and I'm not sure if it is at all possible in R or Limma.
>>>I'm guessing there is no way of carrying out the correlation, series model
>>>fits etc. based simply on the "Name" specified in the GAL files?
>>>Or somehow specifying the duplicate number for each gene separately
>>>and somehow merging this information for use as a parameter?
>>>I'm doubting very much that this can be done at all but it's worth
>Naomi S. Altman 814-865-3791 (voice)
>Bioinformatics Consulting Center
>Dept. of Statistics 814-863-7114 (fax)
>Penn State University 814-865-1348 (Statistics)
>University Park, PA 16802-2111