[BioC] how to combine microarray data and phenotype data into a least squares analysis?

Mikhail, Amy a.mikhail at abdn.ac.uk
Tue Sep 30 17:36:04 CEST 2008


Hi Martin,

There are quite a few packages that deal with least squares analysis, and also with partial least squares (PLS). For PLS there is even one package designed expressly for microarray / gene expression data: plsgenomics. You can download it from CRAN and it will take your data as is (experiments in columns, genes in rows).

Also, there is a nice paper by Mevik and Wehrens (2007) with examples for the pls package (also on CRAN), which I found very useful for explaining how the package works and how to do PLS analyses in general. You can find it here:

http://www.jstatsoft.org/v18/i02/paper

Note that if you use any least squares / PLS packages not expressly designed for microarray data, you will have to transpose your matrix first, so that the experiments / microarrays are the rows and each gene is a column:

> MyMatrix.t <- t(MyMatrix)

After that, your transposed matrix of gene expression data becomes the response (Y) in the formula, and your cell percentages, together with any other information you have for each experiment, become the predictors (Xs). I would combine the cell percentage vectors and any other Xs you want to look at into a data.frame (you might also want some descriptive columns giving the cell cycle stage, if this is not explicit in the cell % columns). This data.frame should have the same number of rows as your transposed gene expression matrix, i.e. one cell % value per microarray.
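A minimal base-R sketch of that layout, using simulated stand-in data: the sizes, the `pheno` data.frame and its G1/G2 values are all hypothetical and only there to make the example runnable. Putting the transposed matrix on the left-hand side of `lm()` fits one ordinary least squares regression per gene in a single call, and `lsfit()`, which also accepts a matrix of responses, gives the same coefficients. If the stage percentages add up to 100 on every array, leave one stage out, otherwise the predictors are collinear with the intercept.

```r
## Hypothetical stand-in data: 12 RNAi arrays x 500 genes of log-fold changes.
## Replace with your own MyMatrix (genes in rows, experiments in columns).
set.seed(1)
n.arrays <- 12
n.genes  <- 500
MyMatrix <- matrix(rnorm(n.genes * n.arrays), nrow = n.genes,
                   dimnames = list(paste0("gene", seq_len(n.genes)),
                                   paste0("array", seq_len(n.arrays))))

## Hypothetical phenotype table: one row per array, one column per stage.
## (If G1 + S + G2 = 100 on every array, drop one stage to avoid collinearity.)
pheno <- data.frame(G1 = runif(n.arrays, 30, 60),
                    G2 = runif(n.arrays, 10, 30),
                    row.names = colnames(MyMatrix))

## Transpose so that arrays are rows and genes are columns.
MyMatrix.t <- t(MyMatrix)
## Rows of the response and of the predictors must refer to the same arrays.
stopifnot(identical(rownames(MyMatrix.t), rownames(pheno)))

## A matrix on the left-hand side fits one least squares model per gene ("mlm").
fit <- lm(MyMatrix.t ~ G1 + G2, data = pheno)
coefs <- coef(fit)   # 3 x n.genes: (Intercept), G1, G2 for every gene

## lsfit() takes the predictors as a matrix and also accepts a matrix of responses.
fit2 <- lsfit(as.matrix(pheno), MyMatrix.t)
stopifnot(isTRUE(all.equal(unname(coefs), unname(fit2$coefficients))))

## Per-gene p-value for the G1 slope, then a Benjamini-Hochberg adjustment
## because thousands of genes are tested at once.
p.G1 <- sapply(summary(fit), function(s) coef(s)["G1", "Pr(>|t|)"])
p.G1.adj <- p.adjust(p.G1, method = "BH")
head(sort(p.G1.adj))
```

With real data only the two data-creation steps change; `coefs["G1", ]` then holds each gene's slope against the G1 percentage.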

If you follow the near-infrared (NIR) examples in the above paper, it should make sense, because NIR data are, format-wise, quite like microarray data: there are maybe 1000 or more spectral readings at different wavelengths for each sample, just as you have thousands of log-fold differences for different genes on each microarray.
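For reference, the gasoline example from the Mevik and Wehrens paper can be run roughly as below, assuming the pls package has been installed from CRAN. In that example octane is the response and the NIR spectra, stored as a single matrix column (`NIR`) inside the `gasoline` data.frame, are the predictors. The last lines show, with hypothetical names and simulated numbers (`X.wide`, `my.df`), how a wide matrix can be stored the same way using `I()`.

```r
library(pls)

## Gasoline NIR data shipped with pls: octane plus a matrix column of spectra.
data(gasoline)
gasTrain <- gasoline[1:50, ]
gasTest  <- gasoline[51:60, ]

## PLS regression of octane on the whole spectrum, with leave-one-out CV.
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")
summary(gas1)

## Cross-validated error by number of components, then test-set error.
RMSEP(gas1)
RMSEP(gas1, newdata = gasTest)

## Hypothetical: I() keeps a wide matrix as ONE column of a data.frame
## instead of splitting it into 200 separate columns.
set.seed(1)
X.wide <- matrix(rnorm(12 * 200), nrow = 12)   # 12 samples x 200 variables
my.df  <- data.frame(y = rnorm(12), X = I(X.wide))
dim(my.df$X)
```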

Hope this helps,

Best wishes,
Amy
------------------------------------------------------------------------------------------------------------------------------------------
Hello everyone,

I am looking for some help in setting up an analysis protocol in R for my
microarray dataset. My knowledge of R is still somewhat rudimentary, but,
having worked with it for about half a year now I do understand the basics
and can get most of the packages that I've needed to work. However, the past
week I've been stumped on a certain analysis that I would like to perform on
my results.

My dataset consists of microarrays of RNAi experiments that affect the cell
cycle. Part of the results is a phenotypic analysis, where I have the
percentages of cells in the different stages of the cell cycle. Now I would
like to link this phenotypic data with the microarray data and find out
whether the expression of genes is linked with a certain stage of the cell
cycle. So, currently I have the matrix of all my microarray data, where the
columns are the experiments, and the rows are the genes, and the values are
their log-fold differences compared to wild type. I also have vectors that
contain for each experiment the percentage of cells in a specific stage of
the cell cycle (a vector for G1, one for G2, etc).

Now I am quite at a loss as to how to link these two together. It was suggested
that I use a least squares analysis, and I've been trying to make lsfit() work for
my data, but so far without luck. The documentation for these functions
is generally rather hard for me to understand, and my search for descriptive guides
on how to do something like this has been very unsuccessful so far, probably
because I am not really sure what this kind of analysis is called.

I hope that someone out here understands what I am trying to do and perhaps
can give me a hint or two on what I should be looking into.

Many thanks in advance.

Martin




---------------------------------------------------------------
Amy Mikhail
Research student
Vector and parasite biology group
Institute of Biological and Environmental Sciences
University of Aberdeen
Zoology Building
Tillydrone Avenue
AB24 2TZ
Aberdeen
Scotland

email: a.mikhail at abdn.ac.uk
phone: +44 - (0)1224 - 273256 (office)


The University of Aberdeen is a charity registered in Scotland, No SC013683.


