[BioC] Statistics for Diagnostic Microarrays

Thu Jul 8 14:46:56 CEST 2004

I'll try not to accidentally press "Send" this time!

Actually, a lot of the work for pattern recognition is already there -
from classical statistics and from use with proteomics data:

http://bioinformatics.med.yale.edu/proteomics/BioSupp1.html

Strangely, I am not so worried about this.  What does seem to be missing,
as you have pointed out, are things like normalisation techniques, and
also a way of answering the question "is this a real signal or just
background noise?".

Some diagnostic uses of microarrays simply ask whether an mRNA is
present or not in the sample.  This could be a problem with low-copy
mRNAs.  Presumably, to answer this question, one must have a robust
reference distribution of the intensities a spot gives when the mRNA is
ABSENT, and then you would compare your normalised observed intensity
value against this distribution to decide whether that mRNA is there or
not.
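
To make that idea concrete, here is a minimal sketch in Python of one
way such a comparison could work: an empirical one-sided p-value of the
observed intensity against a set of "absent" (negative control) spot
intensities.  The function names, the simulated control values and the
0.01 cut-off are all assumptions for illustration only, not an
established method from any package.

```python
import random


def presence_p_value(observed, absent_intensities):
    """Empirical one-sided p-value for a spot intensity.

    Fraction of 'absent' reference spots at least as bright as the
    observed spot, with a +1 correction so the value is never zero.
    """
    n = len(absent_intensities)
    exceed = sum(1 for x in absent_intensities if x >= observed)
    return (exceed + 1) / (n + 1)


def call_present(observed, absent_intensities, alpha=0.01):
    """Call the mRNA present if the spot is unusually bright for an absent spot."""
    return presence_p_value(observed, absent_intensities) <= alpha


# Simulated negative-control intensities (hypothetical values, already normalised).
rng = random.Random(0)
absent = [rng.gauss(100.0, 10.0) for _ in range(999)]

print(presence_p_value(200.0, absent), call_present(200.0, absent))
print(presence_p_value(100.0, absent), call_present(100.0, absent))
```

The smallest p-value this can return is 1/(n+1), so the number of
"absent" reference spots limits how stringent the cut-off can be.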

Again, thoughts and comments are appreciated.

Mick 

-----Original Message-----
From: michael watson (IAH-C) 
Sent: 08 July 2004 13:42
To: 'Adaikalavan Ramasamy'
Cc: BioConductor mailing list
Subject: RE: [BioC] Statistics for Diagnostic Microarrays

Actually, a lot of the work for pattern recognition is already there -
from classical statistics and from use with proteomics data:

-----Original Message-----
From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk] 
Sent: 08 July 2004 13:37
To: michael watson (IAH-C)
Cc: BioConductor mailing list
Subject: Re: [BioC] Statistics for Diagnostic Microarrays

Dear Mick,

I think there is a gold mine of opportunities for statistics in this
field. With more and more companies advertising disease-specific chips,
there are still questions to be answered, namely:

a) gene selection : Only several hundreds or thousands of genes are
going to be selected for their discriminating ability.

b) normalisation  : The assumption that the majority (90-95%) of the
genes are unchanged will not hold here. If you are going to use
"housekeeping" genes, which ones should you use, and how? So far, the
main normalisation methods (justifiably) ignore housekeeping genes as
they vary from sample to sample.

c) multiple spots : If you are going to spot, say 2000 genes, then you
can spot 10 of each at random positions on the chip. This not only
affects the normalisation (highly correlated spots) but also the
analysis aspect (is there a better approach than averaging?).

d) classification : How does one assign the probability that a patient
has a disease given the expression profile of thousands of genes? I
think we may require pattern recognition techniques or machine learning
approaches and a large enough learning set.

e) better classification : Is the diagnostic chip better than existing
tests (if any), and is it cost-effective?
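
On point (c), a small Python sketch of one possible alternative to plain
averaging: summarising replicate spots per gene with the median, which
is less sensitive to a single bad spot.  The gene names, intensities and
the helper name summarise_replicates are made up for illustration, and
the median is only one of several robust options.

```python
from statistics import mean, median


def summarise_replicates(spots):
    """Group (gene, intensity) pairs by gene and return {gene: (mean, median)}."""
    by_gene = {}
    for gene, value in spots:
        by_gene.setdefault(gene, []).append(value)
    return {gene: (mean(values), median(values)) for gene, values in by_gene.items()}


# Hypothetical replicate spots; geneA has one outlying spot (e.g. a dust speck).
spots = [
    ("geneA", 10.0), ("geneA", 11.0), ("geneA", 9.0), ("geneA", 50.0),
    ("geneB", 5.0), ("geneB", 6.0),
]

for gene, (avg, med) in summarise_replicates(spots).items():
    print(gene, "mean:", avg, "median:", med)
```

Here the single outlying geneA spot pulls the mean well away from the
other three replicates, while the median stays close to them.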

Sorry for pointing out more questions than answers, but I feel that more
people should be asking these before buying/designing designer
boutique arrays.

I think what people are currently doing is using microarrays as a
filtering tool along with other knowledge to obtain a marker
gene/protein that they can easily test for. The relevant key word is
metabolonomics.

HTH, Adai.

On Thu, 2004-07-08 at 09:12, michael watson (IAH-C) wrote:
> Hi
> 
> Obviously the greatest use for Microarrays is for gene expression
> studies, but increasingly scientists wish to use Microarrays for a 
> variety of diagnostic studies, which centre more around "Is it there 
> or not?" type questions rather than "How much of it is there?".  Does 
> anyone know of any statistical tools or software that can be used 
> specifically for diagnostic microarrays?
> 
> Thanks
> 
> Mick
> 