[BioC] gene classification problem

Thu Dec 9 20:07:46 CET 2004

Mark,

In

Borevitz, J.O., Liang, D., Plouffe, D., Chang, H., Zhu, T., Weigel, D., 
Berry, C.C., Winzeler, E., and Chory, J. (2003) Large Scale Identification 
of Single Feature Polymorphisms in Complex Genomes. Genome Research 
13, 513-523.

we used individual probesets on Affy arrays to search for polymorphisms 
among inbred strains (hyb'ing genomic DNA rather than RNA).

A collection of the tools we used to identify probesets and/or regions 
that differentially bind according to strain may be found at:

 	http://naturalvariation.org/sfp

and the 'Methods' link will connect you to some newer work and scripts.

----------

Although you seem to have somewhat different objectives, it looks like 
similar statistical tools would apply to your situation.

Chuck

On Thu, 9 Dec 2004, Kimpel, Mark W wrote:

> My apologies to those with far more statistical expertise than I, but I have what may (or may not) be a straightforward question.
>
> After performing SAM analysis of an experiment comparing two strains of 
> rats, I have a list of about 200 significant affy rat probesets (genes) 
> that I have mapped to their chromosomal locations. Some of the genes 
> appear to cluster into discrete physical chromosomal regions, which I 
> suspect is related to underlying genetic differences between the two 
> inbred strains. Based on their chromosomal location, I have clustered 
> these significant genes into discrete bins. One thing to remember 
> when solving this problem is that the distribution along chromosomes of 
> all affy rat probesets is not uniform. Thus my fear is that some of the 
> granularity of the chromosomal locations of significant genes could be 
> due not only to chance, but also to the granularity of the underlying 
> distribution.
>
> At this point I would like to test:
>
> 	1. if the distribution of sig. genes amongst the bins is statistically different from that of the population of all affy genes from which they were drawn.
> 	2. if, as I suspect, the above distribution of sig. genes is different, which of the bins are responsible for the difference. It would be great to assign a p-value to each bin.
>
> I believe this is similar to the problem faced in analyzing the distribution of genes in GO categories but I am not familiar with the proper solution.
>
> Any sample code would be greatly appreciated. For an example, assume that I have two matrices, each of two columns with genes represented by rows. The first column is the probeset ID, the second column the "bin" that it falls into. One matrix holds all rat affy genes, the second one only the significant genes.
>
> Thanks,
>
> Mark W. Kimpel MD
>
> Department of Psychiatry
> Indiana University School of Medicine
> Biotechnology, Research, & Training Center
> 1345 W. 16th Street
> Indianapolis, IN  46202
>  
>
>
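
For the two tests asked about in the quoted message, here is a minimal
sketch in Python (standard library only). It is not taken from the paper
or from the tools at the URL above. It assumes the layout described in
the question: two tables of (probeset ID, bin) rows, one for all
probesets and one for the significant ones. The function names, the toy
data, the Monte Carlo form of the overall test, and the Bonferroni
adjustment are all illustrative assumptions. Question 1 is treated as a
goodness-of-fit test whose null distribution is simulated by drawing
random gene lists from the full table, so the non-uniform background is
built in. Question 2 is treated as a one-sided hypergeometric test per
bin.

```python
import random
from collections import Counter
from math import comb


def upper_tail(k, N, K, n):
    # P(X >= k) for a hypergeometric draw: n probesets sampled without
    # replacement from N, of which K lie in the bin of interest.
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def per_bin_pvalues(all_rows, sig_rows):
    # Question 2: one-sided enrichment test for every bin.
    # Returns {bin: (observed, expected, p, Bonferroni-adjusted p)}.
    bin_sizes = Counter(b for _, b in all_rows)
    sig_counts = Counter(b for _, b in sig_rows)
    N, n, n_tests = len(all_rows), len(sig_rows), len(bin_sizes)
    out = {}
    for b, K in bin_sizes.items():
        k = sig_counts.get(b, 0)
        p = upper_tail(k, N, K, n)
        out[b] = (k, n * K / N, p, min(1.0, p * n_tests))
    return out


def global_test(all_rows, sig_rows, n_sim=2000, seed=0):
    # Question 1: chi-square-type statistic against the background bin
    # frequencies, with a Monte Carlo p-value from random gene lists of
    # the same size drawn without replacement from all probesets.
    rng = random.Random(seed)
    bins = [b for _, b in all_rows]
    N, n = len(bins), len(sig_rows)
    expected = {b: n * c / N for b, c in Counter(bins).items()}

    def stat(sample):
        obs = Counter(sample)
        return sum((obs.get(b, 0) - e) ** 2 / e for b, e in expected.items())

    observed = stat([b for _, b in sig_rows])
    hits = sum(stat(rng.sample(bins, n)) >= observed for _ in range(n_sim))
    return observed, (hits + 1) / (n_sim + 1)


# Made-up toy data: 1000 probesets in 10 equal bins, 20 "significant"
# probesets of which 15 fall in bin0.
all_rows = [(f"probe{i}", f"bin{i % 10}") for i in range(1000)]
sig_rows = ([r for r in all_rows if r[1] == "bin0"][:15]
            + [r for r in all_rows if r[1] != "bin0"][:5])

print(global_test(all_rows, sig_rows))
for b, row in sorted(per_bin_pvalues(all_rows, sig_rows).items()):
    print(b, row)
```

The per-bin p-values are one-sided (over-representation only), and
Bonferroni is just the simplest of several possible multiple-testing
adjustments. With many sparsely populated bins the simulated overall
p-value is likely safer than the asymptotic chi-square approximation.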

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717