[BioC] gene classification problem

Thu Dec 9 20:14:39 CET 2004

Oops! minor correction below

On Thu, 9 Dec 2004, Charles C. Berry wrote:

>
>
> Mark,
>
> In
>
> Borevitz, J.O., Liang, D., Plouffe, D., Chang, H., Zhu, T., Weigel, D., 
> Berry, C.C., Winzeler, E., and Chory, J. (2003) Large Scale Identification of 
> Single Feature Polymorphisms in Complex Genomes. Genome Research 13, 513-523.
>
> we used individual probesets on Affy arrays to search for polymorphisms among 
> inbred strains (hyb'ing genomic DNA rather than RNA).
>
> A collection of the tools we used to identify probesets and/or regions that 
................................................^^^^^^^^^...

I meant individual probes, not probesets.

> differentially bind according to strain may be found at:
>
> 	http://naturalvariation.org/sfp
>
> and the 'Methods' link will connect you to some newer work and scripts.
>
> ----------
>
> Although you seem to have somewhat different objectives, it looks like 
> similar statistical tools would apply to your situation.
>
> Chuck
>
>
> On Thu, 9 Dec 2004, Kimpel, Mark W wrote:
>
>> My apologies to those with far more statistical expertise than I, but I 
>> have what may (or may not) be a straightforward question.
>> 
>> After performing SAM analysis of an experiment comparing two strains of 
>> rats, I have a list of about 200 significant affy rat probesets (genes) 
>> that I have mapped to their chromosomal locations. Some of the genes 
>> appear to cluster into discrete physical chromosomal regions, which I 
>> suspect is related to underlying genetic differences between the two 
>> inbred strains. Based on their chromosomal location, I have clustered 
>> these significant genes into discrete bins. One thing to remember 
>> when solving this problem is that the distribution of all affy rat 
>> probesets along the chromosomes is not uniform. My fear is therefore 
>> that some of the granularity in the chromosomal locations of significant 
>> genes could be due not only to chance but also to the granularity of the 
>> underlying distribution.
>> 
>> At this point I would like to test:
>> 
>> 1. if the distribution of sig. genes amongst the bins is 
>> statistically different from that of the population of all affy 
>> genes from which they were drawn.
>> 2. if the above distribution of sig. genes is, as I suspect, 
>> different, which of the bins are responsible for this 
>> difference. It would be great to assign a p value to 
>> each bin.
>> 
>> I believe this is similar to the problem faced in analyzing the 
>> distribution of genes in GO categories but I am not familiar with the 
>> proper solution.
>> 
>> Any sample code would be greatly appreciated. For an example, assume that 
>> I have two matrices, each of two columns with genes represented by rows. 
>> The first column is the probeset ID, the second column the "bin" that it 
>> falls into. One matrix is of all rat affy genes, the second one contains 
>> only the significant genes.
>> 
>> Thanks,
>> 
>> Mark W. Kimpel MD
>> 
>> Department of Psychiatry
>> Indiana University School of Medicine
>> Biotechnology, Research, & Training Center
>> 1345 W. 16th Street
>> Indianapolis, IN  46202
>>  
>> 
>> 
>
> Charles C. Berry                        (858) 534-2098
>                                         Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu	         UC San Diego
> http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717
>
>
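
For the binned comparison Mark describes (two tables of probeset ID and bin,
one for all probesets and one for the significant ones), one possible approach
is sketched below in Python, standard library only. This is an illustration
under stated assumptions, not the method used in the paper cited above. It
assumes a one-sided hypergeometric test per bin (is the bin over-represented
among the significant genes, given how many probesets on the array fall in
it?), a Bonferroni correction across bins, and a Monte Carlo global test that
redraws the same number of genes from the full array, so the non-uniform
probeset density is built into the null distribution. All function names and
the toy data are hypothetical.

```python
import random
from collections import Counter
from math import comb


def upper_tail(k, K, n, N):
    """P(X >= k): n genes drawn without replacement from N, K of them in the bin."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def per_bin_pvalues(all_genes, sig_genes):
    """Inputs are lists of (probeset_id, bin) pairs.
    Returns {bin: (sig_count, bin_size, p_value, bonferroni_p)}."""
    all_counts = Counter(b for _, b in all_genes)
    sig_counts = Counter(b for _, b in sig_genes)
    N, n = len(all_genes), len(sig_genes)
    n_bins = len(all_counts)
    out = {}
    for b, K in all_counts.items():
        k = sig_counts.get(b, 0)
        p = upper_tail(k, K, n, N)
        out[b] = (k, K, p, min(1.0, p * n_bins))  # Bonferroni across bins
    return out


def chi_square_stat(sig_counts, all_counts, n, N):
    """Pearson statistic: observed bin counts vs. counts expected from array density."""
    return sum((sig_counts.get(b, 0) - n * K / N) ** 2 / (n * K / N)
               for b, K in all_counts.items())


def global_pvalue(all_genes, sig_genes, n_perm=2000, seed=0):
    """Monte Carlo p-value for 'sig genes are spread over bins like the whole array'."""
    rng = random.Random(seed)
    bins = [b for _, b in all_genes]
    all_counts = Counter(bins)
    N, n = len(bins), len(sig_genes)
    observed = chi_square_stat(Counter(b for _, b in sig_genes), all_counts, n, N)
    hits = sum(
        chi_square_stat(Counter(rng.sample(bins, n)), all_counts, n, N) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 probesets in two bins, 4 of them significant.
all_genes = [("p%d" % i, "binA") for i in range(5)] + \
            [("p%d" % i, "binB") for i in range(5, 20)]
sig_genes = [("p0", "binA"), ("p1", "binA"), ("p2", "binA"), ("p5", "binB")]

results = per_bin_pvalues(all_genes, sig_genes)
global_p = global_pvalue(all_genes, sig_genes, n_perm=500)
for b, (k, K, p, p_adj) in sorted(results.items()):
    print(b, k, K, round(p, 4), round(p_adj, 4))
print("global p:", round(global_p, 4))
```

Both tests treat the significant genes as a random draw from the array, which
is only an approximation if neighbouring genes are co-regulated. In R, phyper()
and p.adjust() cover the per-bin step.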

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717