[BioC] gene classification problem
Charles C. Berry
cberry at tajo.ucsd.edu
Thu Dec 9 20:14:39 CET 2004
Oops! minor correction below
On Thu, 9 Dec 2004, Charles C. Berry wrote:
> Borevitz, J.O., Liang, D., Plouffe, D., Chang, H., Zhu, T., Weigel, D.,
> Berry, C.C., Winzeler, E., and Chory, J. (2003) Large Scale Identification of
> Single Feature Polymorphisms in Complex Genomes. Genome Research 13, 513-523.
> we used individual probesets on Affy arrays to search for polymorphisms among
> inbred strains (hyb'ing genomic DNA rather than RNA).
I meant individual probes, not probesets.
> A collection of the tools we used to identify probesets and/or regions that
> differentially bind according to strain may be found at:
> and the 'Methods' link will connect you to some newer work and scripts.
> Although you seem to have somewhat different objectives, it looks like
> similar statistical tools would apply to your situation.
> On Thu, 9 Dec 2004, Kimpel, Mark W wrote:
>> My apologies to those with far more statistical expertise than I, but I
>> have what may (or may not) be a straightforward question.
>> After performing SAM analysis of an experiment comparing two strains of
>> rats, I have a list of about 200 significant affy rat probesets (genes)
>> that I have mapped to their chromosomal locations. Some of the genes
>> appear to cluster into discrete physical chromosomal regions, which I
>> suspect is related to underlying genetic differences between the two
>> inbred strains. Based on their chromosomal location, I have clustered
>> these significant genes into discrete bins. One thing to remember
>> when solving this problem is that the distribution along chromosomes of
>> all affy rat probesets is not uniform. Thus my fear is that some of the
>> granularity in the chromosomal locations of significant genes could be
>> due not only to chance but also to the granularity of the underlying
>> distribution.
>> At this point I would like to test:
>> 1. if the distribution of sig. genes amongst the bins is
>> statistically different from that of the population of all affy
>> genes from which they were drawn.
>> 2. if the above distribution of sig. genes is, as I suspect,
>> different, which of the bins are responsible for this significant
>> difference. It would be great to assign a p-value to each bin.
>> I believe this is similar to the problem faced in analyzing the
>> distribution of genes in GO categories but I am not familiar with the
>> proper solution.
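For the per-bin question, one approach often used for GO-category enrichment is a one-sided hypergeometric test on each bin: given N genes on the array, K of them in a bin, and n significant genes, ask how likely it is to see k or more significant genes in that bin by chance. The following is only a minimal Python sketch under that assumption. The function names, the (probeset_id, bin) row layout, the toy data, and the Bonferroni adjustment are illustrative choices, not part of any tool mentioned in this thread.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which fall in the bin of interest."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def per_bin_pvalues(all_rows, sig_rows):
    """Return {bin: (observed, expected, p_raw, p_bonferroni)}.

    all_rows / sig_rows: iterables of (probeset_id, bin) pairs (assumed layout).
    """
    all_counts = Counter(b for _, b in all_rows)
    sig_counts = Counter(b for _, b in sig_rows)
    N = sum(all_counts.values())   # genes on the array
    n = sum(sig_counts.values())   # significant genes
    n_bins = len(all_counts)
    out = {}
    for b, K in all_counts.items():
        k = sig_counts.get(b, 0)
        p = hypergeom_upper_tail(k, N, K, n)
        # Bonferroni: multiply by the number of bins tested, cap at 1.
        out[b] = (k, n * K / N, p, min(1.0, p * n_bins))
    return out


# Toy data, made up for illustration: 10 genes on the array, 3 significant.
all_rows = [(f"g{i}", "binA") for i in range(4)] + \
           [(f"g{i}", "binB") for i in range(4, 10)]
sig_rows = [("g0", "binA"), ("g1", "binA"), ("g2", "binA")]
result = per_bin_pvalues(all_rows, sig_rows)
```

Because the expected count in each bin is scaled by how many array genes fall there, a bin that is simply dense with probesets is not flagged on that account alone. With many bins, a less conservative multiple-testing adjustment than Bonferroni may be preferable.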
>> Any sample code would be greatly appreciated. For an example, assume that
>> I have two matrices, each of two columns with genes represented by rows.
>> The first column is the probeset ID, the second column the "bin" that it
>> falls into. One matrix covers all rat affy genes; the second one contains only
>> the significant genes.
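For the overall question (does the bin distribution of the significant genes differ from that of all genes on the array?), one option is a Monte Carlo test: compute a chi-square-style statistic for the observed significant set, then compare it with the same statistic for random gene sets of the same size drawn without replacement from the full array. This is a minimal Python sketch assuming the two-column layout described above, with each matrix given as a list of (probeset_id, bin) rows. The function names, `n_sim`, `seed`, and the toy data are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(sig_counts, all_counts, n, N):
    """Sum over bins of (observed - expected)^2 / expected, where
    expected = n * (that bin's share of all genes on the array)."""
    stat = 0.0
    for b, K in all_counts.items():
        expected = n * K / N
        stat += (sig_counts.get(b, 0) - expected) ** 2 / expected
    return stat


def overall_bin_test(all_rows, sig_rows, n_sim=2000, seed=0):
    """Return (observed statistic, Monte Carlo p-value)."""
    all_bins = [b for _, b in all_rows]
    all_counts = Counter(all_bins)
    N = len(all_bins)
    n = len(sig_rows)
    observed = chi_square_stat(Counter(b for _, b in sig_rows), all_counts, n, N)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_sim):
        # Random "significant" set: n genes drawn without replacement.
        draw = Counter(rng.sample(all_bins, n))
        if chi_square_stat(draw, all_counts, n, N) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_sim + 1)


# Toy data, made up for illustration: 6 genes in binA, 4 in binB, 3 significant.
all_rows = [(f"g{i}", "binA") for i in range(6)] + \
           [(f"g{i}", "binB") for i in range(6, 10)]
sig_rows = [("g0", "binA"), ("g1", "binA"), ("g2", "binA")]
stat, p_value = overall_bin_test(all_rows, sig_rows)
```

Sampling from the actual list of array genes builds the non-uniform probeset density into the null distribution, and avoids relying on the chi-square approximation when many bins have small expected counts. The sketch treats probesets as independent, which may not hold if several probesets map to the same gene.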
>> Mark W. Kimpel MD
>> Department of Psychiatry
>> Indiana University School of Medicine
>> Biotechnology, Research, & Training Center
>> 1345 W. 16th Street
>> Indianapolis, IN 46202
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://hacuna.ucsd.edu/members/ccb.html La Jolla, San Diego 92093-0717