[BioC] Clustering question

Tim Triche, Jr. tim.triche at gmail.com
Wed Jul 11 17:06:49 CEST 2012


try the 'lpc' or 'superpc' package if what you want is supervision.
or if you have a crapload of covariates/mutations/whatever, do CCA
(canonical correlation analysis; try 'PMA').

consider using a logit transform on the betas if you are doing linear
modeling; note that big changes are more or less invariant to the
transformation.
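
A minimal sketch of the logit transform mentioned above, in plain Python
for illustration only. The base-2 form log2(beta / (1 - beta)) is the one
commonly called an M-value; the function names and the eps clamp are
assumptions made for this example, not part of any package.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Base-2 logit of a methylation beta value.

    eps is an assumed clamp that keeps beta away from exactly 0 or 1,
    where the logit is undefined.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: map a base-2 logit back to the (0, 1) beta scale."""
    return 1.0 / (1.0 + 2.0 ** (-m))

if __name__ == "__main__":
    # Hypothetical beta values, chosen only to show the shape of the transform.
    for beta in (0.05, 0.2, 0.5, 0.8, 0.95):
        print(beta, round(beta_to_m(beta), 3))
```

The transform is close to linear around beta = 0.5 and stretches the
scale near 0 and 1, which is where linear models on raw betas tend to
behave worst.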

and switch to using SummarizedExperiment objects if you want flexibility
in slicing up the data genomically.  I wrote some coercions and generics
for this class and I'll be submitting a package since they've been
incredibly useful to me, e.g. for subsetting by GRanges.  Will present
some examples at Bioc2012.

variance testing is tricky; look at what Haim Bar and Jim Booth have
done with empirical Bayes mixture modeling for p1 vs. p0.


On Wed, Jul 11, 2012 at 2:14 AM, Gustavo Fernández Bayón
<gbayon at gmail.com> wrote:
> Hi everybody.
>
> Imagine the following scenario: I have a Methylation data ExpressionSet with 40 samples and 450K probes (Illumina kind). Samples are divided in two classes, and I would like to characterize families of probes according to their behavior. That is, I would like to find a set of probes hypermethylating with respect to the covariate that divides between classes, another one showing that variability increases between classes, etc.
>
> I have been trying some ideas around the following workflow:
>
> 1) Filtering of the data (non-specific, sex chromosome genes, ...)
> 2) Transformation into a lower-dimensional summary subspace. For example, if I have 20 beta values for one class and 20 for the other, the transformation takes the 40-dimensional vector of beta values and summarizes it as a 2-dimensional vector, with the first component being the difference of the medians of the two classes, and the second one being the difference in their IQRs. My idea was to summarize the data and work with those transformed variables that really characterize what I am looking for.
> 3) Clustering in the new subspace. For now, I am using k-means as a baseline clustering method. My idea was to test a hierarchical method and maybe a Bayesian dp-means, among others.
>
> This is mainly an exploratory workflow. I want to know how these probes behave according to the above variables, and I am testing different ideas on my data. But I was wondering whether I am doing the right thing by summarizing the beta values into the new variables, or whether there is some alternative (maybe model-based) for doing this kind of exploratory work. Apart from losing a lot of information along the way, am I getting into problems by doing that?
>
> Any hint or suggestion will be appreciated.
>
> Regards,
> Gus
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



-- 
A model is a lie that helps you see the truth.

Howard Skipper


