[BioC] Data filtering

Anand K S Rao anandksrao at gmail.com
Tue Oct 16 00:50:55 CEST 2012


On Wed, Oct 10, 2012 at 3:06 AM, Mark Robinson <mark.robinson at imls.uzh.ch> wrote:

> Hi Anand,
>
> I've added a few "reactions" below; I hope it can help.
>
>
> > Greetings friends!
> >
> > I seek help with data that I have: 3 time points, 3 genotypes, 3
> > replicates for each of these = 27 libraries
> >
> > The goal is to find genes that have different time expression profiles
> > amongst 2 or more genotypes.
> >
> > After our 1st round of data analysis (including TMM normalization), the
> > time course graphs and box plots were so noisy in terms of high std error
> > at each time point, that it was hard to say if expression profile of one
> > genotype was overlapping or distinct from that for the other genotypes! R
> > code attached at bottom of this post.
>
> What did you actually plot?  What did an MDS plot look like?
>
>

Hello Mark,

Per your advice, we made the MDS plots. The attached MDS plot covers one
genotype only: 9 time points and 4 replicates per time point.
We have not yet generated an MDS plot across genotypes; that is our next
step.

But even now, it looks like there is quite a bit of variability among
replicate libraries.

It looks as though, for each time point, we need to remove one or more
outlier libraries. I suppose there are a few different ways to accomplish
this:

1. Remove just the one library that is an outlier, like T0.2?

2. Remove entire time points because of the scatter of the reps, like
T0.1, T0.2, T0.3 and T0.4, which are all quite distant from each other
on this MDS plot?

3. Remove an entire replicate and retain the others. In our data I think
replicate 2 is different from the other three reps, but I don't think this
MDS plot shows that, does it? A simple hierarchical clustering of the 9
time points * 4 reps = 36 libraries is attached. It shows behavior similar
to the MDS plot, though the visualizations are different.
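
To make option 1 less dependent on eyeballing the plot, one possibility is
to score each library by its distance to the other replicates of the same
time point and flag the ones that sit unusually far away. The following is
only a minimal sketch in plain Python, not part of our edgeR analysis: the
function names, the toy log-expression values and the cutoff factor of 3
are all made up for illustration, and plain Euclidean distance stands in
for whatever distance the MDS plot uses.

```python
import math
from statistics import median


def replicate_distances(log_expr, groups):
    """Median Euclidean distance from each library to its same-group replicates.

    log_expr: dict mapping library name -> list of log-expression values
    groups:   dict mapping library name -> time-point label
    (Both structures are hypothetical, chosen for this illustration.)
    """
    result = {}
    for lib, vec in log_expr.items():
        mates = [m for m in log_expr if m != lib and groups[m] == groups[lib]]
        dists = [math.dist(vec, log_expr[m]) for m in mates]
        # The median is used so that one bad replicate does not inflate
        # the score of the well-behaved replicates in the same group.
        result[lib] = median(dists) if dists else float("nan")
    return result


def flag_outliers(scores, factor=3.0):
    """Flag libraries whose score exceeds `factor` times the overall median.

    The factor is an arbitrary assumption, not a recommended threshold.
    """
    cutoff = factor * median(scores.values())
    return sorted(lib for lib, d in scores.items() if d > cutoff)


# Invented toy data: two time points, four replicates each, four genes.
log_expr = {
    "T0.1": [5.0, 3.0, 8.0, 1.0],
    "T0.2": [9.0, 0.5, 4.0, 6.0],  # deliberately placed far from its replicates
    "T0.3": [5.2, 3.1, 7.8, 1.1],
    "T0.4": [4.9, 2.8, 8.1, 0.9],
    "T1.1": [6.0, 4.0, 7.0, 2.0],
    "T1.2": [6.1, 4.2, 6.9, 2.1],
    "T1.3": [5.9, 3.9, 7.2, 1.8],
    "T1.4": [6.2, 4.1, 7.1, 2.2],
}
groups = {lib: lib.split(".")[0] for lib in log_expr}

scores = replicate_distances(log_expr, groups)
outliers = flag_outliers(scores)
print(outliers)
```

On this toy data only T0.2 is flagged. A score like this would at most
support a decision to drop a library; it says nothing about whether the
library is technically faulty or just biologically variable.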

How do you reckon we should remove the 'noisy' data, if we should do it at
all?

Thanks again.

- Anand
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MDS_plots_A17_9timepoints_4repseach.pdf
Type: application/pdf
Size: 4835 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121015/93dcdbff/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: heatmap_gini_a17.pdf
Type: application/pdf
Size: 15307 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20121015/93dcdbff/attachment-0001.pdf>