[BioC] pca plot for gene expression

Michael Love michaelisaiahlove at gmail.com
Tue Jul 22 14:24:40 CEST 2014


hi Deepak,

This is a general R question, since it doesn't involve software from
Bioconductor, so please post future questions like this to the R-help
mailing list https://stat.ethz.ch/mailman/listinfo/r-help or as an
R-tagged question on Stack Overflow
http://stackoverflow.com/questions/tagged/r.

You can obtain the mean for each group in many ways. One way is to use
the ddply function in the plyr package on CRAN:
http://cran.r-project.org/web/packages/plyr/plyr.pdf

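# 'condition' is assumed to be a vector of group labels, one per sample
d = data.frame(PC1 = pc$x[,1], PC2 = pc$x[,2], f = factor(condition))
library(plyr)
# one row per level of f, holding that group's mean PC1 and mean PC2
groupmeans = ddply(d, "f", summarise, mPC1=mean(PC1), mPC2=mean(PC2))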

This gives the mean of PC1 and the mean of PC2 for each group.
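
As a rough, self-contained sketch of how the group means could be shown
without hiding the individual samples, the following uses simulated data
in place of a real expression matrix. The matrix 'expr', the labels in
'condition' and the group offsets are all made up for illustration, and
base R's aggregate stands in for ddply so that no extra package is needed.

```r
# Simulated stand-in for real data: 60 samples x 50 genes, three groups.
set.seed(1)
condition <- rep(c("A", "B", "C"), each = 20)         # hypothetical labels
shift <- unname(c(A = 0, B = 2, C = 4)[condition])    # per-sample offset
expr <- matrix(rnorm(60 * 50), nrow = 60) + shift     # rows = samples

# PCA on samples (rows), then collect the first two PCs with the labels.
pc <- prcomp(expr)
d <- data.frame(PC1 = pc$x[, 1], PC2 = pc$x[, 2], f = factor(condition))

# Base R counterpart of the ddply call: one row of means per group.
groupmeans <- aggregate(cbind(PC1, PC2) ~ f, data = d, FUN = mean)

# Individual samples colored by group, with group means overlaid as crosses.
plot(d$PC1, d$PC2, col = as.integer(d$f), pch = 16, cex = 0.6,
     xlab = "PC1", ylab = "PC2")
points(groupmeans$PC1, groupmeans$PC2,
       col = match(groupmeans$f, levels(d$f)), pch = 4, cex = 2.5, lwd = 3)
legend("topright", legend = levels(d$f),
       col = seq_along(levels(d$f)), pch = 16)
```

Keeping the samples in the plot alongside the means shows how much each
group spreads around its average point.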

Mike

On Tue, Jul 22, 2014 at 5:10 AM, deepak karthik <deepaksrna at gmail.com> wrote:
> Hi Michael
>
> Even when I color the points by carcinoma type, the plot is so crowded
> that I cannot distinguish between the cancers. That is why I wanted a
> way to show a single point for each cancer. My ultimate goal is to find
> which cancers are related with respect to my gene of interest. Please
> suggest a better approach.
>
> thank you,
> Deepak
>
>
>
> On Wed, Jul 16, 2014 at 3:50 PM, Michael Love <michaelisaiahlove at gmail.com>
> wrote:
>>
>> hi Deepak,
>>
>> We like to always keep the discussion on the list, to avoid having to
>> answer duplicate questions.
>>
>> Collapsing all the patients into a single point defeats the purpose of
>> PCA: to see the distances between individual samples and groups of
>> samples. Showing just the mean for each group might mislead someone
>> looking at the plot into thinking the clusters are distinct, when the
>> samples might have high variance around that average point. I would
>> recommend instead just coloring the types of carcinoma.
>>
>> Mike
>>
>> On Wed, Jul 16, 2014 at 5:43 AM, deepak karthik <deepaksrna at gmail.com>
>> wrote:
>> > Thanks for your reply.
>> > I have data from hundreds of patients for each carcinoma, consisting of
>> > RNA-seq expression for a certain gene of interest. If I perform PCA on
>> > numerous carcinomas, the plot becomes cluttered and it is difficult to
>> > see which types of carcinoma cluster together. So I would like to mark
>> > a single point for each type of carcinoma, based on the RNA-seq
>> > expression of my gene of interest. Thanks in advance.
>> >
>> > with regards,
>> > S.Deepak
>> >
>> >
>> > On Tue, Jul 15, 2014 at 9:31 PM, Michael Love
>> > <michaelisaiahlove at gmail.com>
>> > wrote:
>> >>
>> >> You might get more feedback if you describe what kind of experiment
>> >> you have performed (microarray or RNA-Seq?).
>> >>
>> >> The other reason you might not be getting a response is that the
>> >> principal component functions are not implemented in Bioconductor, but
>> >> in base R. So it's not necessarily a Bioconductor question, but a
>> >> statistics/R question.
>> >>
>> >> The very basic code for making a PCA plot from an expression set 'e'
>> >> would be
>> >>
>> >> pc = prcomp( t ( exprs( e ) ) )
>> >> plot( pc$x[ , 1:2 ] )
>> >>
>> >> On Tue, Jul 15, 2014 at 8:36 AM, karthik [guest]
>> >> <guest at bioconductor.org>
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > I am writing this mail for the second time. I want to perform a PCA
>> >> > for each cancer type using the expression of my genes of interest. I
>> >> > want to plot only a single point that represents each cancer and its
>> >> > gene expression. Can you please explain how to do this? (Also, on a
>> >> > per-cancer, per-gene basis, should I take median or mean values to
>> >> > represent the expression?) Thanks in advance.
>> >> >
>> >> >
>> >> >  -- output of sessionInfo():
>> >> >
>> >> > pca()
>> >> >
>> >> > --
>> >> > Sent via the guest posting facility at bioconductor.org.
>> >> >
>> >> > _______________________________________________
>> >> > Bioconductor mailing list
>> >> > Bioconductor at r-project.org
>> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> > Search the archives:
>> >> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Deepak karthik
>> >
>> > PhD student
>> >
>> > +972-054-5683140
>> >
>> >
>> > Dr. Mali Salmon-Divon, Genomic Bioinformatics laboratory
>> >
>> > The Department of Molecular Biology
>> >
>> > Ariel University, Israel


