[R] PCA analysis

Daniel Malter daniel at umd.edu
Thu Jun 19 22:33:15 CEST 2008


Hi Monna, I cannot get it done with the princomp and biplot commands
either (maybe somebody can), but many roads lead to Rome. This is
how you can do it (below). However, the label=rep(...) line below assumes
that your values are in order, i.e. that you really want to plot the first
fifty rows with one symbol, the second fifty with another, and so forth. If
your values are not ordered, you will either have to order your dataset or
create a variable that indicates the group each row belongs to and use it to
choose the symbols. Unless you already have such a variable, creating it
would most likely involve a loop or a nested ifelse() statement. You then
pass your grouping variable to the "pch" argument (for different symbols),
the "col" argument (for different colors), or both.
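
A minimal sketch of that grouping step, for the case where the rows are not
in order. The variable "id" is a hypothetical sample number made up for this
illustration (it is not part of your data), and cut() is just one alternative
to the loop or nested ifelse() mentioned above:

```r
## hypothetical sample ids in arbitrary order (assumption for illustration)
set.seed(1)
id <- sample(1:200)

## bin the ids into four groups: 1-50, 51-100, 101-150, 151-200
## labels=FALSE returns the integer group codes 1..4
group <- cut(id, breaks = c(0, 50, 100, 150, 200), labels = FALSE)

## the same grouping written as a nested ifelse(), for comparison
group2 <- ifelse(id <= 50, 1,
          ifelse(id <= 100, 2,
          ifelse(id <= 150, 3, 4)))

## group can now be passed to pch= and/or col= in plot()
table(group)
```

Either version yields one integer per row, which is the form that pch and col
accept.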

##create data
z<-sample(401:600)
y<-sample(701:900)
x<-sample(1:200)
data.frame(x,y)->df
cbind(df, z)->df

##pc analysis
pc=prcomp(df)

##inspect results
pc
summary(pc)
pc$rotation

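##compute pc values for each observation
##(prcomp already stores the centered scores in pc$x; multiplying the raw,
##uncentered data by pc$rotation gives the same values shifted by a constant)
pc.data=pc$x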
##check
pc.data

##create point labels
label=rep(1:4, each=50)

##plot first PC
##versus second PC
##with label indicated
##by the variable label
plot(pc.data[,1],pc.data[,2],pch=label,col=label
,xlab="First principal component",ylab="Second principal component")
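
For the princomp() route, the values for each observation are stored in the
"scores" component of the fitted object, so the same kind of plot can be
drawn without biplot(). This is only a sketch: the shuffled row names and the
four ranges are assumptions for illustration, and the group is derived from
the row name rather than from the row position.

```r
## same kind of example data as above
set.seed(42)
df <- data.frame(x = sample(1:200), y = sample(701:900), z = sample(401:600))
row.names(df) <- sample(1:200)   ## row names deliberately not in order

p <- princomp(df)

## derive the group from the row name, not from the row position
id <- as.numeric(row.names(df))
group <- cut(id, breaks = c(0, 50, 100, 150, 200), labels = FALSE)

## p$scores holds the component values, one row per observation
plot(p$scores[, 1], p$scores[, 2], pch = group, col = group,
     xlab = "First principal component", ylab = "Second principal component")
legend("topright", legend = c("1-50", "51-100", "101-150", "151-200"),
       pch = 1:4, col = 1:4)
```

With real data, "id" would be replaced by whatever variable defines the
groups.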

---------------------------------------------------
Thank you for your reply.

pch=NA got rid of the numbers or names of the samples that I'm plotting. The
problem of how to replace these with different symbols still remains. I
know I can use points() to add symbols, but I can't get the right
values plotted from the outcome of princomp(data). The class of the object
is princomp, and I can't specify which columns should be plotted for the
points.

Example (my real data frame consists of multiple (hundreds of) columns of
data for ca. 200 samples):

> z<-sample(401:600)
> y<-sample(701:900)
> x<-sample(1:200)
> data.frame(x,y)->df
> cbind(df, z)->df
> princomp(df)->p
> biplot(p, pch=NA)
> row.names(df)<-1:200

Now I would like, for instance, all the samples that have row.names under 50
to be plotted with one symbol, the ones from 50-100 with another, and so on.
Do I need a special function for specifying these different symbols when my
samples are not in the correct order?

As you realize, I am quite new to R. Thank you so much for taking the time
to help me, I really appreciate it.


Regards, Monna 

> From: daniel at umd.edu
> To: monnire at hotmail.com; r-help at r-project.org
> Subject: AW: [R] PCA analysis
> Date: Tue, 17 Jun 2008 19:40:41 -0400
> 
> I am not entirely sure after reading your email, but I thought you wanted to
> do something like this:
> 
> ###Start of example
> 
> ###create random data for the example
> x=rnorm(100,100,10) ##create Xs
> e=rnorm(100,0,5) ##create Errors
> y=x+e ##create Ys
> 
> ###plot
> ##plot Ys against Xs but suppress all symbols (i.e. plot invisibly)
> plot(y~x,pch=NA)
> ##use values of X (rounded to its integer value) as symbols for the X-Y plot
> text(y~x,labels=round(x),pch=NULL)
> 
> ###End of example
> 
> So you could just substitute your variable names for x and y in the plot()
> and text() commands. Let us know whether your problem is solved.
> 
> Cheers,
> Daniel
> 
> -------------------------
> cuncta stricte discussurus
> -------------------------
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Monna Nygård
> Sent: Tuesday, June 17, 2008 5:04 AM
> To: r-help at r-project.org
> Subject: [R] PCA analysis
> 
> 
> Hi,
> 
> I have a problem with making PCA plots that are readable. 
> I would like to set different symbols instead of the numbers of my samples
> or their names, which are what I get plotted (xlabs). 
> How is this possible? With points(), I don't seem to get the right data
> plotted onto the PCA plot, as I do not quite understand where it is
> taken from. I don't know how to plot the correct columns of the prcomp
> outcome (p). 
> I would really appreciate it if someone could help me; I have struggled with
> this for days now. How can I make a function that gives different symbols
> for the points, depending on how big the number given to it as xlabs is?
> 
> Making the plots.
> 
> read.table(file = "S:\\SEDIM\\TRFLP\\B90-700.txt",sep="\t", header=T)->bout
> bout <-bout[-1]
> p <- prcomp(bout)
> biplot(p, choices = c(2,3), scale = 1, pc.biplot = FALSE, var.axes = F,
>   ylabs = NULL,
>   xlabs=c("119","175","135","330","51","422","67","409","470","70","67","89",
>   "135","215","330","409","470","51","80","119","175","222","301","422","280",
>   "171","256","243","404","37","157","28","187","70","42","283","261","85",
>   "147","204","235","411","514","77","204","87","366","306","351","371","38",
>   "534","199","407","42","167","480","195","22","35","80","433","43","109",
>   "214","363","292","61","115","178","273","521","72","126","253","288","501",
>   "83","113","250","359","498","19","130","389","324","24","58","124","388",
>   "319","164","101","153","383","345","219","179","161","375","298","450",
>   "555","439","54","54","490","465","411","18","85","503","455","394","179",
>   "187","416","447","219","461","164","366","474","167","236","507","319",
>   "509","467","507","450","359","507","192","453","101","456","512","517"),
>   cex=0.67, main="90-700bp")
> 


