[R] Clustering problem

Abhishek Pratap abhishek.vit at gmail.com
Mon Mar 21 18:48:11 CET 2011


Hi Guys

I want to apply a clustering algorithm to my dataset in order to find
regions, i.e. groups of points (X, Y), that have similar values of
percent_GC and mean_phred_quality. Details below.

I have sampled 1% of the points from my main data set of 85 million
points. The result is still fairly large (about 800K points) and looks
like the following.


      X   Y  percent_GC  mean_phred_quality
1  4286 930        0.50                0.13
2  4825 947        0.50               20.33
3  8207 932        0.32               26.50
4  8451 940        0.48               24.81
5  9331 931        0.38               16.93
6 11501 949        0.49               31.28

What I want to do is find local regions in which these 4 values are
associated, i.e. regions of (X, Y) within which percent_GC and
mean_phred_quality are closely correlated.
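
One possible starting point, only as a rough sketch and not a settled
method, would be k-means on all four columns after scaling them. The
data frame name `dat`, the simulated values, and the choice of 5
clusters are all assumptions made up for illustration; only the column
names come from the sample above.

```r
# Simulated stand-in for the real sample (values are made up).
set.seed(1)
n <- 1000
dat <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 20000),
  percent_GC = runif(n, 0.2, 0.7),
  mean_phred_quality = runif(n, 0, 40)
)

# Scale every column to mean 0 and sd 1 so that X and Y (in the
# thousands) do not dominate the Euclidean distances used by k-means.
scaled <- scale(dat[, c("X", "Y", "percent_GC", "mean_phred_quality")])

# centers = 5 is an arbitrary illustrative choice, not a recommendation.
km <- kmeans(scaled, centers = 5, nstart = 10)
dat$cluster <- km$cluster

# Mean percent_GC and mean_phred_quality within each cluster.
cluster_means <- aggregate(
  cbind(percent_GC, mean_phred_quality) ~ cluster,
  data = dat, FUN = mean
)
print(cluster_means)
```

Note that this groups points that are similar in position and in both
values; it does not by itself measure correlation inside a region.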

PS: I did calculate the overall Pearson correlation coefficient between
percent_GC and mean_phred_quality, and it is not statistically
significant, which got me interested in finding local regions where
it may be.
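
To make the "local regions" idea concrete, here is a minimal sketch of
one way it could be checked: bin the points on a regular grid over X
and Y and run cor.test() inside each cell. The data frame `dat`, the
simulated values, the cell size of 5000, and the minimum of 30 points
per cell are all hypothetical choices for illustration.

```r
# Simulated stand-in for the real sample (values are made up).
set.seed(42)
n <- 5000
dat <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 20000),
  percent_GC = runif(n, 0.2, 0.7),
  mean_phred_quality = rnorm(n, mean = 25, sd = 5)
)

cell_size  <- 5000  # hypothetical grid cell width in X/Y units
min_points <- 30    # skip cells with too few points to test

# Label each point with the grid cell it falls into.
dat$cell <- interaction(floor(dat$X / cell_size),
                        floor(dat$Y / cell_size), drop = TRUE)

# Pearson correlation test within each sufficiently populated cell.
local_cor <- do.call(rbind, lapply(split(dat, dat$cell), function(d) {
  if (nrow(d) < min_points) return(NULL)
  ct <- cor.test(d$percent_GC, d$mean_phred_quality)
  data.frame(cell = as.character(d$cell[1]), n = nrow(d),
             r = unname(ct$estimate), p = ct$p.value)
}))

# Many cells means many tests, so adjust the p-values
# (Benjamini-Hochberg) before calling any cell significant.
local_cor$p_adj <- p.adjust(local_cor$p, method = "BH")
print(local_cor[order(local_cor$p_adj), ])
```

With a fixed grid the result depends on the cell size, so it would be
worth trying a few sizes before reading much into any single cell.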

I would really appreciate your help as I am still a rookie in applying
clustering algorithms.

Thanks!
-Abhi


