[R] please help! label selected data points in huge number of data points potentially as high as 50,000!

Sarah Goslee sarah.goslee at gmail.com
Sun Mar 6 14:11:23 CET 2011


I think you've made your problem too complicated.

Given your example below (and THANK YOU for including a workable
example), is this not what you need?

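# keep only the rows below the significance threshold
sigdata <- dataf[dataf$p < 0.01,]
# plot every point, then label just the significant ones
plot(dataf$xvar, dataf$p)
text(sigdata$xvar, sigdata$p, sigdata$name)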

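text() is vectorized: give it vectors for x, y and the labels, and it
draws one label per point, so no loop or manual call per point is needed.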
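
Since you said you prefer lattice, the same idea can be sketched with a
custom panel function. This is only a sketch under a few assumptions: the
0.01 cutoff is taken from your example, the names threshold and plt are
mine, and the pos and cex values are just cosmetic choices.

```r
library(lattice)

# Rebuild the example data from the original post.
set.seed(134)
dataf <- data.frame(name = paste("M", 1:5000, sep = ""),
                    xvar = seq(1, 50000, 10),
                    p    = rnorm(5000, 0.15, 0.05))

threshold <- 0.01  # assumed cutoff, taken from the example

plt <- xyplot(p ~ xvar, data = dataf,
              panel = function(x, y, subscripts, ...) {
                # draw all points as usual
                panel.xyplot(x, y, ...)
                # label only the points below the cutoff;
                # subscripts maps panel points back to rows of dataf
                sig <- y < threshold
                panel.text(x[sig], y[sig],
                           labels = as.character(dataf$name[subscripts][sig]),
                           pos = 3, cex = 0.7)
              })
print(plt)
```

Using subscripts rather than indexing dataf directly should keep the labels
matched to the right rows if you later add conditioning variables.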

Sarah

On Sat, Mar 5, 2011 at 6:29 PM, Umesh Rosyara <rosyaraur at gmail.com> wrote:
> Dear All
>
> I am reposting because my problem is a real issue and I have been working on
> this. I know this might be simple for those who know it! Anyway, I need
> help!
>
> Let me clarify my point. I have a huge number of data points plotted using
> either the base plot function or xyplot in lattice (I prefer lattice).
>         name xvar            p
> 1       M1    1  0.107983837
> 2       M2   11  0.209125624
> 3       M3   21  0.163959428
> 4       M4   31  0.132469859
> 5       M5   41  0.086095130
> 6       M6   51  0.180822010
> 7       M7   61  0.246619925
> 8       M8   71  0.147363687
> 9       M9   81  0.162663127
> ........
> 5000 observations
>
> I need to plot xvar (x variable) and p (y variable) using either plot() or
> xyplot(), and I want to show (print on the graph) the name labels for those
> rows that have a p value < 0.01 (meaning that they are significant). With my
> limited R knowledge I can use text(x, y, labels) to add the text manually,
> but I have a huge number of data points (I provide just 5000 here, but
> potentially it can go up to 50,000). So I want to display the names only for
> those observations (rows) whose p value is below the threshold.
>
> Here is my example dataset and my status:
> name <- c(paste ("M", 1:5000, sep = ""))
> xvar <- seq(1, 50000, 10)
> set.seed(134)
> p <- rnorm(5000, 0.15,0.05)
> dataf <- data.frame(name,xvar, p)
>
> # using lattice (my first preference)
> require(lattice)
> xyplot(p ~ xvar, dataf)
>
> # I want to display names for the following observations that meet the
> # requirement of p < 0.01.
> which (dataf$p < 0.01)
> [1]  811  854 1636 1704 2148 2161 2244 3205 3268 4177 4564 4614 4639 4706
>
> Thus significant observations are:
>        name  xvar             p
> 811   M811  8101  0.0050637068
> 854   M854  8531 -0.0433901783
> 1636 M1636 16351 -0.0279014039
> 1704 M1704 17031  0.0029878335
> 2148 M2148 21471  0.0048898232
> 2161 M2161 21601 -0.0354130557
> 2244 M2244 22431  0.0003255200
> 3205 M3205 32041  0.0079758430
> 3268 M3268 32671  0.0012797145
> 4177 M4177 41761  0.0015487439
> 4564 M4564 45631  0.0024867152
> 4614 M4614 46131  0.0078381964
> 4639 M4639 46381 -0.0063151605
> 4706 M4706 47051  0.0032200517
>
> I want the data point (8101, 0.0050637068) labeled with M811 in the plot, and
> similarly for all of the above (the significant ones). I do not want to label
> the rest of the 5000 points, which do not have a p value < 0.01. I know I can
> add the labels manually with text(8101, 0.0050637068, "M811") in base plot().
>
> plot (dataf$xvar,p)
> text (8101, 0.0050637068, "M811")
> text (8531, -0.0433901783, "M854")
>
> I need more automation to deal with as many as 50,000 observations. In
> practice, I do not know how many variables there will be.
>
> Your help is highly appreciated. Thank you.
>
> Best Regards
>
> Umesh R
>
>
>
>
-- 
Sarah Goslee
http://www.functionaldiversity.org


