[R] Error bars and CI

Mohan.Radhakrishnan at cognizant.com Mohan.Radhakrishnan at cognizant.com
Thu Jun 18 11:43:12 CEST 2015


Hi Dennis,

I have copied the R-help list. Could you explain why we can't compute a CI and error bars from this data set?
The graph generated by the code below has equal-sized error bars and a 99% confidence band. Groups are not needed here. The error bar and CI calculations may be incorrect, but I am able to draw the plot.

      V1 IDX
1  0.796   1
2  0.542   2
3  0.510   3
4  0.617   4
5  0.482   5
6  0.387   6
7  0.272   7
8  0.536   8
9  0.498   9
10 0.402  10
11 0.328  11
12 0.542  12
13 0.299  13
14 0.647  14
15 0.291  15
16 0.815  16
17 0.680  17
18 0.363  18
19 0.560  19
20 0.334  20

Assume the dataframe is 'jc'.

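library(ggplot2)

print(summary(jc$V1))

# Half-width of the 99% t-based CI for the overall mean of V1
error <- qt(0.995, df = length(jc$V1) - 1) * sd(jc$V1) / sqrt(length(jc$V1))
error1 <- mean(jc$V1) - error   # lower limit
error2 <- mean(jc$V1) + error   # upper limit
print(error1)
print(error2)

# Error bars: each observation +/- the overall SD of V1 (so all bars are equal-sized).
# Ribbon: the 99% CI of the overall mean, drawn across all iterations.
q <- qplot(jc$IDX, jc$V1, geom = "line", colour = 'red') +
    geom_errorbar(aes(x = jc$IDX, ymin = jc$V1 - sd(jc$V1), ymax = jc$V1 + sd(jc$V1)), width = 0.25) +
    geom_ribbon(aes(x = jc$IDX, y = jc$V1, ymin = error1, ymax = error2), fill = "ivory2", alpha = 0.4) +
    xlab('Iterations') + ylab("Java Collections") + theme_bw()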
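
As a cross-check on the interval computed above, the same 99% limits can be obtained from t.test(). This is only a minimal sketch: it rebuilds 'jc' from the 20 values listed earlier, and the names 'manual_ci' and 'tt_ci' are introduced here for illustration. It shows that error1 and error2 are the confidence limits for the overall mean of V1, i.e. a single interval for the whole series rather than per-point limits.

```r
# Rebuild the data frame 'jc' from the values posted above
jc <- data.frame(
  V1 = c(0.796, 0.542, 0.510, 0.617, 0.482, 0.387, 0.272, 0.536, 0.498, 0.402,
         0.328, 0.542, 0.299, 0.647, 0.291, 0.815, 0.680, 0.363, 0.560, 0.334),
  IDX = 1:20
)

# Manual 99% t-based interval for the mean, as in the code above
n <- length(jc$V1)
error <- qt(0.995, df = n - 1) * sd(jc$V1) / sqrt(n)
manual_ci <- c(mean(jc$V1) - error, mean(jc$V1) + error)

# t.test() computes the same t-based interval for the mean
tt_ci <- as.numeric(t.test(jc$V1, conf.level = 0.99)$conf.int)

print(manual_ci)
print(tt_ci)
print(all.equal(manual_ci, tt_ci))
```

Because both intervals describe the mean of all 20 iterations, neither says anything about the uncertainty of an individual iteration.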


Thanks,
Mohan

-----Original Message-----
From: Dennis Murphy [mailto:djmuser at gmail.com]
Sent: Wednesday, June 17, 2015 8:42 PM
To: Radhakrishnan, Mohan (Cognizant)
Subject: Re: [R] Error bars and CI

Q: How do you expect to get error bars when you plot "groups" having samples of size 1? If you "are not grouping", then what is the point of trying to manufacture variation where none exists? I'd suggest you think a little more deeply about what you can achieve with the available data.

This plot visualizes the data you posted. Every point is accounted for. I named the input data frame DF.

ggplot(DF, aes(x = IDX, y = V1)) +
   geom_line() + geom_point()

If you don't have replicate data at each unique x-value you want to plot, you cannot legitimately plot error bars, confidence intervals or any other visual that describes (a summary of) a distribution. If the values of V1 are supposed to represent averages that come from another data set, then you should have a corresponding column of standard deviations/standard errors, and *then* you can plot error bars, CIs, etc. Without a legitimate measure of variation in your input data frame, I don't see how you can possibly generate a line graph with accompanying error bars/CIs.
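
To illustrate the point about replicates, here is a minimal sketch using simulated data. The data frame 'reps', its values and the summary 'summ' are hypothetical and are not taken from the posted benchmark; the sketch simply assumes five measurements at each x-value. With replicates, a mean and a standard error exist at each x-value, so error bars have something to describe.

```r
library(ggplot2)

set.seed(1)  # simulated data for illustration only, not the posted data
reps <- data.frame(
  IDX = rep(1:4, each = 5),               # 4 x-values, 5 replicates each
  V1  = rnorm(20, mean = 0.5, sd = 0.1)
)

# Mean and standard error of V1 at each IDX (base R)
summ <- data.frame(
  IDX  = sort(unique(reps$IDX)),
  mean = as.numeric(tapply(reps$V1, reps$IDX, mean)),
  se   = as.numeric(tapply(reps$V1, reps$IDX,
                           function(x) sd(x) / sqrt(length(x))))
)

# Line through the per-IDX means, with mean +/- SE error bars
p <- ggplot(summ, aes(x = IDX, y = mean)) +
  geom_line() + geom_point() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)
print(p)
```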

Dennis

On Wed, Jun 17, 2015 at 1:13 AM,  <Mohan.Radhakrishnan at cognizant.com> wrote:
> I think it could be something like this. But the mean is for the entire set. Not groups.
> I get a graph with this code but error bars are not there.
>
>
> p<-ggplot(jc,aes(IDX,V1,colour=V1))
> p <- p + stat_summary(fun.y=mean,geom="point")
> p <- p + stat_summary(fun.y=mean,geom="line")
> p <- p + stat_summary(fun.data=mean_cl_normal,conf.int = .99,
> geom="errorbar", width=0.2)
>
>
> Thanks,
> Mohan
>
> -----Original Message-----
> From: Radhakrishnan, Mohan (Cognizant)
> Sent: Wednesday, June 17, 2015 12:54 PM
> To: 'Dennis Murphy'
> Cc: r-help at r-project.org
> Subject: RE: [R] Error bars and CI
>
> Your sample code is working. But I am missing the logic when my dataset is involved.
>
> My full dataset is this. It is the V1 column I am interested in.  I am not 'grouping' here.
>
>       V1 IDX
> 1  0.796   1
> 2  0.542   2
> 3  0.510   3
> 4  0.617   4
> 5  0.482   5
> 6  0.387   6
> 7  0.272   7
> 8  0.536   8
> 9  0.498   9
> 10 0.402  10
> 11 0.328  11
> 12 0.542  12
> 13 0.299  13
> 14 0.647  14
> 15 0.291  15
> 16 0.815  16
> 17 0.680  17
> 18 0.363  18
> 19 0.560  19
> 20 0.334  20
>
> Thanks,
> Mohan
>
> -----Original Message-----
> From: Dennis Murphy [mailto:djmuser at gmail.com]
> Sent: Tuesday, June 16, 2015 1:18 AM
> To: Radhakrishnan, Mohan (Cognizant)
> Subject: Re: [R] Error bars and CI
>
> Hi:
>
> Firstly, your dplyr code to generate the summary data frame is unnecessary and distracting, particularly since you didn't provide the input data set; you are asked to provide a *minimal* reproducible example, which you could easily have done with a built-in data set.
> That said, to get what I perceive you want, I used the InsectSprays data from the autoloaded datasets package.
>
> # Function to compute standard error of a mean
> sem <- function(x) sqrt(var(x)/length(x))
>
> ## Use InsectSprays data for illustration
> ## Compute mean and SE of count for each level of spray
>
> library(dplyr)
> library(ggplot2)
>
> insectSumm <- InsectSprays %>%
>                   group_by(spray) %>%
>                   summarise(mean = mean(count), se = sem(count))
>
>
> # Since the x-variable is a factor, need to map group = 1 to
> # draw lines between factor levels. geom_pointrange() can be
> # used to produce the 99% CIs per factor level, geom_errorbar()
> # for the mean +/- SE. I ordered the geoms so that the errorbar
> # is last, but if you want it (mostly) overwritten, put the
> # geom_pointrange() call last.
>
> ggplot(insectSumm, aes(x = spray, y = mean)) +
>    theme_bw() +
>    geom_line(aes(group = 1), size = 1, color = "darkorange") +
>    geom_pointrange(aes(ymin = mean - qt(.995, 11) * se,
>                       ymax = mean + qt(.995, 11) * se),
>                    size = 1.5, color = "firebrick") +
>    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2,
>                    size = 1)
>
> Clearly, you can pipe all the way through the ggplot() call, but I wanted to check the contents of the summary data frame first.
>
> Dennis
>
> On Mon, Jun 15, 2015 at 3:51 AM,  <Mohan.Radhakrishnan at cognizant.com> wrote:
>> Hi,
>>
>> I want to plot a line graph using this data. IDX is x-axis and V1 is y-axis.  I also want standard error bars and 99% CI to be shown. My code is given below. The section that plots the graph is the problem.  I don't see all the points in the line graph with error bars. How can I also show the 99% CI in the graph ?
>>
>>       V1 IDX
>> 1  0.987  21
>> 2  0.585  22
>> 3  0.770  23
>> 4  0.711  24
>>
>> library(stringr)
>> library(dplyr)
>> library(ggplot2)
>>
>> data <- read.table("D:\\jmh\\jmh.txt",sep="\t")
>>
>> final <-data %>%
>>            select(V1) %>%
>>               filter(grepl("^Iteration", V1)) %>%
>>         mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))
>>
>> final <- mutate(final,IDX = 1:n())
>>
>> jc <- final %>%
>>               filter(IDX < 21)
>>
>>
>> #Convert to numeric
>> jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))
>>
>> print(jc)
>>
>> # The following section is the problem.
>>
>> sem <- function(x){
>>        sd(x)/sqrt(length(x))
>> }
>>
>> meanvalue <- apply(jc,2,mean)
>> semvalue <- apply(jc, 2, sem)
>>
>> mean_sem <- data.frame(mean= meanvalue, sem= semvalue,
>> group=names(jc))
>>
>> #larger font
>> theme_set(theme_gray(base_size = 20))
>>
>> #plot using ggplot
>> p <- ggplot(mean_sem, aes(x=group, y=mean)) +
>>               geom_line(stat='identity') +
>>               geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),
>>                            width=.2)
>> print(p)
>>
>> Thanks,
>> Mohan
>> This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

