[R] Problems with Boxplot

gug guygreen at netvigator.com
Thu Sep 3 13:41:11 CEST 2009


I'm posting answers to my own questions here - as far as I have answers - first so
that people don't spend time on them, and second in case the solutions are
helpful to anyone else in future.

1) My first question is: is there a simple way of getting both dates along
the x-axis and the "*100" calculation (or percentages)?
I still don't know how to change the format of the y-axis tick labels. I'd
be interested if anyone has a quick way to get percentages and, additionally,
to get numbers in the "0,000" format along the x- or y-axis. In the
meantime, I can live with this.
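A possible answer to the first part, sketched with simulated data in place of the posted CSV (the dates, column names and the pct_labels() helper are hypothetical). Arithmetic on a data frame keeps its column names, so new_data * 100 should keep the dates as box labels. If the headers are dates such as 01/07/2009, read.table(..., check.names = FALSE) would stop R renaming them to syntactically valid names like X01.07.2009. Alternatively, the values can stay as fractions and only the y-axis tick labels be rewritten with axis(); formatC(..., big.mark = ",") gives the "0,000" style.

```r
# Sketch only: simulated fractions (0.1 means 10%) stand in for the posted CSV.
set.seed(1)
dates <- c("01/07/2009", "01/08/2009")                      # hypothetical dates
new_data <- as.data.frame(matrix(rnorm(40, 0.02, 0.05), ncol = 4))
names(new_data) <- paste(rep(dates, each = 2), c("set 1", "set 2"))

# Option 1: scale the data; arithmetic on a data frame keeps the column names
pct_data <- new_data * 100

# Option 2: keep fractions and relabel the y-axis ticks (hypothetical helper)
pct_labels <- function(x) paste0(format(x * 100, trim = TRUE), "%")

pdf(NULL)   # null device for this sketch; use x11() to see the plots on screen
boxplot(pct_data, outline = FALSE, col = c("lightblue", "salmon"), las = 1)
boxplot(new_data, outline = FALSE, col = c("lightblue", "salmon"), yaxt = "n")
ticks <- axTicks(2)                          # the tick positions R chose
axis(2, at = ticks, labels = pct_labels(ticks), las = 1)
invisible(dev.off())

# "0,000" style: thousands separators for labels built the same way
big <- formatC(c(1500, 250000), format = "d", big.mark = ",")
print(big)
```

The same labels vector can be passed to axis(1, ...) for the x-axis.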

2) Next is how can I put a legend somewhere to show that red is "data set 1"
and blue is "data set 2".
I did this with the following code:
legend("top", c("Top","Bottom"), cex=1.5, lty=1:2, fill=c("lightblue",
"salmon"), bty="n")

3) Is it possible to get each date label to straddle the two boxes it
covers? As it is, one tick has the date and the other does not.
I didn't manage to do this, but with over 20 dates in the final data (i.e.
40 boxes) and a wider chart window, not every box was labelled anyway, and
the chart was clear enough.
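One way to centre a single date under each pair of boxes, sketched with made-up data: suppress the default x-axis with xaxt = "n", draw the boxes at chosen positions with boxplot()'s at argument, and then add one label per date with axis() at the midpoint of each pair. The dates, the data and the one-unit gap between pairs are hypothetical choices. Part of the original symptom may also be that axis() leaves out labels that would overlap, which would fit the observation that the window width changed which boxes were labelled.

```r
# Sketch: one date label centred under each pair of boxes (made-up data).
set.seed(2)
dates <- c("01/07/2009", "01/08/2009", "01/09/2009")   # hypothetical dates
n <- length(dates)
box_data <- as.data.frame(matrix(rnorm(20 * 2 * n, 0.02, 0.05), ncol = 2 * n))

# Boxes of a pair sit side by side, with a gap before the next date
pos <- as.vector(rbind(3 * seq_len(n) - 2, 3 * seq_len(n) - 1))   # 1,2, 4,5, 7,8
mid <- 3 * seq_len(n) - 1.5                                       # 1.5, 4.5, 7.5

pdf(NULL)   # null device for this sketch; use x11() to see the plot
boxplot(box_data, at = pos, xaxt = "n", outline = FALSE,
        col = c("lightblue", "salmon"))
axis(1, at = mid, labels = dates)   # one label per date, centred on its pair
invisible(dev.off())
```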

4) Is it possible to show both the median and the mean with boxplot?
I gave up on this, but I think the data looks OK in the end with just the
boxplot defaults.
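boxplot() itself marks only the median (the thick line in each box), but a mean can be overlaid afterwards with points(). A minimal sketch with made-up data; pch = 18 (a filled diamond) is just one choice of marker. By default the boxes are drawn at x positions 1, 2, 3, ..., so the means are plotted at seq_along(means).

```r
# Sketch: median from boxplot(), mean added with points() (made-up data).
set.seed(3)
box_data <- as.data.frame(matrix(rnorm(60, 0.02, 0.05), ncol = 3))
box_data[18:20, 3] <- NA              # unequal column lengths, padded with NA

means <- colMeans(box_data, na.rm = TRUE)   # one mean per column, ignoring NA

pdf(NULL)   # null device for this sketch; use x11() to see the plot
boxplot(box_data, outline = FALSE, col = c("lightblue", "salmon"))  # line = median
points(seq_along(means), means, pch = 18, cex = 1.5)                # diamond = mean
invisible(dev.off())
```

If the boxes were placed with a custom at vector, the same vector would be used as the x positions in points().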

5) Finally, the code works as described above (i.e. up to a point) with the
"Post trial data.csv" file I have posted.  However when I try with a larger
file ("Larger trial.csv", also posted), I get the message: "Error in
scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :  line
145 did not have 50 elements" when I get to the "data_headings" line.  I
have no idea why R is seeing a difference between these two files.
It turned out that even some small files produced this error, which stopped
the data being read at all and so was fatal to the code. I narrowed it down
to a small file and then looked at the CSV file in Notepad. The bottom of the
file (just two columns of data, of different lengths) looked like this:

-0.48013245,0.095652174
-0.039344262,-0.067142857
0.018022077,-0.079295154
-0.078534031,
0.010054845,
0.096153846,
0.177568018
0.013818182
0.002402883
 
It seemed that R could cope with empty cells as long as there was a "," to
mark where the column was, but it could NOT cope with a row where the column
was missing altogether (because there was no ","). By default read.table()
expects every line to have the same number of fields, which is what the "did
not have 50 elements" error is reporting. The problem was that Excel, which
was generating the CSV file, wasn't writing the "," for empty cells in
certain circumstances. The solution was to fill the empty cells in Excel with
"na" before saving as CSV. Excel then saves it correctly, and R reads those
cells as NA (via na.strings = "na").
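R's read.table() also has a fill argument that pads short rows with blank fields, and read.csv() uses fill = TRUE by default, so either might avoid editing the file in Excel. A minimal sketch, using the tail of the file shown above as inline text; the header line "A,B" is a hypothetical addition. One caveat from the read.table() documentation: the number of columns is worked out from the first five lines of input, so the header or early rows need to show the full width, as they do here.

```r
# Tail of the CSV shown above; the header "A,B" is a hypothetical addition.
csv_text <- "A,B
-0.48013245,0.095652174
-0.039344262,-0.067142857
0.018022077,-0.079295154
-0.078534031,
0.010054845,
0.096153846,
0.177568018
0.013818182
0.002402883"

# Default read.table() (fill = FALSE) fails on the rows with no trailing ","
res <- tryCatch(read.table(text = csv_text, sep = ",", header = TRUE),
                error = function(e) conditionMessage(e))
print(res)   # a "line ... did not have 2 elements" message

# fill = TRUE pads short rows; blank fields in numeric columns become NA
filled <- read.table(text = csv_text, sep = ",", header = TRUE, fill = TRUE)

# read.csv() already uses fill = TRUE (as well as sep = "," and header = TRUE)
filled_csv <- read.csv(text = csv_text)
print(filled)
```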

The final code (still without the y-axis formatting fixed) is:

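# Path to the CSV file (Windows path, backslashes escaped)
testdata <- "C:\\Files\\R\\Sample R code\\Post trial data.csv"
# header = TRUE takes the dates in the first row as column (box) names;
# cells filled with "na" in Excel are read as NA
new_data <- read.table(testdata, sep = ",", na.strings = "na", header = TRUE)
# Open a wide plotting window so the date labels have room
x11(width = 16, height = 7, pointsize = 14)
# Alternate box colours for the two data sets within each date
boxplot(new_data, outline = FALSE, col = c("lightblue", "salmon"), las = 1,
        boxwex = 0.5)
legend("top", c("Label for blue boxes", "Label for red boxes"), cex = 1.5,
       lty = 1:2, fill = c("lightblue", "salmon"), bty = "n")
title(main = "Chart title text", cex.main = 1.8)
grid()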

Guy


gug wrote:
> 
> Hello,
> 
> I have been having difficulty getting boxplot to give the output I want -
> probably a result of the way I have been handling the data.
> 
> The data is arranged in columns: each date has two sets of data.  The
> number of data points varies with the date, so each column is of different
> length.  I want to get a series of boxplots with the date along the
> x-axis, with alternating colors, so that it is easy to see the difference
> between the results within each date, as well as across dates.
> 
> testdata<- c("C:\\Files\\R\\Sample R code\\Post trial data.csv")
> data_headings <- read.table(testdata, skip = 0, sep = ",", header =
> FALSE)[1,]
> my_data <- read.table(testdata, skip = 1, sep = ",", na.strings =
> "na",header = FALSE)
> boxplot(my_data*100, names = data_headings, outline = FALSE, range = 0.3,
> border = c(2,4))
> 
> The result is a boxplot, but it does not show the date along the bottom
> (the "names = data_headings" bit achieves nothing).  I can alternatively
> try this:
> 
> new_data<- read.table(testdata, skip = 0, sep = ",", na.strings =
> "na",header = TRUE)
> boxplot(new_data,outline = FALSE, range = 0.3,border = c(2,4))
> 
> This takes all the data and plots it, but I then lose the ability to
> multiply by 100 (I'm trying to show percentages: e.g. 10% as "10", rather
> than as "0.1").
> 
> 1) My first question is: is there a simple way of getting both dates along
> the x-axis and the "*100" calculation (or percentages)?
> 
> 2) Next is how can I put a legend somewhere to show that red is "data set
> 1" and blue is "data set 2".
> 
> 3) Is it possible to get the date to straddle across each of the two dates
> it covers: as it is, one tick has the date, the other does not.
> 
> 4) Is it possible to show both the median and the mean with boxplot?
> 
> 5) Finally, the code works as described above (i.e. up to a point) with
> the "Post trial data.csv" file I have posted.  However when I try with a
> larger file ("Larger trial.csv", also posted), I get the message: "Error
> in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
> line 145 did not have 50 elements" when I get to the "data_headings" line. 
> I have no idea why R is seeing a difference between these two files.
> Post trial data.csv: http://www.nabble.com/file/p25256461/Post%2Btrial%2Bdata.csv
> Larger trial.csv: http://www.nabble.com/file/p25256461/Larger%2Btrial.csv
> Thanks for any suggestions,
> 
> Guy Green
>  
> 
> 

-- 
View this message in context: http://www.nabble.com/Problems-with-Boxplot-tp25256461p25274286.html
Sent from the R help mailing list archive at Nabble.com.



