[R] ggplot2 / reshape / Question on manipulating data

Pete Kazmier pete-expires-20070910 at kazmier.com
Thu Jul 12 16:41:13 CEST 2007

"hadley wickham" <h.wickham at gmail.com> writes:

> On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
>> I'm an R newbie but recently discovered the ggplot2 and reshape
>> packages which seem incredibly useful and much easier to use for a
>> beginner.  Using the data from the IMDB, I'm trying to see how the
>> average movie rating varies by year.  Here is what my data looks like:
>> > ratings <- read.delim("groomed.list", header = TRUE, sep = "|", comment.char = "")
>> > ratings <- subset(ratings, VoteCount > 100)
>> > head(ratings)
>>                              Title  Histogram VoteCount VoteMean Year
>> 1                !Huff (2004) (TV) 0000000016       299      8.4 2004
>> 8              'Allo 'Allo! (1982) 0000000125       829      8.6 1982
>> 50              .hack//SIGN (2002) 0000001113       150      7.0 2002
>> 56            1-800-Missing (2003) 0000000103       118      5.4 2003
>> 66  Greatest Artists (2000) (mini) 00..000016       110      7.8 2000
>> 77 00 Scariest Movie (2004) (mini) 00..000115       256      8.6 2004
> Have you tried using the movies dataset included in ggplot?  Or is
> there some data that you want that is not in that dataset.

It's funny that you mention this, because I had intended to write this
email about a month ago but was delayed by other things.  In any
case, when I was typing this up last night, I wanted to recreate my
steps but I could not find the IMDB movie data I had used originally.
I searched everywhere to no avail, so I downloaded the data myself and
groomed it.  Only now do I remember that I had used the movies dataset
included in ggplot.
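
Without the original groomed.list file, the loading step quoted above can still be tried. This is a minimal sketch that runs the same read.delim() and subset() calls on three made-up rows, passed in through read.table's text argument. The colClasses vector is an added assumption, not part of the original call. For all-digit rows like these, read.delim would otherwise read Histogram as a number and drop its leading zeros.

```r
# Three made-up rows in the same pipe-delimited layout (not real IMDB data)
txt <- "Title|Histogram|VoteCount|VoteMean|Year
Movie A (2004)|0000000016|299|8.4|2004
Movie B (2003)|0000000103|118|5.4|2003
Movie C (2001)|0000000011|42|6.0|2001"

# Same call as in the message, but reading from a string instead of a file;
# colClasses (an addition) keeps Histogram as text so leading zeros survive
ratings <- read.delim(text = txt, header = TRUE, sep = "|", comment.char = "",
                      colClasses = c("character", "character",
                                     "integer", "numeric", "integer"))

# Keep only titles with more than 100 votes (this drops Movie C)
ratings <- subset(ratings, VoteCount > 100)
str(ratings)
```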

>> How do 'byYear' and 'byYear2' differ?  I am trying to use 'typeof' but
>> both seem to be lists.  However, they are clearly different in some
>> way because 'qplot' graphs them differently.
> Try using str - it's much more helpful, and you should see the
> difference quickly.

Thanks!  This is the function I've been looking for in my quest to
learn about internal data types of R.  Too bad it has such a terrible
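
This excerpt does not show byYear and byYear2, so the two objects below are hypothetical stand-ins. They illustrate why typeof() reports "list" for both kinds of object while str() (and class()) tells them apart. A data.frame is stored as a list internally, so typeof() alone cannot distinguish it from a plain list.

```r
# Two hypothetical objects holding the same numbers
as_list  <- list(year = c(2000, 2001), rating = c(6.1, 5.8))
as_frame <- data.frame(year = c(2000, 2001), rating = c(6.1, 5.8))

# typeof() reports the underlying storage type, which is the same for both
typeof(as_list)   # "list"
typeof(as_frame)  # "list"

# class() and str() show what typeof() hides: the second is a data.frame
class(as_list)    # "list"
class(as_frame)   # "data.frame"
str(as_list)
str(as_frame)
```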

> Using the built in movies data:
> mm <- melt(movies, id=1:2, m=c("rating", "votes"))
> msum <- cast(mm, year ~ variable, c(mean, sum))
> qplot(year, rating_mean, data=msum, colour=votes_sum)
> qplot(year, rating_mean, data=msum, colour=votes_sum, geom="line")
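
Some readers will not have the reshape package installed. For them, the summary that the cast() call above produces (per-year mean rating and total votes) can be approximated in base R with aggregate() and merge(). This sketch uses a tiny made-up data frame, movies_toy, in place of the real movies data. The column names rating_mean and votes_sum are chosen to match the ones used in the qplot() calls.

```r
# Made-up stand-in for the movies data: one row per film
movies_toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  year   = c(2000, 2000, 2001, 2001),
  rating = c(6, 8, 5, 7),
  votes  = c(100, 300, 50, 150)
)

# One summary row per year for each measure
rating_mean <- aggregate(rating ~ year, data = movies_toy, FUN = mean)
votes_sum   <- aggregate(votes ~ year, data = movies_toy, FUN = sum)

# Join the two summaries and name the columns as in the qplot() calls
msum <- merge(rating_mean, votes_sum, by = "year")
names(msum) <- c("year", "rating_mean", "votes_sum")
msum
```

The melt/cast route generalises more easily to many measures and functions at once. The base-R version simply needs one aggregate() call per measure.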

Great!  This is exactly what I was looking to do.  By the way, does
any of your documentation use the movies dataset as an example?  I'm
curious what else I can do with the dataset.  For example, how can I
use ggplot's facets to see the same information by type of movie?  I'm
unsure how to combine the binary variables into a single
variable so that its values can be treated as factor levels.
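
One way to approach the last question is to stack the 0/1 genre indicator columns (in the ggplot2 movies data these are columns such as Action, Comedy and Drama) into a single long genre column, then keep only the rows where the flag is 1. This is a sketch under assumptions. It uses base R's stats::reshape() on a small made-up data frame with just two genre columns. melt() from reshape, with the genre columns as measure variables, should be able to do the same stacking. A film flagged with several genres appears once per genre, and a film with no flag set drops out.

```r
# Made-up data with two 0/1 genre indicator columns
movies_toy <- data.frame(
  title  = c("A", "B", "C"),
  year   = c(2000, 2001, 2001),
  rating = c(6, 8, 5),
  Action = c(1, 0, 1),
  Comedy = c(0, 1, 1)
)
genres <- c("Action", "Comedy")

# Stack the indicator columns: one row per (title, genre) pair,
# with the genre name in 'genre' and the 0/1 value in 'flag'
long <- reshape(movies_toy, direction = "long",
                varying = genres, v.names = "flag",
                timevar = "genre", times = genres,
                idvar = "title")

# Keep only the pairs whose flag is set, then make genre a factor
long <- long[long$flag == 1, c("title", "year", "rating", "genre")]
long$genre <- factor(long$genre, levels = genres)
long
```

With ggplot2 loaded, a call along the lines of qplot(year, rating, data = long, facets = genre ~ .) should then draw one panel per genre.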

