[R] ggplot2 / reshape / Question on manipulating data
pete-expires-20070910 at kazmier.com
Thu Jul 12 16:41:13 CEST 2007
"hadley wickham" <h.wickham at gmail.com> writes:
> On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
>> I'm an R newbie but recently discovered the ggplot2 and reshape
>> packages which seem incredibly useful and much easier to use for a
>> beginner. Using the data from the IMDB, I'm trying to see how the
>> average movie rating varies by year. Here is what my data looks like:
>> > ratings <- read.delim("groomed.list", header = TRUE, sep = "|", comment.char = "")
>> > ratings <- subset(ratings, VoteCount > 100)
>> > head(ratings)
>>                              Title  Histogram VoteCount VoteMean Year
>> 1                !Huff (2004) (TV) 0000000016       299      8.4 2004
>> 8              'Allo 'Allo! (1982) 0000000125       829      8.6 1982
>> 50              .hack//SIGN (2002) 0000001113       150      7.0 2002
>> 56            1-800-Missing (2003) 0000000103       118      5.4 2003
>> 66  Greatest Artists (2000) (mini) 00..000016       110      7.8 2000
>> 77 00 Scariest Movie (2004) (mini) 00..000115       256      8.6 2004
> Have you tried using the movies dataset included in ggplot? Or is
> there some data that you want that is not in that dataset?
It's funny that you mention this because I had intended to write this
email about a month ago but was delayed due to other reasons. In any
case, when I was typing this up last night, I wanted to recreate my
steps but I could not find the IMDB movie data I had used originally.
I searched everywhere to no avail so I downloaded the data myself and
groomed it. Only now do I remember that I had used the movies dataset
included in ggplot.
>> How do 'byYear' and 'byYear2' differ? I am trying to use 'typeof' but
>> both seem to be lists. However, they are clearly different in some
>> way because 'qplot' graphs them differently.
> Try using str - it's much more helpful, and you should see the
> difference quickly.
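To illustrate the point about typeof versus str, here is a minimal sketch. The two objects are toy stand-ins invented for this example (they are not the byYear and byYear2 objects from the original message), and the column names are assumptions.

```r
# Toy objects for illustration only; names and values are made up.
df  <- data.frame(year = c(2000, 2001), rating = c(6.5, 7.1))
lst <- list(year = c(2000, 2001), rating = c(6.5, 7.1))

# typeof reports the underlying storage type, which is "list" for both.
typeof(df)
typeof(lst)

# class tells them apart: "data.frame" versus "list".
class(df)
class(lst)

# str prints the structure, so the difference is visible at a glance:
# the first is reported as a data.frame, the second as a plain list.
str(df)
str(lst)
```

Since a data frame is stored as a list of columns, typeof cannot tell the two apart, whereas str and class can.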
Thanks! This is the function I've been looking for in my quest to
learn about internal data types of R. Too bad it has such a terrible
> Using the built in movies data:
> mm <- melt(movies, id=1:2, m=c("rating", "votes"))
> msum <- cast(mm, year ~ variable, c(mean, sum))
> qplot(year, rating_mean, data=msum, colour=votes_sum)
> qplot(year, rating_mean, data=msum, colour=votes_sum, geom="line")
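For readers without the reshape package at hand, the per-year summary above can be approximated in base R. This is only a sketch on a made-up data frame (toy, with assumed columns year, rating and votes), and it computes just the two columns used in the plots rather than every mean/sum combination that cast would return.

```r
# Made-up data; the real movies dataset has many more rows and columns.
toy <- data.frame(
  year   = c(2000, 2000, 2001, 2001, 2001),
  rating = c(6.0, 8.0, 4.0, 6.0, 8.0),
  votes  = c(100, 300, 50, 150, 250)
)

# Mean rating per year and total votes per year, computed separately.
rating_mean <- aggregate(rating ~ year, data = toy, FUN = mean)
votes_sum   <- aggregate(votes ~ year, data = toy, FUN = sum)

# Join the two summaries on year and name the columns like the cast output.
msum_toy <- merge(rating_mean, votes_sum, by = "year")
names(msum_toy) <- c("year", "rating_mean", "votes_sum")

print(msum_toy)
```

The resulting msum_toy has one row per year, which is the shape the qplot calls above expect.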
Great! This is exactly what I was looking to do. By the way, does
any of your documentation use the movie dataset as an example? I'm
curious what else I can do with the dataset. For example, how can I
use ggplot's facets to see the same information by type of movie? I'm
unsure of how to manipulate the binary variables into a single
variable so that it can be treated as levels.
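One possible way to collapse the binary columns into a single factor, sketched in base R on a made-up data frame. The column names (Action, Comedy, Drama) and the values are assumptions for illustration. A movie flagged under two types appears once per type, and a movie with no flag set is dropped.

```r
# Made-up data with one 0/1 indicator column per movie type.
toy <- data.frame(
  title  = c("A", "B", "C"),
  year   = c(2000, 2001, 2001),
  rating = c(6.0, 7.5, 8.0),
  Action = c(1, 0, 1),
  Comedy = c(0, 1, 1),
  Drama  = c(0, 0, 0),
  stringsAsFactors = FALSE
)
genres <- c("Action", "Comedy", "Drama")

# For each indicator column, keep the rows where it is 1 and label them.
pieces <- lapply(genres, function(g) {
  keep <- toy[[g]] == 1
  data.frame(
    title  = toy$title[keep],
    year   = toy$year[keep],
    rating = toy$rating[keep],
    type   = rep(g, sum(keep)),
    stringsAsFactors = FALSE
  )
})

# Stack the pieces and turn the label into a factor with fixed levels.
long <- do.call(rbind, pieces)
long$type <- factor(long$type, levels = genres)

print(long)
```

The type column can then be treated as levels, for example as the faceting variable in a plot.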