[R] Combining dataframes with different row numbers and plotting with ggplot2

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Jan 9 20:05:49 CET 2016


Please study each line of code, and use the str function to inspect the 
intermediate data objects... the examples on this list are almost never 
plug-and-play for your real work. Note that while you provided some of the 
code necessary to make your example reproducible, I had to fill in the 
blanks with additional code... the Posting Guide asks you to make your 
example run as-is up to the point where you are having problems. The code 
below is a model for posing your future questions as well as an answer to 
this one.

library(ggplot2)

DF1 <- read.table( text =
"case size
case1 120
case2 120
case3 121
case4 121
case5 121
case6 122
case7 122
case8 123
", header=TRUE, as.is=TRUE )

# note the fewer records below
DF2 <- read.table( text =
"case size
case1 120
case2 120
case3 121
case4 121
case5 121
case6 122
case7 122
", header=TRUE, as.is=TRUE )

# While you CAN use reshape to make long data out of wide data, that 
# method for making long data will always presume you have the same number 
# of records for each case. Combine your data directly into long form if 
# that is how it is best represented.
# Below note the use of labels such as "Source" to organize the data
# Also note the use of "stringsAsFactors = FALSE" because concatenating
# factors is almost never a good idea... go read (again?) about what
# factors are if you don't understand why concatenating factors doesn't
# work well
DFL <- rbind( data.frame( Source = "DF1"
                         , size = DF1$size
                         , stringsAsFactors = FALSE
                         )
             , data.frame( Source = "DF2"
                         , size = DF2$size
                         , stringsAsFactors = FALSE
                         )
             )

# Your intent in making this graph is still a little opaque to me... the 
# unevenly spaced breaks make the axis labels look logarithmic, and not 
# all of the breaks show up
ggplot( data = DFL
       , aes( x=size, fill=Source ) ) +
     geom_histogram( binwidth = 500 ) + # might want position = "dodge"?
     scale_x_continuous( breaks = c( seq( 300, 800, by = 200 )
                                   , seq( 1000, 15000, by = 1000 )
                                   )
                       )
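
If the goal is to compare the two size distributions side by side, something 
like the following might be closer to what is wanted. This is only a sketch 
under assumptions: the binwidth of 1 and the one-unit breaks are my guesses 
chosen to fit the sample values (120 to 123), not settings from the original 
question, and the data frames are rebuilt inline so the block runs on its own. 
The str and table calls are there to check the long data before plotting.

```r
# Sketch only: rebuild the sample data so this block is self-contained.
DF1 <- data.frame( case = paste0( "case", 1:8 )
                 , size = c( 120, 120, 121, 121, 121, 122, 122, 123 )
                 , stringsAsFactors = FALSE
                 )
DF2 <- DF1[ 1:7, ]  # one record fewer, matching the example data

# Stack the two sources into long form, labelling each row by its source.
DFL <- rbind( data.frame( Source = "DF1"
                        , size = DF1$size
                        , stringsAsFactors = FALSE
                        )
            , data.frame( Source = "DF2"
                        , size = DF2$size
                        , stringsAsFactors = FALSE
                        )
            )

# Inspect the intermediate object before plotting.
str( DFL )
counts <- table( DFL$Source, DFL$size )
print( counts )

# Assumed plot settings: binwidth and breaks sized to the sample data.
if ( requireNamespace( "ggplot2", quietly = TRUE ) ) {
  library( ggplot2 )
  p <- ggplot( data = DFL
             , aes( x = size, fill = Source ) ) +
    geom_histogram( binwidth = 1, position = "dodge" ) +
    scale_x_continuous( breaks = seq( min( DFL$size )
                                    , max( DFL$size )
                                    , by = 1
                                    )
                      )
  print( p )
}
```

With real data in the hundreds or thousands, the binwidth and breaks would 
need to be rescaled to the actual range of size.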

On Sat, 9 Jan 2016, maryam moazam wrote:

> Dear Michael,
>
> Thanks for your feedback. Actually, I would like to show (and compare) the
> size distributions of df1 and df2 in a single plot using ggplot2, something
> like the attached picture. The command doesn't achieve this.
> However, I'm really new here, could you please help me more on this?
>
>
> Thanks in advance,
> Maryam
>
>
>
>
>
> On Sat, Jan 9, 2016 at 5:38 PM, Michael Dewey <lists at dewey.myzen.co.uk>
> wrote:
>
>> Dear Maryam
>>
>> If you just need all the values of size would
>> c(df1$size, df2$size)
>> work?
>>
>> On 08/01/2016 21:44, maryam moazam wrote:
>>
>>> Dear Sir / Madam,
>>>
>>> I have just come to the amazing R software, so please be patient if my
>>> question is basic for you. I have 2 text files (say 1.txt and 2.txt), each
>>> file containing 2 columns and a different number of rows, like below
>>>
>>> case size
>>> case1 120
>>> case2 120
>>> case3 121
>>> case4 121
>>> case5 121
>>> case6 122
>>> case7 122
>>> case8 123
>>>
>>> I would like to have one plot for all text files, where the x-axis shows
>>> the sizes between 300-1200 at intervals of 200 (300,500,700,900,1200) and
>>> the sizes between 1201-1500 at intervals of 1000. For dataframes with
>>> equal row numbers, the following code worked well,
>>>
>>> df1 = read.table("1.txt", header=TRUE)
>>> df2 = read.table("2.txt", header=TRUE)
>>> *combining two dataframes with equal row number*
>>>
>>> df = data.frame(df1$size,df2$size)
>>> library(reshape)
>>> melted <- melt(df)
>>>
>>> ggplot(data=melted, aes(value)) + aes(fill=variable) +
>>>   geom_histogram(binwidth=500) +
>>>   scale_x_continuous(breaks=c(seq(300,1000,by=200),seq(1001,15000,by=1000)))
>>>
>>>
>>> but I couldn't reproduce the plot with this code for dataframes with
>>> different row numbers. I think the problem is *how to combine dataframes
>>> with different row numbers*, could you please help me out on this
>>> issue?
>>>
>>> Thank you in advance
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> --
>> Michael
>> http://www.dewey.myzen.co.uk/home.html
>>
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


