[R] Question regarding sqldf

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 31 19:05:40 CEST 2009


Your sql statement is using a variable in the having clause
that is not in the group by clause.  Given that group by is supposed
to produce a single output per group we see that this construct is
problematic since what should happen in the case that there
are two equal maximum values in the same group?  Its possible
that some databases can support that anyways but I don't think
sqlite does.  Write your query like this:

>  sqldf("select * from testframe a where a.bext =
+   (select max(bext) from testframe where a.sdate = sdate)")
  sdate     bext otherstuff     stuff
1  day1 94.10626   47.20670 15.340856
2  day2 89.34949   55.36058 19.572773
3  day3 95.72650   52.16694 17.632534
4  day4 95.49586   47.28667 12.676297
5  day5 99.43271   48.98959  3.013745


On Thu, Jul 30, 2009 at 5:30 PM, <Rizzo.Michael at epamail.epa.gov> wrote:
> Here is a test data frame:
>
>> testframe<-data.frame(sdate=rep(paste
> ("day",1:5,sep=""),each=5),bext=runif(25,1,100),otherstuff=runif
> (25,45,60),stuff=runif(25,3,25))
>> testframe
>   sdate      bext             otherstuff     stuff
> 1   day1 37.863859   49.19816 10.036211
> 2   day1 58.557049   59.23145 21.793954
> 3   day1 70.345462   48.82313 24.630742
> 4   day1  1.245913   57.70718 14.131270
> 5   day1 40.170634   50.38963 21.420328
> 6   day2 36.126846   51.97043 22.815169
> 7   day2 95.801896   53.30061 13.540535
> 8   day2 19.959486   50.46775  9.403084
> 9   day2  5.772996   55.14082 17.137113
> 10  day2 77.956966   51.44969 13.448527
> 11  day3  6.743810   54.49836 17.650795
> 12  day3 60.758896   55.35407  5.005359
> 13  day3 86.924873   57.74122  8.691572
> 14  day3 91.933544   48.17538  8.559737
> 15  day3 23.627126   59.28906  4.766172
> 16  day4 76.200158   55.78072 19.493428
> 17  day4 67.809049   51.06784  7.116428
> 18  day4 70.309643   54.38067 10.736461
> 19  day4 91.152166   58.39768 23.199943
> 20  day4 30.358417   54.16645 24.153823
> 21  day5 64.064953   58.31813 19.856882
> 22  day5 49.220106   48.49721 16.615940
> 23  day5 21.554696   47.03421  3.585458
> 24  day5 30.227874   55.51724 14.435731
> 25  day5 51.518716   54.31445 20.837326
>
> If I run the following sqldf statement:
>
>  maxdays<-sqldf("select distinct * from testframe group by sdate having
> bext=max(bext)")
>
> I get:
>
>> maxdays
> data frame with 0 columns and 0 rows
>
> I want to get the entire observation for each day where "bext" is the
> maximum for the day.
>
>
>
> |------------>
> | From:      |
> |------------>
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
>  |milton ruser <milton.ruser at gmail.com>                                                                                                       |
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | To:        |
> |------------>
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
>  |Michael Rizzo/RTP/USEPA/US at EPA                                                                                                              |
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | Cc:        |
> |------------>
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
>  |r-help at r-project.org                                                                                                                        |
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | Date:      |
> |------------>
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
>  |07/30/2009 03:03 PM                                                                                                                         |
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
> |------------>
> | Subject:   |
> |------------>
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
>  |Re: [R] Question regarding sqldf                                                                                                            |
>  >--------------------------------------------------------------------------------------------------------------------------------------------|
>
>
>
>
>
> Hi Rizzi,
>
> how about a reproducible example/data.frame?
>
> :-)
> milton
>
> On Thu, Jul 30, 2009 at 1:46 PM, <Rizzo.Michael at epamail.epa.gov> wrote:
>
>  Hello,
>
>  I am having a problem using sqldf.  I'm trying to choose a subset of
>  observations from a data set based on the date and maximum value of a
>  variable by date.
>
>  Here is the code I am using:
>
>  test<-sqldf("select distinct * from bextuse group by sdate having
>  bext=max(bext)",method="raw");
>
>  The result I get back is a data frame with 0 rows and 0 columns.  I
>  have
>  tried the code in another program that utilizes SQL, and I retrieve
>  the
>  14 rows I was expecting to get.
>  I looked at the SQLite information on the web, and it mentions that
>  the
>  "having" clause can be used to select observations from grouped data
>  using aggregating functions.
>
>  Any help or advice is greatly appreciated.
>
>  ______________________________________________
>  R-help at r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide
>  http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list