[R] Subseting a data.frame

Bert Gunter gunter.berton at gene.com
Thu Oct 17 23:19:57 CEST 2013


Thanks, Bill.

But ?ave specifically says:

ave(x, ..., FUN = mean)

Arguments:
x

A numeric.

So it should not be expected to work properly if the argument is
not (coercible to) numeric. Nevertheless, defensive programming is
always wise.
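
As a minimal sketch of that defensive approach (the helper name count_by_group is hypothetical, not part of base R), one can always hand ave() a numeric placeholder as x, so the result type never depends on the type of the grouping variable:

```r
# Hypothetical helper (an assumption for illustration, not a base R function):
# returns, for each element of `group`, the size of the group it belongs to.
count_by_group <- function(group) {
  # x is a numeric placeholder of the right length, so ave() never has to
  # store the integer result of length() into a character or factor vector.
  ave(numeric(length(group)), group, FUN = length)
}

num  <- c(2, 3, 2, 2)
char <- c("Two", "Three", "Two", "Two")
fac  <- factor(char, levels = c("One", "Two", "Three"))

count_by_group(num)   # numeric grouping variable
count_by_group(char)  # character grouping variable
count_by_group(fac)   # factor with an unused level
```

All three calls should return the same numeric vector of group sizes, whatever the type of the grouping variable.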

Cheers,
Bert


On Thu, Oct 17, 2013 at 1:34 PM, William Dunlap <wdunlap at tibco.com> wrote:
>   May I ask why:
>     count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
> basel_asset_class, FUN=length))
>
>   should not be more simply done as:
>     count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class,
> FUN=length))
>
> The way I did it would work if basel_asset_class were non-numeric.
>
> In ave(x, group, FUN=FUN), FUN's return value should be the same type as x
> (or you can get some odd type conversions).  E.g.,
>
>    > num <- c(2,3,2,2) ;  char <- c("Two","Three","Two","Two")
>    > ave(num, num, FUN=length) # good
>    [1] 3 1 3 3
>    > ave(char, char, FUN=length) # bad
>    [1] "3" "1" "3" "3"
>    > fac <- factor(char, levels=c("One","Two","Three"))
>    > ave(fac, fac, FUN=length)
>    [1] <NA> <NA> <NA> <NA>
>    Levels: One Two Three
>    Warning messages:
>    1: In `[<-.factor`(`*tmp*`, i, value = 0L) :
>      invalid factor level, NA generated
>    2: In `[<-.factor`(`*tmp*`, i, value = 3L) :
>      invalid factor level, NA generated
>    3: In `[<-.factor`(`*tmp*`, i, value = 1L) :
>      invalid factor level, NA generated
>
> but x=integer(length(group)) works in all cases:
>
>    > ave(integer(length(fac)), fac, FUN=length)
>    [1] 3 1 3 3
>    > ave(integer(length(char)), char, FUN=length)
>    [1] 3 1 3 3
>
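
For comparison, here is a sketch of one alternative that avoids ave() altogether: tabulate the group sizes once with table() and look each element's count up by name. This is only an illustration of the same idea, not something proposed in the thread, and it assumes the grouping variable has no NA values (table() drops them by default).

```r
char <- c("Two", "Three", "Two", "Two")
fac  <- factor(char, levels = c("One", "Two", "Three"))

# Tabulate group sizes once, then index the table by group label.
counts <- table(char)
n_char <- as.vector(counts[char])

# For a factor, index by the character labels rather than the integer codes,
# otherwise the codes would be used as positions in the table.
counts_fac <- table(fac)
n_fac <- as.vector(counts_fac[as.character(fac)])

n_char
n_fac
```

Both results are plain integer vectors, so no type conversion of the grouping variable is involved.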
>
>
> Bill Dunlap
>
> Spotfire, TIBCO Software
>
> wdunlap tibco.com
>
>
>
> From: Bert Gunter [mailto:gunter.berton at gene.com]
> Sent: Thursday, October 17, 2013 1:06 PM
> To: William Dunlap
> Cc: Katherine Gobin; r-help at r-project.org
> Subject: Re: [R] Subseting a data.frame
>
>
>
> May I ask why:
>
> count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
> basel_asset_class, FUN=length))
>
> should not be more simply done as:
>
> count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class,
> FUN=length))
>
> ?
>
> -- Bert
>
>
>
> On Thu, Oct 17, 2013 at 12:36 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
>> What I need is to select only those records for which there are more than
>> two default
>> frequencies (defa_frequency),
>
> Here is one way.  There are many others:
>    > dat <- data.frame( # slightly less trivial example
>         basel_asset_class=c(4,8,8,8,74,3,74),
>         defa_frequency=(1:7)/8)
>    > count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
> basel_asset_class, FUN=length))
>    > cbind(dat, count_by_class) # see what we just computed
>      basel_asset_class defa_frequency count_by_class
>    1                 4          0.125              1
>    2                 8          0.250              3
>    3                 8          0.375              3
>    4                 8          0.500              3
>    5                74          0.625              2
>    6                 3          0.750              1
>    7                74          0.875              2
>    > dat[count_by_class>1, ] # I think this is what you are asking for
>      basel_asset_class defa_frequency
>    2                 8          0.250
>    3                 8          0.375
>    4                 8          0.500
>    5                74          0.625
>    7                74          0.875
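
The steps above can be wrapped into a small function, sketched here under the assumption that the cutoff should be adjustable. The names keep_groups, group_col and min_n are hypothetical and do not appear in the thread; min_n = 2 reproduces the count_by_class > 1 filter.

```r
dat <- data.frame(
  basel_asset_class = c(4, 8, 8, 8, 74, 3, 74),
  defa_frequency    = (1:7) / 8
)

# Hypothetical helper: keep only the rows whose group has at least min_n rows.
keep_groups <- function(df, group_col, min_n = 2) {
  g <- df[[group_col]]
  # Group size for every row, computed on a numeric placeholder so the
  # result is numeric regardless of the type of the grouping column.
  n <- ave(numeric(length(g)), g, FUN = length)
  df[n >= min_n, , drop = FALSE]
}

res2 <- keep_groups(dat, "basel_asset_class", min_n = 2)  # classes 8 and 74
res3 <- keep_groups(dat, "basel_asset_class", min_n = 3)  # class 8 only
res2
res3
```

Making the threshold a parameter is a design choice that lets the same code cover both readings of the request ("at least two" and "more than two" records per class).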
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf
>> Of Katherine Gobin
>> Sent: Thursday, October 17, 2013 11:05 AM
>> To: Bert Gunter
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Subseting a data.frame
>>
>> Correction. (2nd para first three lines)
>>
>> Please read the following line
>>
>> What I need is to select only those records for which there are more than
>> two default
>> frequencies (defa_frequency), Thus, there is only one default frequency =
>> 0.150 w.r.t
>> basel_asset_class = 4 whereas there are default frequencies w.r.t. basel
>> asset class 4,
>>
>>
>> as
>>
>> What I need is to select only those records for which there are more than
>> two default
>> frequencies (defa_frequency), Thus, there is only one default frequency =
>> 0.150 w.r.t
>> basel_asset_class = 4 whereas there are THREE default frequencies w.r.t.
>> basel asset
>> class 8,
>>
>>
>>
>> I apologize for the inconvenience.
>>
>> Regards
>>
>> Katherine
>>
>>
>>
>>
>>
>>
>>
>>
>> On , Katherine Gobin <katherine_gobin at yahoo.com> wrote:
>>
>>  I am sorry, perhaps I was not able to put the question properly. I am not
>> looking for the
>> subset of the data.frame where the basel_asset_class is > 2. I do agree
>> that would have
>> been a basic requirement. Let me try to put the question again.
>>
>> I have a data frame as
>>
>> mydat = data.frame(basel_asset_class = c(4, 8, 8 ,8), defa_frequency =
>> c(0.15, 0.07, 0.03,
>> 0.001))
>>
>> # Please note I have changed the basel_asset_class to 4 from 2, to avoid
>> confusion.
>>
>> > mydat
>>   basel_asset_class defa_frequency
>> 1                 4          0.150
>> 2                 8          0.070
>> 3                 8          0.030
>> 4                 8          0.001
>>
>>
>>
>> This is just a representative example. In reality, I may have a number of
>> basel asset classes. 4, 8
>> etc. are IDs and can be anything, thus I can't hard-code it as subset(mydat,
>> mydat$basel_asset_class > 2).
>>
>>
>> What I need is to select only those records for which there are more than
>> two default
>> frequencies (defa_frequency), Thus, there is only one default frequency =
>> 0.150 w.r.t
>> basel_asset_class = 4 whereas there are default frequencies w.r.t. basel
>> asset class 4,
>> similarly there could be another basel asset class having say 5 default
>> frequencies. Thus, I
>> need to take subset of the data.frame s.t. the number of corresponding
>> defa_frequencies is
>> greater than 2.
>>
>> The idea is we try to fit exponential curve Y = A exp( BX ) for each of
>> the basel asset
>> classes and to estimate values of A and B, mathematically one needs to
>> have at least two
>> values of X.
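
To illustrate why at least two points are needed: taking logs turns Y = A exp(BX) into log(Y) = log(A) + B X, a straight line with two unknowns. The sketch below fits that line with lm() for a single class. The data are made up for illustration; in particular the predictor column x is an assumption, since the post does not say what X is.

```r
# Hypothetical data for one asset class, generated from A = 2, B = 0.5.
d <- data.frame(x = c(1, 2, 3, 4))
d$y <- 2 * exp(0.5 * d$x)

# Linearised fit: log(y) = log(A) + B * x
fit <- lm(log(y) ~ x, data = d)

A <- exp(coef(fit)[["(Intercept)"]])  # back-transform the intercept
B <- coef(fit)[["x"]]                 # slope is B directly
c(A = A, B = B)
```

Note that this minimises squared error on the log scale, which is not the same criterion as a direct nonlinear fit of Y (for example with nls()), so the estimates can differ on noisy data.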
>>
>> I hope I have been able to express my requirement. It's not that I need the
>> subset of mydat
>> s.t. basel asset class is > 2 (now 4 in revised example), but subset s.t.
>> number of default
>> frequencies is greater than or equal to 2. This 2 is not same as basel
>> asset class 2.
>>
>> Kindly guide
>>
>> With warm regards
>>
>> Katherine Gobin
>>
>>
>>
>>
>> On Thursday, 17 October 2013 9:33 PM, Bert Gunter <gunter.berton at gene.com>
>> wrote:
>>
>> "Kindly guide" ...
>>
>> This is a very basic question, so the kindest guide I can give is to read
>> an Introduction to R
>> (ships with R) or an R web tutorial of your choice so that you can learn
>> how R works
>> instead of posting to this list.
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>> On Wed, Oct 16, 2013 at 11:55 PM, Katherine Gobin
>> <katherine_gobin at yahoo.com>
>> wrote:
>>
>> Dear Forum,
>> >
>> >I have a data frame as
>> >
>> >mydat = data.frame(basel_asset_class = c(2, 8, 8 ,8), defa_frequency =
>> > c(0.15, 0.07,
>> 0.03, 0.001))
>> >
>> >> mydat
>> >  basel_asset_class defa_frequency
>> >1                 2          0.150
>> >2                 8          0.070
>> >3                 8          0.030
>> >4                 8          0.001
>> >
>> >
>> >I need to get the subset of this data.frame where no of records for the
>> > given
>> basel_asset_class is > 2, i.e. I need to obtain subset of above data.frame
>> as (since there
>> is only 1 record, against basel_asset_class = 2, I want to filter it)
>> >
>> >> mydat_a
>> >  basel_asset_class defa_frequency
>> >1                 8          0.070
>> >2                 8          0.030
>> >3                 8          0.001
>> >
>> >Kindly guide
>> >
>> >Katherine
>> >
>> >
>> >______________________________________________
>> >R-help at r-project.org mailing list
>> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> >and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> (650) 467-7374
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374


