[R] help sub setting data frame

Sean MacEachern sean.maceach at gmail.com
Fri Oct 23 00:29:58 CEST 2009


Works perfectly!

Thanks to all who responded.

Sean

On Thu, Oct 22, 2009 at 6:24 PM, Ista Zahn <istazahn at gmail.com> wrote:
> Is this what you want?
>
> df = data.frame('id'=c(1:100),'res'=c(1001:1100))
> dfb=df[1:10,]
> dfc = df[df$id %in% dfb$id,]
>
> Still not sure, but that's my best guess. Going back to your original
> data you can try
>
>  dfb = chkPd[chkPd$PN %in% df$PN,]
>
> Hope it helps,
> Ista
>
> On Thu, Oct 22, 2009 at 6:10 PM, Sean MacEachern <sean.maceach at gmail.com> wrote:
>> Hi Ista,
>>
>> I think I'm suffering long dayitis myself. You are probably right. I
>> don't use subset that often. I typically use brackets to subset
>> dataframes. Essentially what I am trying to do is take my original
>> dataframe (chkPd) and subset it using a smaller dataframe with some
>> matching PN IDs. They are only a few hundred rows different in size so
>> subset wouldn't be appropriate here. I'm just struggling to figure out
>> what's going wrong in my first example.
>> For instance, if I try:
>>> df = data.frame('id'=c(1,2,3,4),'res'=c(10,10,20,20))
>>> dfb=df[1:2,]
>>> dfc = df[dfb$id,]
>>
>> I get something along the lines of what I'd expect where my new
>> dataframe is a subset of the original based on the matching ids I
>> specified in dfb$id. Is that wrong in my first example?
>>
>> Cheers,
>>
>> Sean
>>
>> On Thu, Oct 22, 2009 at 4:55 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>> Hi Sean,
>>> Comment in line below.
>>>
>>> On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern <sean.maceach at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I'm running into a problem subsetting a data frame that I have never
>>>> encountered before:
>>>>
>>>>> dim(chkPd)
>>>> [1] 3213    6
>>>>
>>>>> df = head(chkPd)
>>>>> df
>>>>          PN     WB   Sire    Dam  MG SEX
>>>> 601    1001 715349  61710  61702  67   F
>>>> 969  1001_1 511092 616253 615037 168   F
>>>> 986  1002_1 511082 616253 623905 168   F
>>>> 667    1003 715617  61817  61441  67   F
>>>> 1361 1003_1 510711 635246 627321 168   F
>>>> 754    1004 715272  62356  61380  67   F
>>>>
>>>>
>>>>> dfb = chkPd[df$PN,]
>>>>> dfb
>>>>            PN     WB   Sire    Dam  MG  SEX
>>>> 1001    2114_1 510944 616294 614865 168    M
>>>> NA        <NA>     NA   <NA>   <NA>  NA <NA>
>>>> NA.1      <NA>     NA   <NA>   <NA>  NA <NA>
>>>> 1003    1130_1 510950 616294 619694 168    F
>>>> NA.2      <NA>     NA   <NA>   <NA>  NA <NA>
>>>> 1004 2221-SHR2 510952 616294 619694 168    M
>>>>
>>>>
>>>> I'm not sure why I'm getting this behaviour. By sub-setting the
>>>> original data frame by PN I seem to be pulling out row numbers.
>>>> Therefore I am only getting results where PN is less than the
>>>> dimensions of the original data frame and of course nothing where PN
>>>> has _ in the id. I have also tried using subset but haven't had any
>>>> luck with that either.
>>>
>>> That is the documented behavior as far as I can tell. See
>>>
>>> ?"[.data.frame"
>>>
>>> Maybe my brain is going soft at the end of a long day, but I can't
>>> tell what you're trying to do. Can you clarify?
>>>
>>> -Ista
>>>
>>>>
>>>>
>>>>> dfb = subset(chkPd, PN==df$PN)
>>>> Warning message:
>>>> In PN == df$PN :
>>>>  longer object length is not a multiple of shorter object length
>>>>
>>>> I wasn't aware that the length of the larger data frame had to be a
>>>> multiple of the length of the object you were sub-setting by. In any
>>>> case I would appreciate any insight into what I may be doing wrong.
>>>>
>>>> Cheers,
>>>>
>>>> Sean
>>>>
>>>>
>>>>> sessionInfo()
>>>> R version 2.9.1 (2009-06-26)
>>>> i386-apple-darwin8.11.1
>>>>
>>>> locale:
>>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>
>>>
>>>
>>> --
>>> Ista Zahn
>>> Graduate student
>>> University of Rochester
>>> Department of Clinical and Social Psychology
>>> http://yourpsyche.org
>>>
>>
>
>
>
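
For the archive, here is a small self-contained sketch of the three indexing forms discussed in this thread. The data frame chk, its values, and the vector wanted are made up for illustration and are not the original chkPd data. It assumes PN is stored as character (hence stringsAsFactors = FALSE); a factor index may behave differently.

```r
# Toy stand-in for chkPd; all values here are invented for illustration.
chk <- data.frame(
  PN  = c("1001", "1001_1", "2114_1", "1003"),
  SEX = c("F", "F", "M", "F"),
  row.names = c("601", "969", "1001", "667"),
  stringsAsFactors = FALSE
)

# Hypothetical IDs we would like to pull out of chk.
wanted <- c("1001_1", "1001")

# 1. A character row index is matched against the row names,
#    not against the PN column. "1001_1" matches no row name (NA row),
#    and "1001" picks the row *named* 1001, whose PN is 2114_1.
by_rowname <- chk[wanted, ]

# 2. %in% asks, for every PN, whether it occurs anywhere in wanted,
#    and returns a logical vector as long as the data frame.
by_value <- chk[chk$PN %in% wanted, ]

# 3. == compares position by position and recycles the shorter vector,
#    so it only finds IDs that happen to line up. When the lengths do
#    not divide evenly, R also warns about the object lengths.
by_equal <- chk[chk$PN == wanted, ]

print(by_rowname)
print(by_value)
print(by_equal)
```

With real data the same pattern would read chkPd[chkPd$PN %in% df$PN, ], as suggested above.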



