[R] Correct subsetting in R

peter dalgaard pdalgd at gmail.com
Thu Nov 2 12:08:31 CET 2017


> On 1 Nov 2017, at 18:03 , Elahe chalabi via R-help <r-help at r-project.org> wrote:
> 
> But row.names() cannot give me the IDs
> 

Is "training" extracted from "data" using standard data frame indexing? If so, data[row.names(training), "ID"] should give you the relevant values. 
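A minimal sketch of that lookup, assuming "training" really was made by row-subsetting "data" (the toy values and the row selection below are invented for illustration, not taken from the real data):

```r
# Toy stand-in for 'data'; values are invented for illustration only
data <- data.frame(ID      = c(101L, 102L, 103L, 104L, 105L),
                   alright = c(1, 0, 0, 1, 2),
                   bad     = c(1, 0, 0, 0, 0))

# Standard row subsetting keeps the original row names ("2", "4", "5")
training <- data[c(2, 4, 5), -1]

# Look the IDs up again through those row names
ids <- data[row.names(training), "ID"]
ids   # 102 104 105
```

This only works while the row names survive; anything that rebuilds the data frame or resets its row names breaks the link.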

If not, then you are in trouble because you cannot tell the difference between two IDs that have identical responses in columns 2:608. You might proceed with something like 

signature1 <- do.call("paste", data[-1]) # drop the ID column so only columns 2:608 are compared
any(duplicated(signature1)) # if TRUE you're not quite happy because two or more IDs are indistinguishable.

signature2 <- do.call("paste", training)
m <- match(signature2, signature1)

any(duplicated(m)) # ouch if TRUE... will require more thought

any(is.na(m)) # even more ouch, if TRUE...

data$ID[m]
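
For what it is worth, here is a small self-contained sketch of the same idea on toy data. The values are invented, and it assumes that "training" uses the same column names as "data"; selecting columns by name rather than by position is my own variation, not something the original data guarantees:

```r
# Toy data; values are invented for illustration only
data <- data.frame(ID      = c(11L, 12L, 13L, 14L),
                   alright = c(1L, 0L, 2L, 0L),
                   bad     = c(1L, 0L, 0L, 3L))

# 'training' has no ID column, no useful row names, and numeric storage
training <- data.frame(alright = c(2, 1), bad = c(0, 1))

# One string per row, built from the shared (non-ID) columns in the same order
signature1 <- do.call("paste", data[names(training)])
signature2 <- do.call("paste", training)

stopifnot(!any(duplicated(signature1)))  # rows of 'data' must be distinguishable

m <- match(signature2, signature1)       # row of 'data' matching each training row
stopifnot(!any(is.na(m)))                # every training row must be found

ids <- data$ID[m]
ids   # 13 11
```

The int versus num difference does not matter here, because paste() converts both 2L and 2 to "2". One caveat: do.call() passes the columns as named arguments, so a column that happened to be called "sep" or "collapse" would be taken as an argument to paste(); wrapping the data frame in unname(as.list(...)) avoids that.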


-pd

> 
> On Wednesday, November 1, 2017 9:45 AM, David Wolfskill <r at catwhisker.org> wrote:
> 
> 
> 
> On Wed, Nov 01, 2017 at 04:13:42PM +0000, Elahe chalabi via R-help wrote:
> 
>> Hi all,
>> I have two data frames, one of which does not have the ID column:
>> 
>>> str(data)
>>    'data.frame':    499 obs. of  608 variables:
>>    $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
>>    $ alright      : int  1 0 0 0 0 0 0 1 2 1 ...
>>    $ bad          : int  1 0 0 0 0 0 0 0 0 0 ...
>>    $ boy          : int  1 2 1 1 0 2 2 4 2 1 ...
>>    $ cooki        : int  1 2 2 1 0 1 1 4 2 3 ...
>>    $ curtain      : int  1 0 0 0 0 2 0 2 0 0 ...
>>    $ dish         : int  2 1 0 1 0 0 1 2 2 2 ...
>>    $ doesnt       : int  1 0 0 0 0 0 0 0 1 0 ...
>>    $ dont         : int  2 1 4 2 0 0 2 1 2 0 ...
>>    $ fall         : int  3 1 0 0 1 0 1 2 3 2 ...
>>    $ fell         : int  1 0 0 0 0 0 0 0 0 0 ...
>> 
>> and the other one is:
>> 
>>> str(training)
>>    'data.frame':    375 obs. of  607 variables:
>>    $ alright      : num  1 0 0 0 1 2 1 0 0 0 ...
>>    $ bad          : num  1 0 0 0 0 0 0 0 0 0 ...
>>    $ boy          : num  1 1 2 2 4 2 1 0 1 0 ...
>>    $ cooki        : num  1 1 1 1 4 2 3 1 2 2 ...
>>    $ curtain      : num  1 0 2 0 2 0 0 0 0 0 ...
>>    $ dish         : num  2 1 0 1 2 2 2 1 4 1 ...
>>    $ doesnt       : num  1 0 0 0 0 1 0 0 0 0 ...
>>    $ dont         : num  2 2 0 2 1 2 0 0 1 0 ...
>>    $ fall         : num  3 0 0 1 2 3 2 0 2 0 ...
>>    $ fell         : num  1 0 0 0 0 0 0 0 0 0 ...
>> Does anyone know how I should get the IDs of training from data?
>> Thanks for any help!
>> Elahe
>> ....
> 
> row.names() appears to be what is wanted.
> 
> Peace,
> david
> -- 
> David H. Wolfskill                r at catwhisker.org
> Unsubstantiated claims of "Fake News" are evidence that the claimant lies again.
> 
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


