[R] set dataframe field value from lookup table

David Winsemius dwinsemius at comcast.net
Thu Dec 9 18:17:44 CET 2010


Offlist comments  No reply needed .. This is just for emphasis and  
clarification.

On Dec 9, 2010, at 11:19 AM, Jon Erik Ween wrote:

> David
>
> I see how findInterval is a more elegant way of doing 1). I'd need  
> to change the indices in the lookup table, as
>> findInterval(36, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,  
>> 84, 89) )
> [1] 6
>
> should be 7, not 6. The age range for the 7th column 35-44. But  
> that's easy.

As you say, changing the output to agree with your expectations is  
"easy" but to be clear, R _is_ delivering the correct response to the  
question "which interval is 36 located in ... the 6th. Any ambiguity  
is due to your not formulating a good question (and my errors).

>
> I can't see how findInterval will help me for 2), though.

"2)" was never very clear. I do think findInterval must be what is  
needed, but I am repeating my call for you to post a full example and  
more complete explanation to the list.

> The standard score is integer and not a range.

Which is _not_ how statisticians usually think of a "z score". So it  
may need some further background or use of less misleading  
terminology. You are probably tasked with using a table handed to you  
that at one time was a "z-score" for <something> but has been recast  
in tabular form.

-- 
David.

> So it maps 1 to 1. The real problem, though, is setting the value in  
> the main dataframe (df) with the value from the lookup table based  
> on the identified age and score indices.
>
> My initial guess was:
>
> df$DSTz <-DSTzlook[which(DSTzlook[,1]==df 
> $Agetmp),which(DSTzlook[1,]==df$DSF+df$DSB)]
>
> which could be rewritten:
>
> df$DSTz <-DSTzlook[which(DSTzlook[,1]== findInterval(df$Age, c(0,  
> 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84,  
> 89))),which(DSTzlook[1,]==df$DSF+df$DSB)]
>
> But it is the indirect referencing of the lookup in the main table  
> that causes me trouble.
>
> Jon
>
> Soli Deo Gloria
>
> Jon Erik Ween, MD, MS
> Scientist, Kunin-Lunenfeld Applied Research Unit
> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
> Assistant Professor, Dept. of Medicine, Div. of Neurology
>    University of Toronto Faculty of Medicine
>
> Kimel Family Building, 6th Floor, Room 644
> Baycrest Centre
> 3560 Bathurst Street
> Toronto, Ontario M6A 2E1
> Canada
>
> Phone: 416-785-2500 x3648
> Fax: 416-785-2484
> Email: jween at klaru-baycrest.on.ca
>
>
> Confidential: This communication and any attachment(s) may contain  
> confidential or privileged information and is intended solely for  
> the address(es) or the entity representing the recipient(s). If you  
> have received this information in error, you are hereby advised to  
> destroy the document and any attachment(s), make no copies of same  
> and inform the sender immediately of the error. Any unauthorized use  
> or disclosure of this information is strictly prohibited.
>
>
>
> On 2010-12-09, at 11:06 AM, David Winsemius wrote:
>
>>
>> On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
>>
>>> Thanks David
>>>
>>> What I am trying to do is set up a script that assigns z-scores to  
>>> a large dataframe (2500x300, but has Age in years and test scores  
>>> as columns.) from a published table of age-corrected standard  
>>> scores on this cognitive test.
>>>
>>> 1) The age intervals in the lookup table are given and not my  
>>> choice.
>>
>> You may want to skip the intermediate translation to the row and  
>> column labels and just use the results of findInterval:
>>
>>> findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,  
>>> 84, 89) )
>> [1] 1
>>> findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,  
>>> 84, 89) )
>> [1] 14
>>
>> Those look like appropriate indices for the column argument
>>>
>>> 2) Sorry I didn't post an example table, it looks something like  
>>> this ("Age" is in the first row, standard scores in the first  
>>> column):
>>>
>>>      17   19   24   29   34   44 ....
>>> 30   2.6  2.6  2.6  2.6  2.6  2.6
>>> 29  1.8  1.8  1.8  2.0  2.6  2.6
>>> 28  1.0  1.0  1.8  1.8  2.6  2.6
>>> 27  0.0  0.5  1.0  1.8  2.6  2.6
>>> 26   -.5  0.0  0.0  1.0  1.8  2.6
>>> .
>>> .
>>> .
>>> .
>>>
>>> So, if a subject (row) has age==29 and a standard score of 28, the  
>>> value should be 1.8, etc.
>>
>> Looks like a job for two findInterval indices to be used used with  
>> "[ r , c ] ".
>>
>> -- 
>> David.
>>
>>>
>>> Thanks
>>>
>>>
>>> Jon
>>>
>>> Soli Deo Gloria
>>>
>>> Jon Erik Ween, MD, MS
>>> Scientist, Kunin-Lunenfeld Applied Research Unit
>>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>>>  University of Toronto Faculty of Medicine
>>>
>>> Kimel Family Building, 6th Floor, Room 644
>>> Baycrest Centre
>>> 3560 Bathurst Street
>>> Toronto, Ontario M6A 2E1
>>> Canada
>>>
>>> Phone: 416-785-2500 x3648
>>> Fax: 416-785-2484
>>> Email: jween at klaru-baycrest.on.ca
>>>
>>>
>>> Confidential: This communication and any attachment(s) may contain  
>>> confidential or privileged information and is intended solely for  
>>> the address(es) or the entity representing the recipient(s). If  
>>> you have received this information in error, you are hereby  
>>> advised to destroy the document and any attachment(s), make no  
>>> copies of same and inform the sender immediately of the error. Any  
>>> unauthorized use or disclosure of this information is strictly  
>>> prohibited.
>>>
>>>
>>>
>>> On 2010-12-09, at 10:33 AM, David Winsemius wrote:
>>>
>>>>
>>>> On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
>>>>
>>>>>
>>>>> Hi
>>>>>
>>>>> This is (hopefully) a bit more cogent phrasing of a previous  
>>>>> post. I'm
>>>>> trying to compute a z-score to rows in a large dataframe based  
>>>>> on values in
>>>>> another dataframe. Here's the script (that does not work). 2  
>>>>> questons,
>>>>>
>>>>> 1) Anyone know of a more elegant way to calculate the "rounded"  
>>>>> age value
>>>>> than the nested ifelse's I've used?
>>>>>
>>>>> 2) how to reference the lookup table based on computed indices?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jon
>>>>>
>>>>> # Define tables
>>>>> DSTzlook <-
>>>>> read.table("/Users/jween/Documents/ResearchProjects/ABC/data/ 
>>>>> DSTz.txt",
>>>>> header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>> df<-stroke
>>>>>
>>>>> # Compute rounded age.
>>>>> df$Agetmp
>>>>> <-ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df 
>>>>> $Age>=79,79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df 
>>>>> $Age>=64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df 
>>>>> $Age>=34,34,ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df 
>>>>> $Age>=19,19,17))))))))))))
>>>>
>>>> Ew, painful. If you want categorized ages (since what the above  
>>>> coding is producing is not "rounded" in any sense of that word as  
>>>> I understand it, then why not findInterval() as an index into the  
>>>> ages you wnat to label these case with?
>>>>
>>>> df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[  # note  
>>>> Extract operation
>>>>          findInterval(runif(100,0,100),  
>>>> c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
>>>>         ]  # close extraction
>>>>
>>>>
>>>> The other option, of course, and a more "honest" one in this  
>>>> instance would be
>>>>
>>>> cut(vec, breaks=c(...), labels=c(...) )
>>>>
>>>> (It's not clear why you are not picking midpoint ages within  
>>>> those brackets to me.)
>>>>
>>>>>
>>>>> # Reference the lookup table based on computed indices
>>>>> df$DSTz
>>>>> <-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==df 
>>>>> $DSF+df$DSB)]
>>>>
>>>> I have not been able to figure out what you are trying to do  
>>>> here. Trying to use a 2d lookup looks promising a a way to  
>>>> emulate what an Excel user might attempt, but an example (as  
>>>> requested in the message at the bottom of every posting) would  
>>>> really be of great help in making this more concrete for those of  
>>>> us with insufficient abstractive abilities.
>>>>
>>>> -- 
>>>> David.
>>>>
>>>>>
>>>>> # Cleanup
>>>>> #rm(df)
>>>>> #df$Agetmp<-NULL
>>>>> -- 
>>>>> View this message in context: http://r.789695.n4.nabble.com/set-dataframe-field-value-from-lookup-table-tp3080245p3080245.html
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list