[R] New Variable from Several Existing Variables

David Winsemius dwinsemius at comcast.net
Sat Feb 27 02:43:57 CET 2010


And if your data is in a data frame (please include the output of
str() next time):
 > dfrm <- rd.txt("Column1, Column2, Column3
+ Yes,Yes,Yes
+ Yes,No,Yes
+ No,No,No
+ No,Yes,No
+ Yes,Yes,No", sep=",")  # rd.txt is just a wrapper I use for read.table(textConnection( ), header=TRUE, ... )

 > dfrm$newvar <- apply(subset(dfrm, select=c(Column1, Column2, Column3)), 1,
+                         function(x) { if (all(x=="Yes")) {"Yes"} else {"No"} } )
 > dfrm
   Column1 Column2 Column3 newvar
1     Yes     Yes     Yes    Yes
2     Yes      No     Yes     No
3      No      No      No     No
4      No     Yes      No     No
5     Yes     Yes      No     No

Notice that I created this variable in a manner that did not require  
the use of every column of the dataframe.
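
A minimal sketch of that point, which should run without my rd.txt wrapper. It assumes read.table(text = ...) as a stand-in for the wrapper, and the column name first_two is made up for illustration. Only Column1 and Column2 enter the test, so Column3 is ignored:

```r
# Assumption: read.table(text = ...) replaces the rd.txt wrapper used above.
dfrm <- read.table(text = "Column1,Column2,Column3
Yes,Yes,Yes
Yes,No,Yes
No,No,No
No,Yes,No
Yes,Yes,No", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# first_two is a hypothetical name; only two of the three columns are tested.
dfrm$first_two <- apply(subset(dfrm, select = c(Column1, Column2)), 1,
                        function(x) if (all(x == "Yes")) "Yes" else "No")
dfrm
```

Row 5 comes out "Yes" here even though Column3 is "No", because Column3 was never passed to apply().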

-- 
David


On Feb 26, 2010, at 7:57 PM, Don MacQueen wrote:

> If your data is in a matrix named "orgdata" :
>
> newvar <- apply(orgdata , 1, function(arow, if (all(arow=='Yes'))  
> 'Yes' else 'No'

Yes, at least 2 missing parens and an unneeded comma, perhaps:

newvar <- apply(orgdata, 1, function(arow) if (all(arow=='Yes')) 'Yes' else 'No')

>
> newdata <- cbind(orgdata, newvar)
>
> finaloutcome <- newdata[ newvar=='Yes',]
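
The matrix steps so far can be sketched end to end as below. The orgdata matrix is made up to mirror the poster's example. One caveat I am adding: when only one row matches, plain subsetting returns a character vector rather than a one-row matrix, so drop = FALSE is used to keep the matrix shape.

```r
# orgdata is a made-up character matrix mirroring the poster's example.
orgdata <- matrix(c("Yes", "Yes", "Yes",
                    "Yes", "No",  "Yes",
                    "No",  "No",  "No",
                    "No",  "Yes", "No",
                    "Yes", "Yes", "No"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("Column1", "Column2", "Column3")))

# One value per row: "Yes" only when every entry in the row is "Yes".
newvar <- apply(orgdata, 1, function(arow) if (all(arow == "Yes")) "Yes" else "No")
newdata <- cbind(orgdata, newvar)

# drop = FALSE keeps a single matching row as a 1 x 4 matrix.
finaloutcome <- newdata[newvar == "Yes", , drop = FALSE]
dim(finaloutcome)  # 1 4
```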
>
>
> The key to this is the apply() function.
>
> I might have missed some parentheses...
>
> There are other ways; this is just one. I might think of a simpler  
> one if I gave it more time...
>
> -Don
>
> At 4:40 PM -0800 2/26/10, wookie1976 wrote:
>> I am new to R, but have been using SAS for years.  In this  
>> transition period,
>> I am finding myself pulling my hair out to do some of the simplest  
>> things.
>> An example of this is that I need to generate a new variable based  
>> on the
>> outcome of several existing variables in a data row.  In other
>> words, if the
>> variables in all three existing columns are "Yes", then the new
>> variable
>> should also be "Yes"; however, if any one of the three existing
>> variables is
>> a "No", then the new variable should be a "No".  I would then use
>> that new
>> variable as an exclusion for data in a new or existing dataset  
>> (i.e., if
>> NewVariable = "No" then delete):
>>
>> Take this:
>> Column1, Column2, Column3
>> Yes, Yes, Yes
>> Yes, No, Yes
>> No, No, No
>> No, Yes, No
>> Yes, Yes, No
>>
>> Generate this:
>> Column1, Column2, Column3, NewVariable1
>> Yes, Yes, Yes, Yes
>> Yes, No, Yes, No
>> No, No, No, No
>> No, Yes, No, No
>> Yes, Yes, No, No
>>
>> And end up with this:
>> Column1, Column2, Column3, NewVariable1
>> Yes, Yes, Yes, Yes
>>
>> Any suggestions on how to efficiently do this in either the  
>> existing or a
>> new dataset?
>>

You might have simplified this a bit if you let the columns be logical  
rather than character.
 > dfrm$newvar <- apply(subset(dfrm, select=c(Column1, Column2, Column3)), 1,
+                         function(x) {  (all(x=="Yes"))  } )
 > dfrm
   Column1 Column2 Column3 newvar
1     Yes     Yes     Yes   TRUE
2     Yes      No     Yes  FALSE
3      No      No      No  FALSE
4      No     Yes      No  FALSE
5     Yes     Yes      No  FALSE

You would then be able to apply simpler tests with operators and
functions that accept the logical data type.
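
As a rough sketch of what that buys you: a logical column can index rows directly, be negated with !, and be counted with sum(). The data frame is rebuilt here with data.frame() so the block runs on its own; the names kept and dropped are made up for illustration.

```r
# Rebuild the example data (same values as the poster's table).
dfrm <- data.frame(Column1 = c("Yes", "Yes", "No", "No",  "Yes"),
                   Column2 = c("Yes", "No",  "No", "Yes", "Yes"),
                   Column3 = c("Yes", "Yes", "No", "No",  "No"),
                   stringsAsFactors = FALSE)

# Logical result: TRUE only when all three columns are "Yes".
dfrm$newvar <- apply(subset(dfrm, select = c(Column1, Column2, Column3)), 1,
                     function(x) all(x == "Yes"))

kept    <- dfrm[dfrm$newvar, ]   # rows to keep (hypothetical name)
dropped <- dfrm[!dfrm$newvar, ]  # rows to exclude (hypothetical name)
sum(dfrm$newvar)                 # number of rows where all three are "Yes"
```

No comparison such as newvar == "Yes" is needed in the row index, which is the simplification meant above.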

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


