[R] combining data.frames with is.na() & match(): two questions

Thu Apr 18 10:04:31 CEST 2019

Dear Drake

See in-line comments

On 18/04/2019 00:24, Drake Gossi wrote:
> Hello everyone,
> 
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
> 
> In sum, I'm trying to combine data in two different data.frames.
> 
> This data.frame is called fruitNutr
> 
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
> 
> And this data.frame is called fruitData
> 
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
> 
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
> 
> fruitData$Calories <- NA
> 
> 
> As a result, I've created a new column for the fruitData data.frame:
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
> 
> Then:
> 
>> index <- match(x = fruitData$Fruit, table = fruitNutr$Fruit)
>> index
> [1] NA  1  2 NA NA
>> is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
>> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
>> fruitData
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
> 
> I get what the first part means, that first part being this:
> fruitData$Calories[!is.na(index)]
> go into the fruitData data.frame, specifically into the Calories column,
> and only into the rows where !is.na(index) is TRUE. But I just literally
> can't understand this last part: fruitNutr$Calories[index[!is.na(index)]]
> 
> Two questions.
> 
> 
>     1. I just literally don't understand how this code works. It does work,
>     of course, but I don't know what it's doing, specifically this
>     [index[!is.na(index)]] part. Could someone explain it to me like I'm
>     five? I'm new at this...

Decompose it from the inside out. So

!is.na(index)

gives you a logical vector the same length as index, which is TRUE where
index has a value and FALSE where it is NA.
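To see this concretely, here is a small sketch using the index values printed in Drake's session, typed in by hand rather than recomputed from the data:

```r
# index as printed above: for each fruitData$Fruit, its position in
# fruitNutr$Fruit (NA = no match). Typed in by hand for this sketch.
index <- c(NA, 1L, 2L, NA, NA)

is.na(index)    # TRUE FALSE FALSE TRUE TRUE
!is.na(index)   # FALSE TRUE TRUE FALSE FALSE (the ! flips each value)
```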

index[ something ]

gives you the elements of index at the positions where something is TRUE
(when, as here, something is a logical vector). Note that the result is
shorter than something whenever something contains any FALSE.
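Continuing the same hand-typed sketch, subsetting index with that logical vector keeps only the matched positions, which is why the result has two elements rather than five:

```r
index <- c(NA, 1L, 2L, NA, NA)   # as printed in Drake's session
keep  <- !is.na(index)           # FALSE TRUE TRUE FALSE FALSE

index[keep]          # 1 2 : the elements of index where keep is TRUE
length(keep)         # 5
length(index[keep])  # 2 : shorter, because keep contains FALSE
```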

That should help you get started. In my opinion these things are much
clearer if you do them in separate stages:

keep <- !is.na(index)
index[keep]

and check the value of keep if something seems to have gone wrong.

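Carrying the separate-stages idea through the whole line, a sketch might look like the following. It assumes the two data.frames can be rebuilt from the printed output as below, and the names rows and cals are hypothetical intermediate variables added only for illustration. The last two stages are the part the one-liner hides: fruitNutr$Calories[index[keep]] looks up the calories of the matching rows of fruitNutr, and because keep has exactly as many TRUE values as that lookup has elements, they line up one-to-one when assigned back.

```r
# Assumption: data rebuilt by hand from the printed output in the post
fruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200),
                        stringsAsFactors = FALSE)
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Color = c("red", "yellow", "green", "orange", "green"),
                        Shape = c("round", "oblong", "pear", "round", "round"),
                        Juice = c(1, 0, 0.5, 1, 0),
                        stringsAsFactors = FALSE)
fruitData$Calories <- NA

# Stage 1: for each row of fruitData, which row of fruitNutr has that fruit?
index <- match(fruitData$Fruit, fruitNutr$Fruit)   # NA 1 2 NA NA

# Stage 2: which rows of fruitData found a match?
keep <- !is.na(index)                              # FALSE TRUE TRUE FALSE FALSE

# Stage 3 (hypothetical name 'rows'): matching row numbers in fruitNutr
rows <- index[keep]                                # 1 2

# Stage 4 (hypothetical name 'cals'): the calories in those rows
cals <- fruitNutr$Calories[rows]                   # 100 100

# Stage 5: write them into the matched rows of fruitData, in the same order
fruitData$Calories[keep] <- cals
fruitData
```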
>     2. And then: is there any other way to combine these two data.frames so
>     that we get this same result? maybe an easier to understand method?
> 
> That same result, again, is
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
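On question 2: one base-R alternative that many people find easier to read is merge(), which joins the two data.frames on the Fruit column. The sketch below rebuilds the data from the printed output (an assumption) and does a left join with all.x = TRUE, so every row of fruitData is kept and fruits missing from fruitNutr (apple, orange, kiwi) get NA, while mango, which appears only in fruitNutr, is dropped. merge() does not promise to keep fruitData's row order, so the last step puts the rows back in that order.

```r
# Assumption: data rebuilt by hand from the printed output in the post
fruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200),
                        stringsAsFactors = FALSE)
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Color = c("red", "yellow", "green", "orange", "green"),
                        Shape = c("round", "oblong", "pear", "round", "round"),
                        Juice = c(1, 0, 0.5, 1, 0),
                        stringsAsFactors = FALSE)

# Left join: keep all rows of fruitData; unmatched fruits get NA Calories
combined <- merge(fruitData, fruitNutr, by = "Fruit", all.x = TRUE,
                  sort = FALSE)

# Restore fruitData's original row order and reset the row names
combined <- combined[match(fruitData$Fruit, combined$Fruit), ]
rownames(combined) <- NULL
combined
```

Note that this skips the fruitData$Calories <- NA step: if fruitData already had a Calories column, merge() would keep both, renamed Calories.x and Calories.y.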
> 
> 
> Drake
> 

-- 
Michael
http://www.dewey.myzen.co.uk/home.html