[R] combining data.frames with is.na & match (), two questions

PIKAL Petr petr.pikal at precheza.cz
Thu Apr 18 10:31:41 CEST 2019


Hi

I wonder why such a combination is made so complicated in your textbook.

Having data frames fr1 and fr2

> dput(fr1)
structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
"mango", "pear"), class = "factor"), Calories = c(100L, 100L,
200L)), class = "data.frame", row.names = c("1", "2", "3"))
> dput(fr2)
structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
"banana", "kiwi", "orange", "pear"), class = "factor"), Color = structure(c(3L,
4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label = c("oblong",
"pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
0)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))
>

> fr1
   Fruit Calories
1 banana      100
2   pear      100
3  mango      200
>

You can use merge() to combine the two data frames. With all=TRUE you get every row from both:

> merge(fr2, fr1, all=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
6  mango   <NA>   <NA>    NA      200
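
If you do not want to paste the dput() output, the same call can be reproduced from scratch. This is only a sketch: the data.frame() calls are my shorter rebuild of fr1 and fr2, the name `both` is just illustrative, and by = "Fruit" is written out only to make the key column visible (merge() picks the shared column names by default).

```r
# Rebuild the two data frames from the dput() output in a shorter form.
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100L, 100L, 200L),
                  stringsAsFactors = TRUE)
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Color = c("red", "yellow", "green", "orange", "green"),
                  Shape = c("round", "oblong", "pear", "round", "round"),
                  Juice = c(1, 0, 0.5, 1, 0),
                  stringsAsFactors = TRUE)

# by = "Fruit" names the key column explicitly; without it merge() joins on
# every column name the two data frames share, which here is also "Fruit".
# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA.
both <- merge(fr2, fr1, by = "Fruit", all = TRUE)
both
```

Writing TRUE instead of T is a bit safer, because T is an ordinary variable that can be overwritten.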

With all.y=TRUE you keep only the fruits listed in the data frame with calories (fr1):

> merge(fr2, fr1, all.y=TRUE)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
3  mango   <NA>   <NA>    NA      200

With all.x=TRUE you keep only the fruits listed in the data frame with colours (fr2):

> merge(fr2, fr1, all.x=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
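
As for your first question, the nested indexing is easier to read when it is taken apart one step at a time. Below is a minimal sketch, assuming fr1 plays the role of fruitNutr and fr2 the role of fruitData; I left out the Color and Shape columns to keep it short, and the names idx, found, rows and short are mine, not from the book.

```r
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100L, 100L, 200L))
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Juice = c(1, 0, 0.5, 1, 0))

# For each fruit in fr2, the row number where it sits in fr1 (NA if absent).
idx <- match(fr2$Fruit, fr1$Fruit)      # NA 1 2 NA NA

# TRUE where the fruit from fr2 was found in fr1.
found <- !is.na(idx)                    # FALSE TRUE TRUE FALSE FALSE

# Dropping the NAs leaves only usable row numbers of fr1.
rows <- idx[found]                      # 1 2

# Calories taken from those rows of fr1, in the order of fr2.
fr1$Calories[rows]                      # 100 100

# Fill only the matched rows of fr2; the other rows keep NA.
fr2$Calories <- NA
fr2$Calories[found] <- fr1$Calories[rows]

# Shorter variant: indexing with NA already returns NA, so the
# is.na() bookkeeping is not strictly needed.
short <- fr1$Calories[match(fr2$Fruit, fr1$Fruit)]
```

Unlike merge(), which by default sorts the result by the key column, the match() approach keeps the rows of fr2 in their original order.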

Cheers
Petr


> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Drake Gossi
> Sent: Thursday, April 18, 2019 1:24 AM
> To: r-help using r-project.org
> Subject: [R] combining data.frames with is.na & match (), two questions
>
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton), and
> I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round   1.0
> 2 banana yellow oblong   0.0
> 3   pear  green   pear   0.5
> 4 orange orange  round   1.0
> 5   kiwi  green  round   0.0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
>
> Then:
>
> > index <- match(x=fruitData$Fruit, table=fruitNutr$Fruit)
> > index
> [1] NA  1  2 NA NA
> > is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
> > fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
>
> I get what the first part means, that first part being this:
> fruitData$Calories [!is.na(index)]
> go into the fruitData data.frame, specifically into the calories column, and only
> for what's true according to is.na(index). But I just literally can't understand
> this last part.  fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>
>    1. I just literally don't understand how this code works. It does work,
>    of course, but I don't know what it's doing, specifically this
>    [index[!is.na(index)]] part. Could someone explain it to me like I'm
>    five? I'm new at this...
>    2. And then: is there any other way to combine these two data.frames so
>    that we get this same result? maybe an easier to understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
>
>
> Drake