[R] na.omit not omitting rows

William Dunlap wdunlap at tibco.com
Thu Jun 4 21:38:45 CEST 2020


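Does droplevels() help? na.omit() removes the rows that contain NAs, but it
leaves the levels of every factor untouched, so levels that no longer occur
stay in the data.frame until you drop them.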

> d <- data.frame(size = factor(c("S","M","M","L","L"),
+                 levels=c("S","M","L")), id=c(101,NA,NA,104,105))
> str(d)
'data.frame':   5 obs. of  2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3
 $ id  : num  101 NA NA 104 105
> str(na.omit(d))
'data.frame':   3 obs. of  2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 3 3
 $ id  : num  101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
  ..- attr(*, "names")= chr [1:2] "2" "3"
> str(droplevels(na.omit(d)))
'data.frame':   3 obs. of  2 variables:
 $ size: Factor w/ 2 levels "S","L": 1 2 2
 $ id  : num  101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
  ..- attr(*, "names")= chr [1:2] "2" "3"
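
If the leftover "na.action" attribute also gets in the way, it can be removed
by hand. Below is a minimal sketch that combines both steps; scrub_na() is a
hypothetical helper name used only for illustration, not a function from base R
or any package, and I am assuming the downstream phylogenetic code objects to
the unused levels and/or the attribute.

```r
# scrub_na() is a hypothetical helper (not part of base R): it drops rows
# with NAs, drops factor levels that no longer occur, and removes the
# "na.action" attribute that na.omit() attaches to its result.
scrub_na <- function(df) {
  out <- droplevels(na.omit(df))
  attr(out, "na.action") <- NULL
  out
}

d <- data.frame(size = factor(c("S","M","M","L","L"), levels = c("S","M","L")),
                id = c(101, NA, NA, 104, 105))
clean <- scrub_na(d)
str(clean)
```

The row names of the retained rows are kept as they were, so species names
assigned with row.names() before the call should still be in place afterwards.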

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Jun 4, 2020 at 12:18 PM Ted Stankowich <
Theodore.Stankowich using csulb.edu> wrote:

> Hello! I'm trying to create a subset of a dataset and then remove all rows
> with NAs in them. Ultimately, I am running phylogenetic analyses with trees
> that require the tree tiplabels to match exactly with the rows in the
> dataframe. But when I use na.omit to delete the rows with NAs, there is
> still a trace of those omitted rows in the data.frame, which then causes an
> error in the phylogenetic analyses. Is there any way to completely scrub
> those omitted rows from the dataframe? The code is below. As you can see
> from the result of the final str(Protect1) line, there are attributes with
> the omitted features still in the dataframe (356 species names in the
> UphamComplBinomial factor, but only 319 observations). These traces are
> causing errors with the phylo analyses.
>
> > # Create the dataframe with variables of interest from an attached dataset
> > Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade))
> > # assign species names as rownames
> > row.names(Protect1)=Protect1$UphamComplBinomial
> > Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data
> > str(Protect1)
> 'data.frame': 319 obs. of  4 variables:
>  $ UphamComplBinomial: Factor w/ 356 levels "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 12 ...
>  $ DarkEum           : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
>  $ NoctCrep          : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
>  $ Shade             : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 49 52 52 39 39 41 ...
>  - attr(*, "na.action")= 'omit' Named int  6 7 23 36 37 40 42 50 51 60 ...
>   ..- attr(*, "names")= chr  "Alouatta_macconnelli_ATELIDAE_PRIMATES" "Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" "Callicebus_baptista_PITHECIIDAE_PRIMATES" ...
>
> Dr. Ted Stankowich
> Associate Professor
> Department of Biological Sciences
> California State University Long Beach
> Long Beach, CA 90840
> theodore.stankowich using csulb.edu
> 562-985-4826
> http://www.csulb.edu/mammal-lab/
> @CSULBMammalLab
