[R] (no subject)

Douglas Bates bates at stat.wisc.edu
Sat Jul 29 17:09:14 CEST 2006


On 7/29/06, jim holtman <jholtman at gmail.com> wrote:
> Is this what you want?
>
> > set.seed(1)
> > x <- matrix(sample(c(1, NA), 100, TRUE), nrow=10) # create some data
> > x
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>  [1,]    1    1   NA    1   NA    1   NA    1    1     1
>  [2,]    1    1    1   NA   NA   NA    1   NA   NA     1
>  [3,]   NA   NA   NA    1   NA    1    1    1    1    NA
>  [4,]   NA    1    1    1   NA    1    1    1    1    NA
>  [5,]    1   NA    1   NA   NA    1   NA    1   NA    NA
>  [6,]   NA    1    1   NA   NA    1    1   NA    1    NA
>  [7,]   NA   NA    1   NA    1    1    1   NA   NA     1
>  [8,]   NA   NA    1    1    1   NA   NA    1    1     1
>  [9,]   NA    1   NA   NA   NA   NA    1   NA    1    NA
> [10,]    1   NA    1    1   NA    1   NA   NA    1    NA
> > # count number of NAs per row
> > numNAs <- apply(x, 1, function(z) sum(is.na(z)))

It's a minor point but on a large matrix it would be better to use

numNAs <- rowSums(is.na(x))
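
A quick way to check that the two forms agree, offered as a sketch rather than a benchmark: the matrix below is made-up data (only its dimensions are borrowed from the original question), and the names numNAs_apply and numNAs_rowSums are just for illustration. rowSums() works on the whole logical matrix at once instead of calling an R function once per row.

```r
set.seed(1)
# Hypothetical test data with the dimensions mentioned in the original question
x <- matrix(sample(c(1, NA), 9707 * 60, TRUE), nrow = 9707)

# Row-by-row count of NAs via apply()
numNAs_apply <- apply(x, 1, function(z) sum(is.na(z)))

# Vectorised count: is.na(x) is a logical matrix, rowSums() adds up the TRUEs
numNAs_rowSums <- rowSums(is.na(x))

# Same values; rowSums() returns doubles and apply() returns integers here,
# so compare with all.equal() rather than identical()
stopifnot(isTRUE(all.equal(numNAs_apply, numNAs_rowSums)))
```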

> > numNAs
>  [1] 3 5 5 3 6 5 5 4 7 5
> > # remove rows with more than 5 NAs
> > x[!(numNAs > 5),]
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    1    1   NA    1   NA    1   NA    1    1     1
> [2,]    1    1    1   NA   NA   NA    1   NA   NA     1
> [3,]   NA   NA   NA    1   NA    1    1    1    1    NA
> [4,]   NA    1    1    1   NA    1    1    1    1    NA
> [5,]   NA    1    1   NA   NA    1    1   NA    1    NA
> [6,]   NA   NA    1   NA    1    1    1   NA   NA     1
> [7,]   NA   NA    1    1    1   NA   NA    1    1     1
> [8,]    1   NA    1    1   NA    1   NA   NA    1    NA
> >
>
>
>
> On 7/28/06, John Morrow <john at emiliem.com> wrote:
> >
> > Dear R-Helpers,
> >
> > I have a large data matrix (9707 rows, 60 columns), which contains missing
> > data. The matrix looks something like this:
> >
> > 1) X X X X X X  NA  X X X X X X X X X
> >
> > 2) NA NA NA NA X NA NA NA X NA NA
> >
> > 3) NA NA X NA NA NA NA NA NA NA
> >
> > 5) NA X NA X X X NA X X X X NA X
> >
> > ..
> >
> > 9708) X NA NA X NA NA X X NA NA X
> >
> > ...and so on. Notice that every row has a varying number of entries, all
> > rows have at least one entry, but some rows have too much missing data.
> > My goal is to filter out/remove rows that have ~5 (this number is yet to
> > be determined, but let's say it's 5) missing entries before I run
> > Pearson's to tell me the correlation between all of the rows.  The order
> > of the columns does not matter here.
> > I think that I might need to test each row for a "data, at least one NA,
> > data" pattern?
> >
> > Is there some kind of way of doing this? I am at a loss for an easy way to
> > accomplish this. Any suggestions are most appreciated!
> >
> > John Morrow
> >
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
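
Putting the pieces of the thread together, the whole task (drop rows with too many NAs, then correlate the remaining rows) might look roughly like the sketch below. The data are invented, the names maxNA, kept and rowCor are hypothetical, and the cutoff of 5 is only the placeholder from the original question. The use of "pairwise.complete.obs" is an assumption about how the remaining NAs should be handled; other choices for the use argument of cor() may suit the real data better.

```r
set.seed(1)
# Hypothetical data: 20 rows, 60 columns, 100 entries set to NA at random
x <- matrix(rnorm(20 * 60), nrow = 20)
x[sample(length(x), 100)] <- NA

maxNA  <- 5                    # cutoff; "yet to be determined" in the question
numNAs <- rowSums(is.na(x))    # number of NAs in each row

# Keep rows with at most maxNA missing entries; drop = FALSE keeps a matrix
# even if only one row survives
kept <- x[numNAs <= maxNA, , drop = FALSE]

# cor() correlates columns, so transpose to correlate rows (Pearson is the
# default method). With pairwise.complete.obs each pair of rows is compared
# using only the columns where both have data.
rowCor <- cor(t(kept), use = "pairwise.complete.obs")
```

Because each pairwise correlation may be based on a different set of columns, the resulting matrix is not guaranteed to be positive semi-definite, which matters if it is fed into further matrix computations.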


