[R] How to get row numbers of a subset of rows

jim holtman jholtman at gmail.com
Wed Nov 14 18:11:36 CET 2007


That works for the specific value of '1', but you would have to repeat
it for every other value in the column.  If that column had 100 different
ranges, what would you do?  Here is another solution, using 'range' on
the same data:

> tapply(seq_len(nrow(x)), x$Chromosome, range)
$`1`
[1] 1 6

$`2`
[1]  7 10
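The tapply call above returns a list keyed by chromosome. If you would rather have a data frame with one row per value, here is a minimal sketch. `value_ranges` is a hypothetical helper name, and it uses `split` in place of `tapply` to group the row numbers. Like `range`, it reports only the first and last row for each value, so it assumes the rows for each value are contiguous.

```r
# Hypothetical helper: first and last row number for each distinct value of v.
# Assumes rows sharing a value are contiguous; if they are not, the
# start/end pair silently spans rows holding other values in between.
value_ranges <- function(v) {
  rows <- split(seq_along(v), v)   # row numbers grouped by value
  data.frame(value = names(rows),
             start = vapply(rows, min, integer(1)),
             end   = vapply(rows, max, integer(1)),
             row.names = NULL, stringsAsFactors = FALSE)
}

chrom <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)  # Chromosome column from the example
value_ranges(chrom)
```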


On Nov 14, 2007 12:04 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> Am I missing something? ...
>
> Why not: range(seq(nrow(B))[B[,2]==1] ) ?? ## note: "==" not "="
>
> Alternatively, and easily generalized (to start with a frame which is a
> subset of the original and any subset of rows, contiguous or not)
>
> range(as.numeric(row.names(B)[B[,2]==1]))
>
> Again, am I missing something that makes this "obvious" solution impossible?
> (Wouldn't be the first time.)
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
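Bert's first expression can also be written with `which()`, which returns the positions of the TRUE elements directly. Unlike `seq(nrow(B))[...]`, it drops NA comparisons instead of passing them on to `range`. A minimal sketch, assuming a small hypothetical stand-in for `B` with the example's layout (only the first two columns, SNP IDs shortened). The second part illustrates the row.names variant: a subset keeps the original row names, so they still refer to rows of the full table.

```r
# Hypothetical stand-in for the example's B (SNP IDs shortened).
B <- data.frame(SNP = paste0("SNP_", 1:10),
                Chromosome = c(rep(1, 6), rep(2, 4)))

# Position-based: which() gives the row positions where the test is TRUE.
range(which(B[, 2] == 1))

# Name-based: subsetting keeps the original row names, so their numeric
# values are row numbers in B, not positions within sub.
sub <- B[B[, 2] == 2, ]
range(as.numeric(row.names(sub)))

# With a missing value, logical indexing yields NA but which() skips it.
v <- c(1, NA, 1, 2)
range(seq_along(v)[v == 1])   # NA NA
range(which(v == 1))          # 1 3
```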
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jim holtman
> Sent: Wednesday, November 14, 2007 8:39 AM
> To: affy snp
> Cc: r-help at r-project.org
> Subject: Re: [R] How to get row numbers of a subset of rows
>
> Here is a way of doing it using 'rle':
>
> > x <- read.table(textConnection("     SNP                Chromosome  PhysicalPosition
> + 1 SNP_A-1909444          1           7924293
> + 2 SNP_A-2237149          1           8173763
> + 3 SNP_A-4303947          1           8191853
> + 4 SNP_A-2236359          1           8323433
> + 5 SNP_A-2205441          1           8393263
> + 6 SNP_A-1909445          1           7924293
> + 7 SNP_A-2237146          2           8173763
> + 8 SNP_A-4303946          2           8191853
> + 9 SNP_A-2236357          2           8323433
> + 10 SNP_A-2205442         2           8393263"), header=TRUE)
> > # use rle to get the 'runs'
> > y <- rle(x$Chromosome)
> > # create dataframe with start/ends and values
> > start <- head(cumsum(c(1, y$lengths)), -1)
> > index <- data.frame(values=y$values, start=start, end=start + y$lengths - 1)
> >
> > index
>  values start end
> 1      1     1   6
> 2      2     7  10
> >
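The rle steps above can be wrapped in a reusable function. A minimal sketch; `run_table` is a hypothetical name. Unlike the `range`-based answers, it reports each run separately, so a value that appears in two separate blocks gets two rows. The second call uses made-up data to show this. Note that `rle()` does not accept factors, so a factor column would need `as.character()` first.

```r
# Hypothetical wrapper around the rle() steps: one row per run of equal values.
run_table <- function(v) {
  r <- rle(v)
  start <- head(cumsum(c(1, r$lengths)), -1)   # first row of each run
  data.frame(values = r$values, start = start, end = start + r$lengths - 1)
}

run_table(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))   # same data as above
run_table(c(1, 1, 2, 2, 1))                   # made-up data: value 1 forms two runs
```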
>
>
> On Nov 14, 2007 10:56 AM, affy snp <affysnp at gmail.com> wrote:
> > Hello list,
> >
> > I read in a txt file using
> >
> > B <- read.table(file="data.snp", header=TRUE, row.names=NULL)
> >
> > specifying row.names=NULL so that the rows are numbered.
> > Below is an example of how the table looks, using
> > B[1:10,1:3]:
> >
> >
> >      SNP                Chromosome  PhysicalPosition
> > 1 SNP_A-1909444          1           7924293
> > 2 SNP_A-2237149          1           8173763
> > 3 SNP_A-4303947          1           8191853
> > 4 SNP_A-2236359          1           8323433
> > 5 SNP_A-2205441          1           8393263
> > 6 SNP_A-1909445          1           7924293
> > 7 SNP_A-2237146          2           8173763
> > 8 SNP_A-4303946          2           8191853
> > 9 SNP_A-2236357          2           8323433
> > 10 SNP_A-2205442         2           8393263
> >
> > I am wondering if there is a way to return the start and end row numbers
> > for a subset of rows.
> >
> > For example, if I specify B[,2]=1, I would like to get
> > start=1 and end=6.
> >
> > If B[,2]=2, then start=7 and end=10.
> >
> > Is there any way in R to quickly do this?
> >
> > Thanks a bunch!
> >
> > Allen
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>





