[R] apply with multiple conditions

Paul Guilhamon paul.guilhamon at gmail.com
Mon Jul 2 15:59:00 CEST 2012


Thanks for your reply Jean,

I think your interpretation is correct, but when I run your code I end
up with the data frame below, and the bins created there clearly don't
correspond to a chromStart change of 115341:

  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
3  chr2      10133    10362     Pol2-4H8  30354   3
4  chr2      10148    10418 MafF_(M8194)  40502   4
5  chr2     210382   210578       ZBTB33  50884   5
6  chr2     216132   216352         CTCF  67016   6

The first two rows should have the same bin number (same chrom,
difference < 115341), rows 3 & 4 should be in another bin (different chrom
from rows 1 & 2, difference < 115341), and rows 5 & 6 in a third (same chrom,
but a difference > 115341 between row 4 and row 5).

It seems the new.bin line of your code isn't quite doing what it
should, but I can't pinpoint the error there...
Paul
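
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if chrom is stored as a factor (the default for data.frame() before R 4.0.0), then c(NA, all.tf7$chrom[-L]) yields the factor's integer codes rather than its labels. Comparing the chrom labels against those integers is TRUE on every row, so every row opens a new bin, which matches the output above. A minimal sketch that converts chrom to character before shifting it (stringsAsFactors = TRUE is set here only to reproduce the assumed situation, and only the two columns the rule needs are included):

```r
# Assumption: chrom is a factor, as data.frame() produced by default before R 4.0.0.
all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = TRUE
)

L <- nrow(all.tf7)
# Compare labels, not factor codes: convert to character before shifting.
chrom <- as.character(all.tf7$chrom)
prev.chrom <- c(NA, chrom[-L])
# Difference between each chromStart and the one on the previous row.
delta.start <- c(NA, diff(all.tf7$chromStart))
# Row 1 always opens a bin; later rows open one when chrom changes
# or when chromStart jumps by 115341 or more.
new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7$bin
```

With this change rows 1 and 2 share a bin, rows 3 and 4 share the next one, and rows 5 and 6 share a third.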


On 2 July 2012 14:19, Jean V Adams <jvadams at usgs.gov> wrote:
> Paul,
>
> My interpretation is that you are trying to assign a new bin number to a row
> every time the variable chrom changes and every time the variable chromStart
> changes by 115341 or more.  Is that right?  If so, you don't need a loop at
> all.  Check out the code below.  I made a couple changes to the all.tf7
> example data frame so that it would have two changes in bin number, one
> based on the chrom variable and one based on the chromStart variable.
>
> Jean
>
> all.tf7 <- data.frame(
>         chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>         chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>         chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>         name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
> "ZBTB33", "CTCF"),
>         cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>         bin = rep(NA, 6)
>         )
>
> # assign a new bin every time chrom changes and every time chromStart
> # changes by 115341 or more
> L <- nrow(all.tf7)
> prev.chrom <- c(NA, all.tf7$chrom[-L])
> delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
> new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
>         delta.start >= 115341
> all.tf7$bin <- cumsum(new.bin)
> all.tf7
>
>
> pguilha <paul.guilhamon at gmail.com> wrote on 07/02/2012 06:25:13 AM:
>
>> Hello all,
>>
>> I have written a for loop to act on a data frame with close to 3 million
>> rows and 6 columns, and I would like to pass it to apply() to speed the
>> process up (I let the loop run for 2 days before stopping it, and it had
>> only gone through 200,000 rows), but I am really struggling to find a way
>> to pass the arguments. Below are the loop and the head of the data frame
>> I am working on.
>> Any hints would be much appreciated, thank you! (I have searched for this
>> but could not find any other posts doing quite what I want.)
>> Paul
>>
>> x<-as.numeric(all.tf7[1,2])
>> for (i in 2:nrow(all.tf7)) {
>>   if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>>     all.tf7[i,6]<-all.tf7[i-1,6]
>>   else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>>     x<-as.numeric(all.tf7[i,2]) }
>>   else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
>>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>>     x<-as.numeric(all.tf7[i,2]) }
>> }
>>
>> # the aim here is to attribute a bin number to each row so that I can then
>> # split the dataframe according to those bins.
>>
>>
>> chrom chromStart chromEnd         name cumsum bin
>> chr1       10089    10309       ZBTB33  10089   1
>> chr1       10132    10536  TAF7_(SQ-8)  20221   1
>> chr1       10133    10362     Pol2-4H8  30354   1
>> chr1       10148    10418 MafF_(M8194)  40502   1
>> chr1       10382    10578       ZBTB33  50884   1
>> chr1       16132    16352         CTCF  67016   1
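
The loop in the original question measures each row against the first chromStart of the current bin (the variable x), not against the previous row, so it is not quite the same rule as a row-to-row difference. Indexing a data frame cell by cell inside a loop is also typically much slower than working on plain vectors. The following is a sketch of the same rule on vectors, assuming the rows are already sorted by chrom and chromStart; assign_bins and its width argument are hypothetical names chosen for illustration, not part of the thread:

```r
# Hypothetical helper: same rule as the original loop, but run on plain
# vectors instead of indexing the data frame on every iteration.
assign_bins <- function(chrom, start, width = 115341) {
  chrom <- as.character(chrom)       # compare labels even if chrom is a factor
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  x <- start[1]                      # chromStart of the first row in the current bin
  for (i in seq_len(n)[-1]) {
    if (chrom[i] == chrom[i - 1] && start[i] - x < width) {
      bin[i] <- bin[i - 1]           # same chrom and still within the window
    } else {
      bin[i] <- bin[i - 1] + 1L      # chrom changed or window exceeded
      x <- start[i]                  # reset the reference position
    }
  }
  bin
}

# Example data taken from the reply quoted earlier in the thread.
chrom <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")
start <- c(10089, 10132, 10133, 10148, 210382, 216132)
bins <- assign_bins(chrom, start)
```

The result could then be stored with something like all.tf7$bin <- assign_bins(all.tf7$chrom, all.tf7$chromStart) and passed to split() to break the data frame up by bin.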


