[R] apply with multiple conditions

Rui Barradas ruipbarradas at sapo.pt
Mon Jul 2 21:24:43 CEST 2012


Hello,

Sorry to intrude, but I think it's a factor issue.
Try changing the disjunction to the following (split over several lines):


new.bin <- is.na(prev.chrom) |
		df$chrom != levels(df$chrom)[prev.chrom] |
		delta.start >= 115341

It should work now.
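
For the record, here is a small sketch of why. It assumes chrom is stored as a factor (the data.frame() default for strings at the time of writing; factor() is called explicitly below so the example does not depend on that default). c(NA, df$chrom[-L]) drops the factor attributes and keeps only the integer codes, so df$chrom != prev.chrom ends up comparing labels such as "chr1" with codes such as 1, which is TRUE on every row. The name bad.bin is mine, used only to keep both versions side by side.

```r
# Toy data taken from Jean's example; chrom is made a factor explicitly.
df <- data.frame(
    chrom = factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")),
    chromStart = c(10089, 10132, 10133, 10148, 210382, 216132)
)
L <- nrow(df)

# With NA as first argument, c() returns the integer codes, not the labels
prev.chrom <- c(NA, df$chrom[-L])
delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])

# Original test: labels compared with codes, so every row opens a new bin
bad.bin <- is.na(prev.chrom) | df$chrom != prev.chrom |
    delta.start >= 115341

# Corrected test: map the codes back to labels before comparing
new.bin <- is.na(prev.chrom) |
    df$chrom != levels(df$chrom)[prev.chrom] |
    delta.start >= 115341

cumsum(bad.bin)   # 1 2 3 4 5 6
cumsum(new.bin)   # 1 1 2 2 3 3
```

Storing chrom as character (as.character(df$chrom), or stringsAsFactors = FALSE in the data.frame() call) should avoid the problem as well.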

Hope this helps,

Rui Barradas
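
P.S. On Paul's second point: if a new bin must open whenever chromStart is 115341 or more past the first row of the current bin, then each row's bin depends on where the current bin started, so the cumsum() trick alone does not apply. Below is a rough sketch, not tested on the real data. It loops over plain vectors instead of indexing the data frame row by row, which should be much faster than the original loop. The function name bin.anchored and the argument width are made up for illustration.

```r
bin.anchored <- function(chrom, start, width = 115341) {
    chrom <- as.character(chrom)    # compare labels, not factor codes
    n <- length(start)
    bin <- integer(n)
    if (n == 0L) return(bin)
    bin[1] <- 1L
    anchor <- start[1]              # chromStart of the first row of the current bin
    for (i in seq_len(n)[-1]) {
        if (chrom[i] != chrom[i - 1] || start[i] - anchor >= width) {
            # new chromosome, or far enough from the anchor: open a new bin
            bin[i] <- bin[i - 1] + 1L
            anchor <- start[i]
        } else {
            bin[i] <- bin[i - 1]
        }
    }
    bin
}

chrom <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")
chromStart <- c(10089, 10132, 10133, 10148, 210382, 216132)
bin.anchored(chrom, chromStart)                            # 1 1 2 2 3 3

# Consecutive gaps all below 115341, but row 3 is 120000 past the anchor
bin.anchored(rep("chr1", 4), c(0, 60000, 120000, 130000))  # 1 1 2 2
```

It would be used as df$bin <- bin.anchored(df$chrom, df$chromStart), assuming the rows are already sorted by chrom and chromStart.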

On 02-07-2012 20:03, pguilha wrote:
> Jean,
> It's crazy, I'm still getting 1,2,3,4,5,6 in the bin column.
> Also (this is an unrelated problem, I think), unless I've misunderstood
> it, your code will only create a new bin if the difference
> between chromStart at positions i and i-1 is >= 115341. What I want is
> for a new bin to be created each time the difference between
> chromStart at i and i-j is >= 115341, where 'i-j' corresponds to the
> first row of the last bin. I'm not sure if I'm being clear.
> chromStart values correspond to coordinates along a chromosome,
> so I basically want to cut up each chromosome into sections/bins of
> approximately 115341.
>
> thanks again for all your efforts with this, they're much appreciated!
> Paul
>
> On 2 July 2012 19:36, Jean V Adams [via R]
> <ml-node+s789695n4635185h87 at n4.nabble.com> wrote:
>> Paul,
>>
>> Try this (I changed some of the object names, but the meat of the code is
>> the same):
>>
>> df <- data.frame(
>>          chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>>          chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>>          chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>>          name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>>                   "ZBTB33", "CTCF"),
>>          cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
>>          )
>>
>> # assign a new bin every time chrom changes and every time chromStart
>> # changes by 115341 or more
>> L <- nrow(df)
>> prev.chrom <- c(NA, df$chrom[-L])
>> delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
>> new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom |
>>     delta.start >= 115341
>> df$bin <- cumsum(new.bin)
>> df
>>
>>
>> pguilha <[hidden email]> wrote on 07/02/2012 10:23:36 AM:
>>
>>> Jean, that's exactly what it should be, but yes I copied and pasted
>>> from your email so I don't see how I could have introduced an error in
>>> there....
>>> paul
>>>
>>> On 2 July 2012 15:57, Jean V Adams [via R]
>>> <[hidden email]> wrote:
>>>> Paul,
>>>>
>>>> Are you submitting the exact code that I included in my previous
>>>> e-mail?
>>>> When I submit that code, I get this ...
>>>>
>>>>    chrom chromStart chromEnd         name cumsum bin
>>>> 1  chr1      10089    10309       ZBTB33  10089   1
>>>> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>>>> 3  chr2      10133    10362     Pol2-4H8  30354   2
>>>> 4  chr2      10148    10418 MafF_(M8194)  40502   2
>>>> 5  chr2     210382   210578       ZBTB33  50884   3
>>>> 6  chr2     216132   216352         CTCF  67016   3
>>>>
>>>> Jean
>>>>
>>>>
>>>> Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
>>>>
>>>>> Thanks for your reply Jean,
>>>>>
>>>>> I think your interpretation is correct but when I run your code I end
>>>>> up with the below dataframe and obviously the bins created there
>>>>> don't correspond to a chromStart change of 115341:
>>>>>
>>>>>    chrom chromStart chromEnd         name cumsum bin
>>>>> 1  chr1      10089    10309       ZBTB33  10089   1
>>>>> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
>>>>> 3  chr2      10133    10362     Pol2-4H8  30354   3
>>>>> 4  chr2      10148    10418 MafF_(M8194)  40502   4
>>>>> 5  chr2     210382   210578       ZBTB33  50884   5
>>>>> 6  chr2     216132   216352         CTCF  67016   6
>>>>>
>>>>> the first two rows should have the same bin number (same chrom,
>>>>> <115341 diff), then rows 3&4 should be in another bin (different
>>>>> chrom from rows 1&2, <115341 diff), and rows 5&6 in another one (same
>>>>> chrom but >115341 difference between row 4 and row 5).
>>>>>
>>>>> it seems the new.bin line of your code isn't quite doing what it
>>>>> should but I can't pinpoint the error there...
>>>>> Paul
>>>>>
>>>>>
>>>>> On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
>>>>>> Paul,
>>>>>>
>>>>>> My interpretation is that you are trying to assign a new bin number
>>>>>> to a row every time the variable chrom changes and every time the
>>>>>> variable chromStart changes by 115341 or more.  Is that right?  If so,
>>>>>> you don't need a loop at all.  Check out the code below.  I made a
>>>>>> couple changes to the all.tf7 example data frame so that it would have
>>>>>> two changes in bin number, one based on the chrom variable and one
>>>>>> based on the chromStart variable.
>>>>>>
>>>>>> Jean
>>>>>>
>>>>>> all.tf7 <- data.frame(
>>>>>>          chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>>>>>>          chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>>>>>>          chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>>>>>>          name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>>>>>>                   "ZBTB33", "CTCF"),
>>>>>>          cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>>>>>>          bin = rep(NA, 6)
>>>>>>          )
>>>>>>
>>>>>> # assign a new bin every time chrom changes and every time chromStart
>>>>>> # changes by 115341 or more
>>>>>> L <- nrow(all.tf7)
>>>>>> prev.chrom <- c(NA, all.tf7$chrom[-L])
>>>>>> delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
>>>>>> new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
>>>>>>     delta.start >= 115341
>>>>>> all.tf7$bin <- cumsum(new.bin)
>>>>>> all.tf7
>>>>>>
>>>>>>
>>>>>> pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I have written a for loop to act on a dataframe with close to
>>>>>>> 3 million rows and 6 columns and I would like to pass it to apply()
>>>>>>> to speed the process up (I let the loop run for 2 days before
>>>>>>> stopping it and it had only gone through 200,000 rows) but I am
>>>>>>> really struggling to find a way to pass the arguments. Below are the
>>>>>>> loop and the head of the dataframe I am working on.
>>>>>>> Any hints would be much appreciated, thank you! (I have searched for
>>>>>>> this but could not find any other posts doing quite what I want)
>>>>>>> Paul
>>>>>>>
>>>>>>> x<-as.numeric(all.tf7[1,2])
>>>>>>> for (i in 2:nrow(all.tf7)) {
>>>>>>>    if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>>>>>>>      all.tf7[i,6]<-all.tf7[i-1,6]
>>>>>>>    else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>>>>>>>      all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>>>>>>>      x<-as.numeric(all.tf7[i,2]) }
>>>>>>>    else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
>>>>>>>      all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>>>>>>>      x<-as.numeric(all.tf7[i,2]) }
>>>>>>> }
>>>>>>>
>>>>>>> # the aim here is to attribute a bin number to each row so that I
>>>>>>> # can then split the dataframe according to those bins.
>>>>>>>
>>>>>>>
>>>>>>> chrom chromStart chromEnd         name cumsum bin
>>>>>>> chr1       10089    10309       ZBTB33  10089   1
>>>>>>> chr1       10132    10536  TAF7_(SQ-8)  20221   1
>>>>>>> chr1       10133    10362     Pol2-4H8  30354   1
>>>>>>> chr1       10148    10418 MafF_(M8194)  40502   1
>>>>>>> chr1       10382    10578       ZBTB33  50884   1
>>>>>>> chr1       16132    16352         CTCF  67016   1
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


