[R] summing up a column.

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Jun 14 05:48:04 CEST 2016


You did half of what Petr asked... but your email still looks unreadable
because you did not send it as text only. Look at [1] to see what we see, 
and why we want you to send plain text. What you see is not what we see.

This is an outer join... an inherently inefficient operation according to 
relational database theory. Most solutions to this type of problem are 
likely to be slow, but avoiding unnecessary memory use can help. To do 
that, overwrite existing values instead of successively concatenating 
longer vectors of results as you go.
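
A minimal sketch of that difference, using made-up values that are not 
part of the problem data. The names x, grown and filled are hypothetical 
and exist only for this illustration:

```r
# made-up input values, for illustration only
x <- c( 0.4, 0.1, 0.4, 0.5, 0.3 )
n <- length( x )

# growing: every c() call builds a new, longer vector
grown <- numeric( 0 )
for ( i in seq_len( n ) ) {
   grown <- c( grown, x[ i ] * 2 )
}

# overwriting: allocate once, then assign into existing positions
filled <- rep( NA_real_, n )
for ( i in seq_len( n ) ) {
   filled[ i ] <- x[ i ] * 2
}

# both approaches produce the same values
identical( grown, filled )
```

With five elements the cost is invisible, but the first pattern copies 
the whole result vector on every pass through the loop.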

To understand the solution below, you should execute individual 
expressions interactively as you step through the code to see what the 
intermediate values look like. In particular, the expression on the right 
side of an assignment can be entered at the console on its own without 
changing any variables in the environment, so do that as much as you need 
to in order to see what is happening.

# alignment not required, but HTML run-together lines not wanted
A <- structure( list( posA = c( 1L, 2L, 5L, 4L, 9L)
                     , posB = c( 9L, 7L, 12L, 7L, 13L)
                     )
               , .Names = c("posA", "posB")
               , class = "data.frame"
               , row.names = c(NA, -5L)
               )
B <- structure( list( pos = c( 4L, 2L, 7L, 1L, 13L, 12L, 9L)
                     , a = c(0.4, 0.1, 0.5, 0.4, 0.1, 0.2, 0.3)
                     , b = c(7L, 5L, 8L, 1L, 6L, 11L, 12L)
                     , c = c(0.8, 0.4, 0.32, 0.1, 0.13, 0.01, 0.23))
               , .Names = c("pos", "a", "b", "c")
               , class = "data.frame"
               , row.names = c(NA, -7L)
               )
# sort B
sB <- B[ order( B$pos ), ]
# performance: set aside memory to remember results
A$count07 <- NA_integer_
A$mina <- NA_real_
for ( i in seq_len( nrow( A ) ) ) {
   # logical indexing vector
   idx <- A[ i, "posA" ] <= sB$pos & sB$pos <= A[ i, "posB" ]
   # only extract desired vector once
   a <- sB[ idx, "a" ]
   # sum adds logical values as if TRUE=1
   A[ i, "count07" ] <- sum( cumsum( a ) < 0.7 )
   A[ i, "mina" ] <- min( a )
}
print( A )
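
As an aside, the same per-row calculation can be sketched without an 
explicit for loop. This is only an alternative arrangement of the same 
logical indexing, not a faster algorithm. The helper name a_between is 
hypothetical, and A and B are rebuilt here with data.frame() (B without 
its unused columns b and c) so the block runs on its own:

```r
A <- data.frame( posA = c( 1L, 2L, 5L, 4L, 9L )
               , posB = c( 9L, 7L, 12L, 7L, 13L )
               )
B <- data.frame( pos = c( 4L, 2L, 7L, 1L, 13L, 12L, 9L )
               , a = c( 0.4, 0.1, 0.5, 0.4, 0.1, 0.2, 0.3 )
               )
sB <- B[ order( B$pos ), ]

# hypothetical helper: values of a whose pos lies in [ lo, hi ]
a_between <- function( lo, hi ) {
   sB$a[ lo <= sB$pos & sB$pos <= hi ]
}

# apply the per-interval calculations to each ( posA, posB ) pair
count07 <- mapply( function( lo, hi ) sum( cumsum( a_between( lo, hi ) ) < 0.7 )
                 , A$posA, A$posB )
mina <- mapply( function( lo, hi ) min( a_between( lo, hi ) )
              , A$posA, A$posB )

A$count07 <- count07
A$mina <- mina
print( A )
```

For the sample data, count07 comes out as 2, 2, 1, 1, 3 and mina as 
0.1, 0.1, 0.2, 0.4, 0.1, which should match the loop above.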

#=== sample interactive session for study after A and B are defined
#=== execute lines one at a time and study them!
order( B$pos )
B[ order( B$pos ), ]
sB <- B[ order( B$pos ), ]
A$count07 <- NA_integer_
A$mina <- NA_real_
i <- 1
A[ i, "posA" ] 
A[ i, "posB" ]
sB$pos
A[ i, "posA" ] <= sB$pos
sB$pos <= A[ i, "posB" ] 
A[ i, "posA" ] <= sB$pos & sB$pos <= A[ i, "posB" ]
idx <- A[ i, "posA" ] <= sB$pos & sB$pos <= A[ i, "posB" ]
sB[ idx, "a" ]
a <- sB[ idx, "a" ]
cumsum( a )
cumsum( a ) < 0.7
sum( cumsum( a ) < 0.7 )
A[ i, "count07" ] <- sum( cumsum( a ) < 0.7 )
min( a )
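
One case the sample data does not exercise: if no pos value falls inside 
an interval, a is empty. The count is then 0, but min() on an empty 
vector returns Inf with a warning. A minimal sketch of one way to guard 
against that, assuming NA is the result you want for an empty interval 
(count_empty and mina_empty are hypothetical names):

```r
# what a looks like when idx is all FALSE
a <- numeric( 0 )

# cumsum of an empty vector is empty, so the count is 0
count_empty <- sum( cumsum( a ) < 0.7 )

# guard the minimum instead of letting min() return Inf with a warning
mina_empty <- if ( length( a ) > 0 ) min( a ) else NA_real_

count_empty
mina_empty
```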

------
[1] https://stat.ethz.ch/pipermail/r-help/2016-June/439404.html

On Mon, 13 Jun 2016, oslo via R-help wrote:

> Hi Petr;
> Thanks so much. Here are the questions and the dput(A) and dput(B). Basically I have two questions:
> > dput(A)
> structure(list(posA = c(1L, 2L, 5L, 4L, 9L), posB = c(9L, 7L, 12L, 7L, 13L)), .Names = c("posA", "posB"), class = "data.frame", row.names = c(NA, -5L))
> > dput(B)
> structure(list(pos = c(4L, 2L, 7L, 1L, 13L, 12L, 9L), a = c(0.4, 0.1, 0.5, 0.4, 0.1, 0.2, 0.3), b = c(7L, 5L, 8L, 1L, 6L, 11L, 12L), c = c(0.8, 0.4, 0.32, 0.1, 0.13, 0.01, 0.23)), .Names = c("pos", "a", "b", "c"), class = "data.frame", row.names = c(NA, -7L))
> Q1) Values in A represent regions of a chromosome. I need to choose these regions in B (all regions in A exist in B in a single column), then sum up column "a" in B and count the numbers that give >0.7. For example, consider the first row in A: 1 and 9. After sorting on the first column of B, sum column "a" only between 1 and 9 in sorted B and cut off at >0.7. Then count how many rows in sorted B give >0.7. For example, there are only 5 rows between 1 and 9 in sorted B, and summing only the first 2 of them gives >0.7. Then my answer is going to be 2.
> Q2) What is the min value of B$a for each given interval in A?
> Regards,
> Oslo 
>
>    On Monday, June 13, 2016 8:05 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> 
>
> Ok.
>
> Instead of explaining what you have, please send a result of
>
> dput(B) and dput(A)
>
> And set your mail client to send plain text mail, otherwise your code is barely readable.
>
> What do you want to do with printed values?
>
> What is B? From this it seems that it is a data frame, but then you try to put a sorted data frame into a data frame column.
>
> B$possort=B[order(B$pos),]
>
> With such code and data frame I get an error.
>
> So please try to keep the above in mind when posting a query.
>
> Regards
> Petr
>
>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of oslo via R-
>> help
>> Sent: Friday, June 10, 2016 11:09 PM
>> To: oslo <oslo at yahoo.com>; Jeff Newmiller <jdnewmil at dcn.davis.ca.us>;
>> oslo via R-help <r-help at r-project.org>; oslo <hokut1 at yahoo.com>
>> Subject: Re: [R] summing up a column.
>>
>> Jeff;
>>   thanks for this. My question was job related, not from my course. I need
>> to finish a job for the place I work. I am so sorry for causing a misunderstanding.
>> thanks,
>> Oslo
>>
>>     On Friday, June 10, 2016 5:02 PM, Jeff Newmiller
>> <jdnewmil at dcn.davis.ca.us> wrote:
>>
>>
>>   Multiple posting happens when you are learning a new system, but reading
>> the posting guide can keep the bleeding down.
>>
>> 1) There is a no-homework policy on this list... different educational
>> organizations have different standards for what is acceptable outside help,
>> so you should be using the support offered by your instructor or educational
>> institution.
>>
>> 2) Once you have completed your course, you CAN learn to post data with
>> your code so that it is self-contained... that is, reproducible on our vanilla R
>> session. Using the dput function is one excellent strategy.
>>
>> 3) This is not a problem that needs a loop... as Bert (not Bret) said, you can do
>> this in one or two statements if you simply use basic logical indexing. If your
>> instructor wants you to do it with a loop for some reason then you really really
>> should not be here... you should be talking to him/her.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On June 10, 2016 1:34:03 PM PDT, oslo via R-help <r-help at r-project.org>
>> wrote:
>> Dear All;
>> I had difficulty posting a mail with an appropriate data structure. I
>> sincerely apologize for the multiple postings.
>>
>>
>> I would like to sum up the B$a column and cut off at 0.7 for each row of
>> intervals given in A. For example, the interval in the first row of A$posA
>> and A$posB is 1 and 9. So, I need to add up B$a and cut off at B$a>.7 from
>> 1 to 9 in B$pos. And then I need to do the same using the intervals in the
>> second, third, ... rows of A. Obviously my loop is wrong and does not work
>> properly. Please help with this, my first experience. Regards. Here is my
>> code:
>> #sorting
>> B$possort=B[order(B$pos),]
>> #Running loop
>> for(i in 1:nrow(A))
>> {if(sum(B[a$B, i:A[1:2])>0.7) {print(A[1:i,]) } }
>>
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>     [[alternative HTML version deleted]]
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------

