[R] labels and counting

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Fri Dec 31 00:37:35 CET 2004


On 30-Dec-04 dax42 wrote:
> Hello,
> 
> I have the following problem:
> I have a long sequence consisting of the four letters "A", "C",
> "G" and "T" (as before). Additionally, I have a second sequence of
> the same length giving a label for each character. The labels are
> "+" and "-".
> 
> Now I would like to create an 8x8 matrix containing the counts of
> how often each possible pair of successive characters occurs, for
> example "A" with the label "+" followed by "C" with the label "+",
> or "T"->"C" with the labels "-"->"+", etc.
> 
> Of course I can just use loops to "walk" along the sequence, but since
> you showed me much better solutions in response to my last mail, I
> thought you might be able to help me improve my R skills even
> further...
> 
> Thanks for your ideas!
> Cheers, Winnie

Well, flattery and all that ...

Anyway, the following is an example of how it can be done.
You can cut&paste all the following.

# Artificial example of pairs, one of "A","C","T","G" paired
#   with one of "-","+"
 S<-sample(c("A","C","G","T"),1000,replace=TRUE)
 T<-sample(c("-","+"),1000,replace=TRUE)
 U<-apply(cbind(S,T),1,paste,collapse="")

 U[1:10]
## [1] "C+" "T-" "G+" "T+" "C+" "T+" "T-" "C+" "C-" "C-"
## Shows the first few of the pairs

# constructs 4-character items, each consisting of a pair
#   (e.g. "C+") pasted to its successor (e.g. "T-")
 V<-apply(cbind(U[-length(U)],U[-1]),1,paste,collapse="")

 V[1:7]
## [1] "C+T-" "T-G+" "G+T+" "T+C+" "C+T+" "T+T-" "T-C+"
## Shows the first few of these. Compare with U above.

## Now this is where the real gurus can show their mettle.
## 
## One way to get the counts is simply

 table(V)

## but this gives a flat list of up to 64 counts, not a nice
## layout. Another is the loop:

 for(i in sort(unique(V))){print(paste(i,":",sum(V==i)))}

## and I had hoped to think of a solution that did not
## involve a vulgar loop but would also avoid the unhelpful
## layout of table(V). (This is not your 8x8 matrix, but
## converting the output of the loop to one should not be
## impossible ... )
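Here is one possible sketch of the conversion mentioned above. It is not from the original post. It splits each 4-character item of V back into its "from" and "to" halves with substr(), then lets a two-argument table() cross-tabulate them. The data are regenerated with set.seed() so the block runs on its own. The label vector is called Lab, a name chosen here, rather than T, so that the usual shorthand T for TRUE is not masked.

```r
## Regenerate data shaped like the script above (assumption: same
## construction, but reproducible and self-contained)
set.seed(1)
S   <- sample(c("A", "C", "G", "T"), 1000, replace = TRUE)
Lab <- sample(c("-", "+"), 1000, replace = TRUE)  # 'Lab' avoids masking T (TRUE)
U   <- paste(S, Lab, sep = "")                    # e.g. "C+"
V   <- paste(U[-length(U)], U[-1], sep = "")      # e.g. "C+T-"

## Each element of V is "<from><to>" with two characters per half,
## so characters 1-2 and 3-4 recover the pair; table() then builds
## the two-way layout (rows = first member, columns = successor)
M <- table(from = substr(V, 1, 2), to = substr(V, 3, 4))
M
```

Each cell of M equals the corresponding count from table(V), just laid out as a matrix. If some labelled letter never occurs in the data, its row and column are simply absent, so the result can be smaller than 8x8.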

Pending the elegant solution which someone will come up with,
working through the above and consulting "?" for anything
not understood will reveal a few things about R ...
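Here is one candidate for that more elegant solution, offered as a sketch rather than a definitive answer. Because table() with two arguments already produces a two-way cross-tabulation, the 8x8 matrix can be obtained by tabulating each labelled letter against its successor, without building V at all. The two input strings below are made-up stand-ins for the original data, which are assumed to be single character strings of equal length. The names seq_str, lab_str and states are hypothetical.

```r
## Hypothetical input: one string of bases and one string of labels,
## of equal length, as described in the original question
seq_str <- "ACGTTGCAAC"
lab_str <- "++--+-+--+"

bases  <- strsplit(seq_str, "")[[1]]   # split into single characters
labels <- strsplit(lab_str, "")[[1]]
stopifnot(length(bases) == length(labels))

## All 8 labelled letters ("A+", "C+", ..., "T-"); fixing them as
## factor levels keeps the result 8x8 even if some never occur
states <- as.vector(outer(c("A", "C", "G", "T"), c("+", "-"),
                          paste, sep = ""))
U <- factor(paste(bases, labels, sep = ""), levels = states)

## Cross-tabulate each element against its successor: no loop, no V
n <- length(U)
M <- table(from = U[-n], to = U[-1])
M
```

Rows give the first member of each pair and columns give its successor. For example, M["T-", "C+"] counts "T" labelled "-" followed by "C" labelled "+".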

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 30-Dec-04                                       Time: 23:37:35
------------------------------ XFMail ------------------------------




More information about the R-help mailing list