[R] apply a function down each column

Laetitia Schmid laetitia at gmt.su.se
Tue Jan 12 09:11:19 CET 2010


Dear Steve,
my solution looks like it would work, but it does not.
I attached a text file with an extract of my data. Maybe you can try  
it yourself. I want to compare C1 with M1, C2 with M2, C3 with M3, and  
so on, for each column.
I do not really know what the problem is. R complains about a syntax  
error.
The function I am applying counts the letters that the two strings  
have in common.
Greg Hirson helped me to write it.

lettermatch <- function(a, b) {
    tb <- merge(as.data.frame(table(strsplit(a, ""))),
                as.data.frame(table(strsplit(b, ""))), by="Var1")
    sum(apply(tb[-1], 1, min))
}
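
As a quick check of what the function returns, here is a small sketch that calls it on single pairs of strings. The two example pairs are built from sequences in the attached data; the function itself is copied unchanged from above.

```r
# lettermatch as given above: per-letter counts of both strings are merged
# on the letter, and the smaller count of each shared letter is summed.
lettermatch <- function(a, b) {
    tb <- merge(as.data.frame(table(strsplit(a, ""))),
                as.data.frame(table(strsplit(b, ""))), by = "Var1")
    sum(apply(tb[-1], 1, min))
}

# "A" occurs 2 and 4 times, "T" occurs in only one string: result 2
lettermatch("AATT", "AAAA")

# "A" occurs 1 and 2 times, "G" occurs 3 and 2 times: result 1 + 2 = 3
lettermatch("AGGG", "AAGG")
```

Note that both arguments are single strings here; the function is written for one pair at a time.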

For example for the second column I tried:

for (x in 1:(nrow(dat)-1)) {
a <- as.character(dat[(2x-1),1])
b <- as.character(dat[(2x),1])
  lettermatch(a,b)
}
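
The syntax error most likely comes from "2x": R has no implicit multiplication, so the index has to be written as 2*x. Two further things would bite once that is fixed: x must only run over the number of pairs (nrow(dat)/2), and the value returned inside a for loop is discarded unless it is stored. A minimal sketch under those assumptions, using the first six rows of the attached data as a stand-in for the file (the names n.pairs and results are mine):

```r
lettermatch <- function(a, b) {
    tb <- merge(as.data.frame(table(strsplit(a, ""))),
                as.data.frame(table(strsplit(b, ""))), by = "Var1")
    sum(apply(tb[-1], 1, min))
}

# Stand-in for data_lgs.txt: first six rows, first three sequence columns
dat <- read.delim(text = paste(
    "Individuals\tSeq1\tSeq2\tSeq3",
    "C1\tGGGG\tAATT\tCCGG",
    "M1\tGGGG\tAAAA\tGGGG",
    "C2\tGGGG\tAATT\tCCGG",
    "M2\tAGGG\tAACT\tCCGG",
    "C3\tAGGG\tAACT\tCCGG",
    "M3\tAGGG\tAACT\tCCGG",
    sep = "\n"), stringsAsFactors = FALSE)

n.pairs <- nrow(dat) %/% 2      # one result per C/M pair
results <- integer(n.pairs)     # storage for the loop results

for (x in seq_len(n.pairs)) {
    a <- dat[2 * x - 1, 2]      # column 2 is Seq1; column 1 holds the labels
    b <- dat[2 * x, 2]
    results[x] <- lettermatch(a, b)
}

results   # 4 3 4 for Seq1 in this extract
```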

or

  a <- as.character(dat[seq(1, nrow(dat), by=2),2])
  b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
  all.results <- lettermatch(a,b)
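
This second attempt passes whole vectors to lettermatch, but the function is written for one pair of strings at a time, so as far as I can tell it does not compare the entries pair by pair. One possible way around that is mapply, which calls the function once per pair. A sketch, again on the first six rows of the attached data (the names odd and even are mine):

```r
lettermatch <- function(a, b) {
    tb <- merge(as.data.frame(table(strsplit(a, ""))),
                as.data.frame(table(strsplit(b, ""))), by = "Var1")
    sum(apply(tb[-1], 1, min))
}

# Stand-in for data_lgs.txt: first six rows, first three sequence columns
dat <- read.delim(text = paste(
    "Individuals\tSeq1\tSeq2\tSeq3",
    "C1\tGGGG\tAATT\tCCGG",
    "M1\tGGGG\tAAAA\tGGGG",
    "C2\tGGGG\tAATT\tCCGG",
    "M2\tAGGG\tAACT\tCCGG",
    "C3\tAGGG\tAACT\tCCGG",
    "M3\tAGGG\tAACT\tCCGG",
    sep = "\n"), stringsAsFactors = FALSE)

odd  <- seq(1, nrow(dat), by = 2)   # rows C1, C2, C3
even <- seq(2, nrow(dat), by = 2)   # rows M1, M2, M3

# mapply calls lettermatch(a[i], b[i]) for each i and collects the results
all.results <- mapply(lettermatch, dat[odd, 3], dat[even, 3],
                      USE.NAMES = FALSE)

all.results   # 2 3 4 for Seq2 in this extract
```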

If I read the data with  
dat <- read.delim("data_lgs.txt", stringsAsFactors=FALSE)  
I can drop the as.character calls in the code above.
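
To run the comparison down every sequence column at once, one option is to wrap the pairwise call in sapply over the columns. This is only a sketch: the inline text stands in for data_lgs.txt and contains the first six rows of the attached data, and the helper name compare.column is hypothetical.

```r
lettermatch <- function(a, b) {
    tb <- merge(as.data.frame(table(strsplit(a, ""))),
                as.data.frame(table(strsplit(b, ""))), by = "Var1")
    sum(apply(tb[-1], 1, min))
}

# With stringsAsFactors = FALSE the sequences arrive as character vectors,
# so no as.character() is needed afterwards.
dat <- read.delim(text = paste(
    "Individuals\tSeq1\tSeq2\tSeq3",
    "C1\tGGGG\tAATT\tCCGG",
    "M1\tGGGG\tAAAA\tGGGG",
    "C2\tGGGG\tAATT\tCCGG",
    "M2\tAGGG\tAACT\tCCGG",
    "C3\tAGGG\tAACT\tCCGG",
    "M3\tAGGG\tAACT\tCCGG",
    sep = "\n"), stringsAsFactors = FALSE)

odd  <- seq(1, nrow(dat), by = 2)
even <- seq(2, nrow(dat), by = 2)

# Hypothetical helper: compare entries 1 vs 2, 3 vs 4, ... of one column
compare.column <- function(col) {
    mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE)
}

# dat[-1] drops the Individuals column; the result has one row per pair
# and one column per sequence column.
res <- sapply(dat[-1], compare.column)
res
```

Each column of res then holds the match counts for C1/M1, C2/M2, C3/M3 in the corresponding sequence column.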

Laetitia

Individuals	Seq1	Seq2	Seq3	Seq4
C1	GGGG	AATT	CCGG	CTTT
M1	GGGG	AAAA	GGGG	GGGG
C2	GGGG	AATT	CCGG	CTTT
M2	AGGG	AACT	CCGG	CGTT
C3	AGGG	AACT	CCGG	CGTT
M3	AGGG	AACT	CCGG	CGTT
C4	GGGG	AATT	CCGG	CCTT
M4	GGGG	AAAT	CGGG	CTTT
C5	AGGG	ACTT	CCCG	CTTT
M5	AGGG	CTTT	CCCC	CCTT
C6	AGGG	CTTT	CCCC	CCTT
M6	AAAG	CCTT	CCCC	CTTT
C7	AAAG	ACCC	CCCG	GTTT
M7	AAGG	AACC	CCGG	TTTT
C8	GGGG	AATT	CCGG	CCTT
M8	GGGG	AATT	CCGG	CCTT
C9	GGGG	AAAA	GGGG	TTTT
M9	GGGG	AAAA	GGGG	TTTT
C11	AGGG	AAAC	CGGG	GGTT
M11	GGGG	AATT	CCGG	CCTT



On 11.01.2010 at 15:18, Steve Lianoglou wrote:

> Hi,
>
> On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid  
> <laetitia at gmt.su.se> wrote:
>> Hello World,
>> I have a function that makes pairwise comparisons between two  
>> strings. I would like to apply this function to my data (which  
>> consists of columns with different strings) in the way that it  
>> compares the first with the second entry, and then the third with  
>> the fourth, and then the fifth with the sixth, and so on down each  
>> column...
>> So (2x-1) and (2x) would be the different entries to be compared!
>>
>> dat= my data:
>>
>> for the first column: compare dat[(2x-1),1] with dat[(2x),1] and x  
>> would be 1:i, i=length(dat[,1])
>>
>> I think the best way to do that is a loop:
>>
>> a <- as.character(dat[(2x-1),1])
>> b <- as.character(dat[(2x),1])
>>
>> for (i in 1:length(dat[,1]) my_function(a, b))
>>
>> Can somebody help me to apply a function with a loop in the way I  
>> want to a column?
>
> It seems as if you've got it already, haven't you?
>
> for (x in 1:(nrow(dat)-1)) {
>  a <- dat[(2x-1),1]
>  b <- dat[(2x), 1]
>  my_function(a,b)
> }
>
>> Is there a specification of "tapply" for that?
>
> I don't think so, but depending on what you want to do, the size of
> your data, and the amount of RAM you have, it might be faster to
> compare everything "at once" (assuming `my_function` can be
> vectorized), for instance:
>
> a <- dat[seq(1, nrow(dat), by=2),1]
> b <- dat[seq(2, nrow(dat), by=2), 1]
> all.results <- my_function(a,b)
>
> Also, as an aside, I see you keep calling "as.character" on your data
> when you extract it from your data.frame. Is your data being converted
> to factors? You can look to set stringsAsFactors=FALSE if this is the
> case and you are reading in data using read.table/delim/etc (see:
> ?read.table)
>
> Hope that helps,
>
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact


