[R] Counting occurrences of a letter by a factor

Darin A. England england at cs.umn.edu
Fri Sep 10 22:11:18 CEST 2010


I fiddled around and found this solution, which is far from elegant,
but it doesn't require you to know the factor levels in advance.

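# For each level of Y, tabulate the genotype strings in X (NAs are dropped).
tabs <- with(DF, tapply(as.character(X), Y, table))
# Repeat each genotype by its count, split into single letters, and count them.
lapply(tabs, function(x)
    table(unlist(strsplit(paste(rep(names(x), as.vector(x)), collapse = ""), split = ""))))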
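
A more direct variant, sketched here under the assumption that DF is built exactly as in Brian's example below: split every string into letters, repeat each row's Y value once per letter, and cross-tabulate the two with table(). Unlike the lapply() version, this returns a single matrix in the same layout as Brian's ans, and it still doesn't require knowing the levels of Y in advance. Note that lengths() needs R >= 3.2.0; sapply(alleles, length) does the same job in older versions. The names alleles and ans are just illustrative.

```r
# Example data, as in Brian's message below
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# Split each genotype string into its letters; an NA string stays a single NA
alleles <- strsplit(as.character(DF$X), split = "")

# Repeat each row's group label once per letter, then cross-tabulate.
# table() drops NA letters and NA groups by default.
# lengths() needs R >= 3.2.0; sapply(alleles, length) works in older versions.
ans <- table(Y = rep(DF$Y, lengths(alleles)), allele = unlist(alleles))
print(ans)
```

For the example data this should give the same counts as Brian's ans (L: 2 C and 2 G; U: 3 C and 1 G). Rows whose X or Y is NA drop out because table() excludes NA unless told otherwise.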

Darin


On Fri, Sep 10, 2010 at 02:40:50PM -0500, Davis, Brian wrote:
> I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major/minor alleles) in a string column, grouped by the factor levels in another column of my data frame.
> 
> Example:
> > DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> > colnames(DF)<-c("X", "Y")
> > DF
>      X    Y
> 1   CC    L
> 2   CC    U
> 3 <NA>    L
> 4   CG    U
> 5   GG    L
> 6   GC <NA>
> 
> I have an ugly solution, which works if you know the factor levels of Y in advance.
> 
> > ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
> + table(unlist(strsplit(as.character(DF[DF[ ,'Y']  == 'U', 1]), ""))))
> > rownames(ans)<-c("L", "U")
> > ans
>   C G
> L 2 2
> U 3 1
> 
> 
> I've played with table, xtabs, tabulate, aggregate, tapply, etc., but haven't found a combination that gives a more general solution to this problem.
> 
> Any ideas?
> 
> Brian
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


