[R] string handling

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jun 4 14:08:50 CEST 2010


This solution, using strapply from the gsubfn package, is along the
same lines as the stringr solution.  First we read in the data using
as.is = TRUE so that we get character rather than factor columns.  If
your data is already in columns of class factor, just replace
strapply(x, ...) with strapply(as.character(x), ...) below.  Then we
lapply over the columns of DF, applying strapply to each one.  The
regular expression "(.)/(.)" captures the single character on each
side of the slash, c combines the two captures, and simplify = rbind
stacks the per-string results into a matrix.  See the home page at
http://gsubfn.googlecode.com for more.

> Lines <- "var1        var2
+ 9G/G09    abd89C/T90
+ 10A/T9    32C/C
+ 90G/G      A/A"
>
> library(gsubfn)
> DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)
> lapply(DF, function(x) strapply(x, "(.)/(.)", c, simplify = rbind))
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"
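
For comparison, here is a rough base R sketch of the same idea for
anyone who does not have gsubfn installed.  It assumes an R version in
which regexec is available, and the helper name extract_pair is made up
for this illustration (it is not part of any package).  The
as.character call covers the factor-column case mentioned above.

```r
Lines <- "var1        var2
9G/G09    abd89C/T90
10A/T9    32C/C
90G/G      A/A"

DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)

# Hypothetical helper: returns one row per string, holding the
# character to the left and to the right of the first "/".
extract_pair <- function(x) {
  x <- as.character(x)                      # also works for factor columns
  m <- regmatches(x, regexec("(.)/(.)", x)) # list of c(full match, group 1, group 2)
  do.call(rbind, lapply(m, function(v) v[-1]))  # drop the full match, stack rows
}

res <- lapply(DF, extract_pair)
res
```

This should reproduce the two matrices shown above.  Strings that
contain no slash are not handled specially in this sketch.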


A slight simplification is also possible using gsubfn's ability to
represent a one-line function as a formula.  We just preface lapply
with fn$, and then formulas appearing in the arguments (subject to
certain rules) are interpreted as functions.  Here, the formula in the
second argument to lapply is interpreted as the anonymous function we
used above:

> fn$lapply(DF, x ~ strapply(x, "(.)/(.)", c, simplify = rbind))
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"

On Thu, Jun 3, 2010 at 2:18 PM, karena <dr.jzhou at gmail.com> wrote:
>
> I have a data.frame as the following:
> var1        var2
> 9G/G09    abd89C/T90
> 10A/T9    32C/C
> 90G/G      A/A
> .             .
> .             .
> .             .
> 10T/C      00G/G90
>
> What I want is to get the letters which are on the left and right of '/'.
> for example, for "9G/G09", I only want "G", "G", and for "abd89C/T90", I
> only want "C" and "T", how to get these?
>
> thank you,
>
> karena


