[R] How can I avoid a for-loop through sapply or lapply ?

Charles C. Berry cberry at tajo.ucsd.edu
Tue Sep 29 18:10:53 CEST 2009


On Tue, 29 Sep 2009, mauede at alice.it wrote:

> After converting a miRNA file from FASTA to character format, I get a vector that looks like the following:
>
>> nml
>  [1] "hsa-let-7a MIMAT0000062 Homo sapiens let-7a"
>  [2] "hsa-let-7b MIMAT0000063 Homo sapiens let-7b"
>  [3] "hsa-let-7c MIMAT0000064 Homo sapiens let-7c"
>  [4] "hsa-let-7d MIMAT0000065 Homo sapiens let-7d"
>  [5] "hsa-let-7e MIMAT0000066 Homo sapiens let-7e"
>  [6] "hsa-let-7f MIMAT0000067 Homo sapiens let-7f"
>  [7] "hsa-miR-15a MIMAT0000068 Homo sapiens miR-15a"
>  [8] "hsa-miR-16 MIMAT0000069 Homo sapiens miR-16"
>  [9] "hsa-miR-17 MIMAT0000070 Homo sapiens miR-17"
> [10] "hsa-miR-18a MIMAT0000072 Homo sapiens miR-18a"
>        .......................................................................................................
> [888] "hsa-miR-675* MIMAT0006790 Homo sapiens miR-675*"
> [889] "hsa-miR-888* MIMAT0004917 Homo sapiens miR-888*"
> [890] "hsa-miR-541* MIMAT0004919 Homo sapiens miR-541*"
>
>
> My goal is to extract, into a vector, only the substring that precedes the first space in each element.
> For the records listed above, I would obtain:
>  [1] "hsa-let-7a"
>  [2] "hsa-let-7b"
>  [3] "hsa-let-7c"
>  [4] "hsa-let-7d"
>  [5] "hsa-let-7e"
>  [6] "hsa-let-7f"
>  [7] "hsa-miR-15a"
>  [8] "hsa-miR-16"
>  [9] "hsa-miR-17"
> [10] "hsa-miR-18a"
>        .......................................................................................................
> [888] "hsa-miR-675*"
> [889] "hsa-miR-888*"
> [890] "hsa-miR-541*"



sub( "[ ].*", "", nml )
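
As a minimal sketch of how that one-liner behaves, assuming nml is a
character vector like the one quoted above (the three sample records are
copied from that listing; the name ids is used only for this illustration):

```r
# Three sample records copied from the listing above
nml <- c("hsa-let-7a MIMAT0000062 Homo sapiens let-7a",
         "hsa-let-7b MIMAT0000063 Homo sapiens let-7b",
         "hsa-miR-675* MIMAT0006790 Homo sapiens miR-675*")

# "[ ].*" matches the first space and everything after it.
# sub() is vectorised, so the match is replaced by "" in every
# element at once and no explicit loop is needed.
ids <- sub("[ ].*", "", nml)
print(ids)
```

Because the pattern is anchored on the first space rather than on the
MIMAT accession, the result carries no trailing blank.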


>
> I tried using strsplit as follows:
> strsplit(nml,"MIMAT[0-9]*")
> This returns a list of character vectors, and I can extract the string I need with the [[ ]] operator, as shown below.
>> strsplit(nml,"MIMAT[0-9]*")[[1]][1]
> [1] "hsa-let-7a "
>> strsplit(nml,"MIMAT[0-9]*")[[2]][1]
> [1] "hsa-let-7b "
>
> Unfortunately the [[ ]] operator acts on one list element at a time. In fact:
>> strsplit(nml,"MIMAT[0-9]*")[[]][1]
> Error in strsplit(nml, "MIMAT[0-9]*")[[]] :
>  invalid subscript type 'symbol'
>
> I guess a smart combination of strsplit and sapply or lapply could do the job in a single command line ...
> but I haven't been able to get the syntax right ... I would greatly appreciate some help from R language experts.
> I know I can use a for-loop to get what I am after, but I definitely wish to learn to use a high-level language
> as it deserves rather than in C style.
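
For the strsplit/sapply route asked about here, one possible sketch,
again assuming nml is a character vector shaped like the quoted listing
(the sample records and the names parts, ids and ids2 are only for
illustration). Splitting on a single space rather than on the MIMAT
pattern avoids the trailing blank seen in the attempts above:

```r
# Three sample records copied from the quoted listing
nml <- c("hsa-let-7a MIMAT0000062 Homo sapiens let-7a",
         "hsa-let-7b MIMAT0000063 Homo sapiens let-7b",
         "hsa-miR-675* MIMAT0006790 Homo sapiens miR-675*")

# strsplit() returns a list holding one character vector per record
parts <- strsplit(nml, " ", fixed = TRUE)

# "[" is itself a function, so sapply() can call `[`(x, 1) on each
# list element and collect the first field of every record
ids <- sapply(parts, "[", 1)

# vapply() does the same but checks that each result is one string
ids2 <- vapply(parts, function(x) x[1], character(1))

print(ids)
```

The [[ ]] operator extracts exactly one list element, which is why
applying a function over the list is needed to reach all of them.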
>
> Thank you in advance,
> Maura

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



