[R] UNIX-like "cut" command in R

Mike Miller mbmiller+l at gmail.com
Tue May 3 06:26:59 CEST 2011


On Mon, 2 May 2011, Gabor Grothendieck wrote:

> On Mon, May 2, 2011 at 10:32 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>> On Tue, 3 May 2011, Andrew Robinson wrote:
>>
>>> try substr()
>>
>> OK.  Apparently, it allows things like this...
>>
>>> substr("abcdef",2,4)
>>
>> [1] "bcd"
>>
>> ...which is like this:
>>
>> echo "abcdef" | cut -c2-4
>>
>> But that doesn't use a delimiter; it only does character-based cutting, and
>> it is very limited.  With "cut -c" I can do things like this:
>>
>> echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-
>>
>> abclmnoqrstuvwxyz
>>
>> It extracts characters 1 to 3, 12 to 15 and 17 to the end.
>>
>> That was a great tip, though, because it led me to strsplit, which can do
>> what I want, although somewhat awkwardly:
>>
>>> y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
>>> delim <- " "
>>> paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)
>>
>> [1] "a b c l m n o q r s t u v w x y z"
>>
>> That gives me what I want, but it is still a little awkward.  I guess I
>> don't quite get what I'm doing with lists.  I'm not clear on how this would
>> work with a vector of strings.
>>
>
> Try this:
>
>> read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), colClasses = c(NA, "NULL"))
>   V1   V3         V5
> 1 abc lmno qrstuvwxyz
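On the question above of how the strsplit() approach works with a vector of strings: strsplit() returns a list with one element per input string, so the index-then-paste step can be applied to every element with vapply(). A minimal sketch follows. cut_chars() and cut_fields() are hypothetical helper names, not base R functions. Unlike cut, they keep positions in the order given rather than sorting them.

```r
# Hypothetical helpers (not base R): rough emulations of `cut -c` and
# `cut -d DELIM -f FIELDS`, applied to every element of a character vector.

# Keep the characters at positions `pos` of each string, like `cut -c`.
# Positions past the end of a string are silently dropped.
cut_chars <- function(x, pos) {
  chars <- strsplit(x, "")  # list: one vector of single characters per string
  vapply(chars,
         function(ch) paste(ch[pos[pos <= length(ch)]], collapse = ""),
         character(1))
}

# Keep fields `fields` of each string split on `delim`, like `cut -d -f`.
cut_fields <- function(x, fields, delim = " ") {
  parts <- strsplit(x, delim, fixed = TRUE)  # literal delimiter, not a regex
  vapply(parts,
         function(p) paste(p[fields[fields <= length(p)]], collapse = delim),
         character(1))
}

s <- c("abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
cut_chars(s, c(1:3, 12:15, 17:26))
# [1] "abclmnoqrstuvwxyz" "ABCLMNOQRSTUVWXYZ"

y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
cut_fields(c(y, toupper(y)), c(1:3, 12:15, 17:26))
# [1] "a b c l m n o q r s t u v w x y z" "A B C L M N O Q R S T U V W X Y Z"
```

Because out-of-range positions are dropped, passing a generous upper bound loosely mimics an open-ended range such as cut -c17-.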


That gives me a few more functions to study.  Of course the new code 
(using read.fwf() and textConnection()) is not doing quite what was 
requested, and it requires some work to compute the widths from the given 
positions (c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)).
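The width conversion can be automated. One way, sketched here assuming the kept positions are given as in the example, is to mark each column as kept or skipped and run-length encode that with rle(). read.fwf() treats negative widths as columns to skip, so the alternating colClasses trick is not needed; abs() of the result gives the c(3, 8, 4, 1, 10) widths used above. positions_to_widths() is a hypothetical helper, not a base R function, and colClasses = "character" is an assumption added so fields stay as text.

```r
# Hypothetical helper (not base R): convert kept character positions,
# e.g. c(1:3, 12:15, 17:26), into a `widths` vector for read.fwf(),
# where negative widths mark columns to skip.
positions_to_widths <- function(pos, n = max(pos)) {
  keep <- seq_len(n) %in% pos   # TRUE for each column position to keep
  runs <- rle(keep)             # alternating runs of kept/skipped columns
  ifelse(runs$values, runs$lengths, -runs$lengths)
}

w <- positions_to_widths(c(1:3, 12:15, 17:26))
w
# [1]  3 -8  4 -1 10

x <- c("abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
con <- textConnection(x)
df <- read.fwf(con, widths = w, colClasses = "character")
close(con)
do.call(paste0, df)   # glue the kept fields back into one string per line
# [1] "abclmnoqrstuvwxyz" "ABCLMNOQRSTUVWXYZ"
```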

Mike

