[R] UNIX-like "cut" command in R

P Ehlers ehlers at ucalgary.ca
Tue May 3 07:56:15 CEST 2011


Mike Miller wrote:
> On Mon, 2 May 2011, Gabor Grothendieck wrote:
> 
>> On Mon, May 2, 2011 at 10:32 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
>>> On Tue, 3 May 2011, Andrew Robinson wrote:
>>>
>>>> try substr()
>>> OK.  Apparently, it allows things like this...
>>>
>>>> substr("abcdef",2,4)
>>> [1] "bcd"
>>>
>>> ...which is like this:
>>>
>>> echo "abcdef" | cut -c2-4
>>>
>>> But that doesn't use a delimiter, it only does character-based cutting, and
>>> it is very limited.  With "cut -c" I can do stuff like this:
>>>
>>> echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-
>>>
>>> abclmnoqrstuvwxyz
>>>
>>> It extracts characters 1 to 3, 12 to 15 and 17 to the end.
>>>
>>> That was a great tip, though, because it led me to strsplit, which can do
>>> what I want, however somewhat awkwardly:
>>>
>>>> y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
>>>> delim <- " "
>>>> paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)
>>> [1] "a b c l m n o q r s t u v w x y z"
>>>
>>> That gives me what I want, but it is still a little awkward.  I guess I
>>> don't quite get what I'm doing with lists.  I'm not clear on how this would
>>> work with a vector of strings.
>>>
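
For a vector of strings, strsplit() returns a list with one element per
string, so the same indexing can be applied to each element in turn. A
minimal sketch, assuming a fixed single-character delimiter; cut_fields is
a hypothetical helper name, not an existing R function:

```r
# cut_fields() is a hypothetical helper, not part of base R.
# It keeps the requested delimiter-separated fields of every string in x,
# roughly like `cut -d' ' -f1-3,12-15,17-26` applied to each input line.
cut_fields <- function(x, fields, delim = " ") {
  parts <- strsplit(x, delim, fixed = TRUE)  # list: one character vector per string
  vapply(parts, function(p) {
    keep <- fields[fields <= length(p)]      # ignore fields past the end of the line
    paste(p[keep], collapse = delim)
  }, character(1))
}

y <- c(paste(letters, collapse = " "), paste(LETTERS, collapse = " "))
cut_fields(y, c(1:3, 12:15, 17:26))
# [1] "a b c l m n o q r s t u v w x y z" "A B C L M N O Q R S T U V W X Y Z"
```
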
>> Try this:
>>
>>> read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), colClasses = c(NA, "NULL"))
>>   V1   V3         V5
>> 1 abc lmno qrstuvwxyz
> 
> 
> That gives me a few more functions to study.  Of course the new code 
> (using read.fwf() and textConnection()) is not doing what was requested 
> and it requires some work to compute the widths from the given numbers 
> (c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)).
> 
> Mike
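
The widths need not be worked out by hand. A rough sketch, assuming every
line has the same known length n, that derives them from the positions with
rle(); negative widths tell read.fwf() to skip those columns, so colClasses
is only used here to keep the pieces as character:

```r
s   <- "abcdefghijklmnopqrstuvwxyz"
idx <- c(1:3, 12:15, 17:26)        # character positions to keep
n   <- nchar(s)                    # assumption: all lines have this length

runs   <- rle(seq_len(n) %in% idx) # alternating runs of kept / skipped positions
widths <- ifelse(runs$values, runs$lengths, -runs$lengths)
widths
# [1]  3 -8  4 -1 10

con <- textConnection(s)
out <- read.fwf(con, widths = widths, colClasses = "character")
close(con)                         # read.fwf() does not close an already-open connection
out
```
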

Use str_sub() in the stringr package:

library(stringr)  # run install.packages("stringr") first if necessary
s <- "abcdefghijklmnopqrstuvwxyz"

str_sub(s, c(1,12,17), c(3,15,-1))
#[1] "abc"        "lmno"       "qrstuvwxyz"
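
To get the single string that "cut -c" prints, the pieces can be pasted back
together. A minimal sketch in base R using substring(), which also works on
a vector of strings; cut_chars is a hypothetical helper name, and the end
positions are assumed to be given explicitly (no -1 shorthand as in
str_sub()):

```r
# cut_chars() is a hypothetical helper, not part of base R or stringr.
# For each string in x it extracts the ranges start[i]..end[i] and
# concatenates them, roughly like `cut -c1-3,12-15,17-26`.
cut_chars <- function(x, start, end) {
  vapply(x, function(s) {
    paste(substring(s, start, end), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

s <- c("abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
cut_chars(s, c(1, 12, 17), c(3, 15, 26))
# [1] "abclmnoqrstuvwxyz" "ABCLMNOQRSTUVWXYZ"
```

An end position larger than the string length is simply truncated by
substring(), so a large value can stand in for "to the end of the line".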


Peter Ehlers


