[Rd] subRaw?

Hervé Pagès hpages at fhcrc.org
Fri Jul 20 07:19:11 CEST 2012


Hi Spencer,

On 07/19/2012 08:29 PM, Spencer Graves wrote:
> Hello, All:
>
>
>        Do you know of any capability to substitute more than one byte in
> an object of class Raw?
>
>
>        Consider the following:
>
>
>  > let4 <- paste(letters[1:4], collapse='')
>  > (let4Raw <- charToRaw(let4))
> [1] 61 62 63 64
>  > (let. <- sub('bc', '--', let4Raw))
> [1] "61" "62" "63" "64"
>  > # no substitution
>  > (bc <- charToRaw('bc'))
> [1] 62 63
>  > (ef <- charToRaw('ef'))
> [1] 65 66
>  > (let. <- sub(bc, ef, let4Raw))
> [1] "61" "65" "63" "64"
> Warning messages:
> 1: In sub(bc, ef, let4Raw) :
>    argument 'pattern' has length > 1 and only the first element will be
> used
> 2: In sub(bc, ef, let4Raw) :
>    argument 'replacement' has length > 1 and only the first element will
> be used

It makes no sense to use sub(), grep(), and family (i.e. all the stuff
based on the regex code) *directly* on a raw vector, because all these
functions start by coercing their 'x', 'text', 'pattern', and
'replacement' args to character with as.character() (if they are not
already character).
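
To make that coercion concrete, here is a small sketch using the same
let4Raw as in the original post. It only calls as.character() and sub()
from base R; nothing else is assumed.

```r
# Same test data as in the original post: the bytes of "abcd".
let4Raw <- charToRaw("abcd")

# What the regex functions actually get to see: one 2-digit hex
# string per byte, not a single string "abcd".
hex <- as.character(let4Raw)
print(hex)
# [1] "61" "62" "63" "64"

# So this call searches for "bc" inside "61", "62", "63" and "64",
# finds nothing, and returns the hex strings unchanged.
res <- sub("bc", "--", let4Raw)
print(res)
# [1] "61" "62" "63" "64"
```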

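But as.character() turns a raw vector into one two-digit hex string per
byte ("61" "62" "63" "64" for let4Raw), so the regex code ends up
matching against those hex strings rather than against the bytes
themselves. That is also why 'b' became 'e' above: the pattern was
truncated to "62" and the replacement to "65". You'd rather do the
coercion yourself first with rawToChar(), and coerce the result back
with charToRaw():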

   > charToRaw(sub("bc", "--", rawToChar(let4Raw)))
   [1] 61 2d 2d 64
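
If the pattern and replacement are themselves raw vectors, the same
round trip can be wrapped in a small helper. This is only a sketch: the
name sub_raw_via_char is made up for illustration and is not the subRaw
function mentioned in the original post. It assumes a literal (not
regex) match is wanted, hence fixed = TRUE, and uses useBytes = TRUE so
that matching is done byte by byte.

```r
# Hypothetical helper (name and signature are assumptions): replace the
# first occurrence of the byte sequence 'pattern' in 'x' by
# 'replacement', going through character strings and back.
sub_raw_via_char <- function(pattern, replacement, x) {
  stopifnot(is.raw(pattern), is.raw(replacement), is.raw(x))
  txt <- sub(rawToChar(pattern), rawToChar(replacement), rawToChar(x),
             fixed = TRUE, useBytes = TRUE)
  charToRaw(txt)
}

let4Raw <- charToRaw("abcd")
out <- sub_raw_via_char(charToRaw("bc"), charToRaw("ef"), let4Raw)
print(out)
# [1] 61 65 66 64
```

Note that rawToChar() stops with an error if the raw vector contains
embedded zero bytes, so this approach only works for data without them.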

IMO it would make much more sense for sub(), grep(), and family to
raise an error rather than blindly try to coerce to character, but these
functions (like many functions in R) are too polite to tell the
user s/he's doing something wrong.
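
In user code, that stricter behaviour can be approximated with a thin
wrapper. The sketch below is purely illustrative: strict_sub is a
made-up name, not something that exists in base R, and it simply
refuses raw arguments before delegating to sub().

```r
# Hypothetical wrapper (not part of base R): fail loudly on raw input
# instead of letting sub() coerce it with as.character().
strict_sub <- function(pattern, replacement, x, ...) {
  if (is.raw(pattern) || is.raw(replacement) || is.raw(x))
    stop("raw vectors are not supported; convert with rawToChar() first")
  sub(pattern, replacement, x, ...)
}

# Character input behaves exactly like sub().
ok <- strict_sub("bc", "--", "abcd")
print(ok)
# [1] "a--d"

# Raw input now raises an error; it is caught here only so that the
# example keeps running.
res <- tryCatch(strict_sub("bc", "--", charToRaw("abcd")),
                error = function(e) conditionMessage(e))
print(res)
```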

Cheers,
H.

>
>
>        In this example, "b" was replaced by "e", but "bc" was not
> replaced by "ef".  Do you know of any function to do this?
>
>
>        I ask, because I need it.  I've written such a function, subRaw
> for my own use.  If I don't hear that another exists, I plan to add the
> one I've written to the oro.dicom package.
>
>
>        Thanks,
>        Spencer
>
>
>  > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods base
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


