[R] extract fixed width fields from a string

Sam Steingold sds at gnu.org
Fri Jan 20 19:52:36 CET 2012


Hi,
I have a data frame with one column containing strings of the form "ABC...|XYZ...",
where ABC etc. are fields of 6 alphanumeric characters each
and XYZ etc. are fields of 8 alphanumeric characters each;
"|" is a mandatory separator.
I do not know in advance how many fields of each kind each row will contain.
I need to extract these fields from the string.

=== How do I do that?

First, I need to split the string in two on '|' -- how?
Then I need to split each of the two strings into chunks of 6 or 8 characters -- how?
Then I need to convert each 6- or 8-character string into an integer, base 36
or base 64 depending on the field -- how?
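
To make the three steps concrete, here is a rough base-R sketch of one
possible approach, not a definitive answer. The sample strings are made up,
the helper names split_fixed and decode_base are hypothetical, and the digit
alphabet 0-9A-Z for base 36 is an assumption. The base-64 alphabet is not
specified above, so it would have to be passed in the same way. Doubles are
used instead of strtoi(x, 36L) because 36^6 exceeds R's integer range.

```r
# Made-up sample data: 6-character fields, "|", then 8-character fields.
x <- c("00000A00000B|0000000100000002", "00000Z|")

# Step 1: split on the mandatory "|" (an empty side stays an empty string).
left  <- sub("\\|.*$", "", x)
right <- sub("^[^|]*\\|", "", x)

# Step 2: cut a single string into consecutive chunks of `width` characters.
split_fixed <- function(s, width) {
  if (is.na(s) || nchar(s) == 0) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}
left_fields  <- lapply(left,  split_fixed, width = 6)
right_fields <- lapply(right, split_fixed, width = 8)

# Step 3: decode one field as a number in base length(alphabet).
# Returns a double; characters missing from `alphabet` give NA.
decode_base <- function(field, alphabet) {
  digits <- match(strsplit(field, "")[[1]], alphabet) - 1
  sum(digits * length(alphabet)^rev(seq_along(digits) - 1))
}
alpha36 <- c(0:9, LETTERS)  # assumed digit order for base 36

left_values <- lapply(left_fields, function(f)
  vapply(f, decode_base, numeric(1), alphabet = alpha36))
```

Each element of left_fields, right_fields and left_values corresponds to one
row of the data frame, so rows with different numbers of fields are fine.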

=== What do I do with them once I extract them?

First thing I want to do is to have a count table of them.
Then I thought of adding an extra column for each field value and
putting 0/1 there, e.g., frame
1,AB
2,BCD
will turn into
1,1,1,0,0
2,0,1,1,1
however this would work only if the number of different field values is
manageable.
What do people do?
Can I have a column of "sets" in a data frame?
Does R support the "set" data type?
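
For illustration, the count table and the 0/1 columns from the small example
above could be sketched in base R as follows. This is only one way to do it,
under the assumption that the extracted fields are already held as a list
with one character vector per row; the variable names are made up. The last
lines show a list column, which is the closest thing to a column of "sets"
that I know of in base R.

```r
# One character vector of field values per row, as in the example above.
fields <- list(c("A", "B"), c("B", "C", "D"))

# Count table over all field values.
counts <- table(unlist(fields))

# One 0/1 column per distinct field value, one row per original row.
values <- sort(unique(unlist(fields)))
indicator <- do.call(rbind, lapply(fields, function(f)
  as.integer(values %in% f)))
colnames(indicator) <- values

# A data frame can hold the vectors themselves in a list column;
# union(), intersect(), setdiff() and %in% then act as set operations.
df <- data.frame(id = 1:2)
df$fields <- fields
shared <- intersect(df$fields[[1]], df$fields[[2]])
```

The indicator matrix grows by one column per distinct value, so it is only
practical when that number is manageable; the list column does not have
that limitation.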

Thanks!

PS. thanks to Sarah Goslee who answered my previous question in so much detail!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://camera.org http://openvotingconsortium.org http://iris.org.il
http://mideasttruth.com http://memri.org http://honestreporting.com
Don't take life too seriously, you'll never get out of it alive!


