[R] extract fixed width fields from a string

Sarah Goslee sarah.goslee at gmail.com
Fri Jan 20 20:05:38 CET 2012


Reproducible example, please. This doesn't make a whole lot of sense
otherwise.

On Fri, Jan 20, 2012 at 1:52 PM, Sam Steingold <sds at gnu.org> wrote:
> Hi,
> I have a data frame with one column containing strings of the form "ABC...|XYZ..."
> where ABC etc are fields of 6 alphanumeric characters each
> and XYZ etc are fields of 8 alphanumeric characters each;
> "|" is a mandatory separator;
> I do not know in advance how many fields of each kind each row will contain.
> I need to extract these fields from the string.

This is already a data frame, so you don't need to import it into R,
just process it?

> === How do I do that?
>
> First I need to split the string in two on '|'; how?

strsplit()
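A minimal sketch of that step, assuming the column holds character strings (the two example values are made up). Note that "|" is a regular-expression metacharacter, so strsplit() needs fixed = TRUE (or the pattern "\\|"); the plain pattern "|" matches the empty string and splits into single characters.

```r
# Made-up example values: 6-character fields, then "|", then 8-character fields
x <- c("ABC123DEF456|WXYZ1234", "GHI789|ABCD5678EFGH9012")

# fixed = TRUE makes "|" a literal separator instead of regex alternation
parts <- strsplit(x, "|", fixed = TRUE)

left  <- vapply(parts, `[`, character(1), 1)  # block of 6-character fields
right <- vapply(parts, `[`, character(1), 2)  # block of 8-character fields
```

strsplit() drops a trailing empty piece, so a string ending in "|" gives NA for its second element here.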

> Then I need to split the two strings into 6- and 8-character fields; how?

substring() perhaps
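substring() is vectorised over its start and end positions, so one call can cut a string into every fixed-width piece. A sketch, where chunk() is a hypothetical helper name and the widths 6 and 8 come from the question:

```r
# Hypothetical helper: cut one string into consecutive pieces of `width` chars.
# substring() recycles over the vectors of start and end positions.
chunk <- function(s, width) {
  if (is.na(s) || !nzchar(s)) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

chunk("ABC123DEF456", 6)      # "ABC123" "DEF456"
chunk("ABCD5678EFGH9012", 8)  # "ABCD5678" "EFGH9012"
```

For a whole column this becomes, e.g., lapply(left, chunk, width = 6). If a block's length is not a multiple of the width, the last piece is simply shorter, so checking nchar(s) %% width first may be worthwhile.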


> Then I need to convert each 6- or 8-character string into an integer, base 36
> or 64 (depending on the field); how?

base 36? Really? How are you representing that? Somehow I think you
mean something other than what you said. Either way, please clarify.
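If the fields really are base-36 numbers written with the digits 0-9 followed by A-Z (an assumption; the question doesn't say), base R's strtoi(x, base = 36L) converts them, but it returns integers and gives NA on overflow, and 36^6 - 1 = 2176782335 is already larger than .Machine$integer.max. strtoi() also only accepts bases 2 to 36, so base 64 needs hand-rolled code. A sketch with a hypothetical decode_base() helper that returns doubles (exact for integers up to 2^53, which covers 8 base-64 digits, since 64^8 = 2^48) and takes the digit alphabet as an argument, because the poster's base-64 alphabet is not specified:

```r
# Hypothetical helper: read each string as a number in base length(alphabet),
# where a character's position in `alphabet` (from 0) is its digit value.
# Returns doubles so 6-char base-36 and 8-char base-64 values do not overflow.
decode_base <- function(x, alphabet) {
  digits <- strsplit(alphabet, "")[[1]]
  b <- length(digits)
  vapply(strsplit(x, ""), function(ch) {
    d <- match(ch, digits) - 1            # digit values; NA for unknown chars
    if (length(d) == 0 || anyNA(d)) return(NA_real_)
    sum(d * b^(rev(seq_along(d)) - 1))    # most significant digit first
  }, numeric(1))
}

b36 <- "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Assumed base-64 ordering (the RFC 4648 alphabet); only a guess
b64 <- paste0(c(LETTERS, letters, 0:9, "+", "/"), collapse = "")

decode_base(c("10", "ZZZZZZ"), b36)  # 36 and 2176782335
decode_base("BA", b64)               # 64
```

Unlike strtoi(), decode_base() is case-sensitive: with b36 a lowercase letter yields NA.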

> === What do I do with them once I extract them?

I don't know. Save them as a list, most likely.
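As one way to do that, a sketch that parses every row into a list element holding its 6- and 8-character fields. The data frame, its column name s, and the chunk() helper are all hypothetical:

```r
# Hypothetical data frame with the packed strings in column `s`
df <- data.frame(id = 1:2,
                 s = c("ABC123DEF456|WXYZ1234", "GHI789|ABCD5678EFGH9012"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: consecutive fixed-width pieces of one string
chunk <- function(s, width) {
  if (is.na(s) || !nzchar(s)) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

# One list element per row: the 6-character fields and the 8-character fields
fields <- lapply(strsplit(df$s, "|", fixed = TRUE), function(p) {
  list(six   = chunk(p[1], 6),
       eight = if (length(p) > 1) chunk(p[2], 8) else character(0))
})

fields[[2]]$eight  # "ABCD5678" "EFGH9012"
```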

> First thing I want to do is to have a count table of them.
> Then I thought of adding an extra column for each field value and
> putting 0/1 there, e.g., frame
> 1,AB
> 2,BCD

I thought we had integers at this point?

> will turn into
> 1,1,1,0,0
> 2,0,1,1,1
> however this would work only if the number of different field values is
> manageable.

But we have no idea, because you haven't told us.
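For the toy example in the question (row 1 holds A and B, row 2 holds B, C and D), table() gives both the count table and the 0/1 layout. A sketch; wrapping the ids in factor() with explicit levels keeps rows that have no fields at all:

```r
# The toy example from the question: row 1 holds A and B; row 2 holds B, C, D
ids    <- 1:2
fields <- list(c("A", "B"), c("B", "C", "D"))
vals   <- unlist(fields)

# Count table: occurrences of each field value over all rows
counts <- table(vals)

# One row per id, one column per distinct value; factor(levels = ids) keeps
# ids with no fields. (tab > 0) * 1L turns counts into 0/1 integers.
tab <- table(id = factor(rep(ids, lengths(fields)), levels = ids), value = vals)
ind <- unclass((tab > 0) * 1L)
# row 1: 1 1 0 0; row 2: 0 1 1 1 (columns A, B, C, D)
```

This dense matrix has one column per distinct value, so it only stays practical while that number is manageable, as the question notes; sparse matrices from the Matrix package are one alternative.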

> What do people do?
> Can I have a column of "sets" in a data frame?
> Does R support the "set" data type?

factor() seems to be what you're looking for.
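R has no dedicated set type: base R's union(), intersect(), setdiff() and is.element() treat plain vectors as sets, and a data frame can hold a list column with one vector per row. A sketch with made-up values; factor() with a shared set of levels, as suggested above, gives every row the same fixed universe of values:

```r
# Made-up values: one vector of field values per row, stored in a list column.
# I() stops data.frame() from trying to turn the list into separate columns.
df <- data.frame(id = 1:2, fields = I(list(c("A", "B"), c("B", "C", "D"))))

r1 <- df$fields[[1]]
r2 <- df$fields[[2]]
union(r1, r2)        # "A" "B" "C" "D"
intersect(r1, r2)    # "B"
setdiff(r2, r1)      # "C" "D"
is.element("C", r2)  # TRUE

# factor() with a fixed level set turns one row into a fixed-length count
row1 <- table(factor(r1, levels = c("A", "B", "C", "D")))  # A 1, B 1, C 0, D 0
```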

> PS. Thanks to Sarah Goslee who answered my previous question in so much detail!

You're welcome, but you'd be even more welcome if you'd listened to
the parts of my reply about reproducible examples, clear problem
statements, and reading the posting guide.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org


