[R] Separating a Complicated String Vector

David Winsemius dwinsemius at comcast.net
Sun Jan 4 08:47:35 CET 2015


On Jan 3, 2015, at 9:20 PM, npretnar wrote:

> Sorry. Bad example on my part. Try this. V1 is ...
> 
> V1
> alabama
> bates
> tuscaloosa
> smith
> arkansas
> fayette
> little rock
> alaska
> juneau
> nome
> 
> And I want:
> 
> V1         V2
> alabama    bates
> alabama    tuscaloosa
> alabama    smith
> arkansas   fayette
> arkansas   little rock
> alaska     juneau
> alaska     nome


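# Flag the rows that hold a state name (the pattern is unanchored, so any
# value that merely contains a state name is flagged as well).
dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)

# Running count of state rows: the index of the most recent state above each row.
dat$thisstate <- cumsum(dat$is_state)
# Pair each non-state row with the state it falls under.
dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state] ] ,
                   V2 = dat$V1[ !dat$is_state] )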


> dat2
        V1         V2
1  alabama      bates
2  alabama tuscaloosa
3  alabama      smith
4 arkansas    fayette
5 arkansas     little
6 arkansas       rock
7   alaska     juneau
8   alaska       nome
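
In the output above "little rock" ends up on two rows because the values were split on whitespace when they were read in. A minimal sketch of the same cumsum idea that keeps multi-word names whole follows. It assumes one value per element of V1, that the first row is a state, and that no county or city shares its exact name with a state (a county called "washington" would be treated as a state). It swaps the regular expression for an exact match with %in%; the data frame below is typed in by hand purely for illustration.

```r
# Hypothetical input: one value per element, so "little rock" stays intact.
dat <- data.frame(
  V1 = c("alabama", "bates", "tuscaloosa", "smith",
         "arkansas", "fayette", "little rock",
         "alaska", "juneau", "nome"),
  stringsAsFactors = FALSE
)

# Exact, whole-value match against the built-in state.name vector.
is_state <- dat$V1 %in% tolower(state.name)

# cumsum() gives every row the index of the most recent state row above it.
grp <- cumsum(is_state)

# Look up the state for each non-state row and pair the two columns.
dat2 <- data.frame(
  V1 = dat$V1[is_state][grp[!is_state]],
  V2 = dat$V1[!is_state],
  stringsAsFactors = FALSE
)
dat2
```

If the first row were not a state, grp would start at 0 and the two columns would no longer line up, so that case would need to be checked before building dat2.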

-- 
David.

> 
> This is more representative of the problem, extended to all 50 states.
> 
> - Nick
> 
> 
> On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
> 
>> I'm not sure what's so complicated about that (am I missing
>> something?). You can search using grep, and replace using gsub, so
>> 
>> tmpDF <- read.table(text="V1      V2
>> A       5
>> a1      1
>> a2      1
>> a3      1
>> a4      1
>> a5      1
>> B       4
>> b1      1
>> b2      1
>> b3      1
>> b4      1",
>>                   header=TRUE)
>> tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
>> data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
>> 
>> Seems to do the trick.
>> 
>> Best,
>> Ista
>> 
>> On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npretnar at gmail.com> wrote:
>>> I have a string variable (V1) in a data frame structured as follows:
>>> 
>>> V1      V2
>>> A       5
>>> a1      1
>>> a2      1
>>> a3      1
>>> a4      1
>>> a5      1
>>> B       4
>>> b1      1
>>> b2      1
>>> b3      1
>>> b4      1
>>> 
>>> I want the following:
>>> 
>>> V1      V2      V3
>>> a1      1       A
>>> a2      1       A
>>> a3      1       A
>>> a4      1       A
>>> a5      1       A
>>> b1      1       B
>>> b2      1       B
>>> b3      1       B
>>> b4      1       B
>>> 
>>> I am not sure how to go about making this transformation besides writing a long vector that contains each of the categorical string names (these are state names, so it would be a really long vector). Any help would be greatly appreciated.
>>> 
>>> Thanks,
>>> 
>>> Nicholas Pretnar
>>> Mizzou Economics Grad Assistant
>>> npretnar at gmail.com
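
For the A/a1 layout in the original question quoted above, the header-row idea can also be sketched without relying on each item sharing a prefix with its group label. This is only an illustration under one assumption: header rows are exactly the rows whose V1 contains no digit. The names is_header, grp and out are made up for the example.

```r
tmpDF <- read.table(text = "V1 V2
A 5
a1 1
a2 1
a3 1
a4 1
a5 1
B 4
b1 1
b2 1
b3 1
b4 1", header = TRUE, stringsAsFactors = FALSE)

# Assumption: a row is a group header when V1 has no digit in it.
is_header <- !grepl("[0-9]", tmpDF$V1)

# Index of the most recent header row above each row.
grp <- cumsum(is_header)

# Keep the item rows and attach the label of the header they fall under.
out <- data.frame(tmpDF[!is_header, ],
                  V3 = tmpDF$V1[is_header][grp[!is_header]],
                  stringsAsFactors = FALSE)
out
```

Because the label comes from the position of the header row rather than from the text of V1, the same lines would work if the items were named differently from their group, as with counties under states.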


David Winsemius
Alameda, CA, USA


