[R] I'm trying to parse 1 column of a dataframe into 3 separate columns

David L Carlson dcarlson at tamu.edu
Tue Jan 15 00:31:48 CET 2013


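How about pulling each piece straight from the list that strsplit()
returned (test, not the unlisted test2):

a <- sapply(test, function(x) x[1])  # first piece:  "affiliateID=..."
s <- sapply(test, function(x) x[2])  # second piece: "sessionID=..."
e <- sapply(test, function(x) x[3])  # third piece:  "etag=..."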
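
The three vectors above still carry their "key=" prefixes. A minimal sketch of one way to strip them and collect the values in a data frame follows. The sample vector utm, the helper piece(), and the result name parsed are hypothetical and used only for illustration; the sketch assumes the fields always appear in the order affiliateID, sessionID, etag.

```r
# Hypothetical sample mirroring the structure shown in the question,
# including the empty-string level that str() reports
utm <- factor(c(
  "affiliateID=4f3ac4b695e7d&sessionID=993f4c447e68dfc36ed692223349f2e3&etag=",
  "affiliateID=&sessionID=5a6ca9d41148f30ce694628427af7991&etag=",
  ""
))

# Same first step as in the question: one character vector per row
test <- strsplit(as.character(utm), "&")

# Hypothetical helper: take the i-th piece of every list element and
# drop everything up to and including the first "="
piece <- function(lst, i) {
  sub("^[^=]*=", "", sapply(lst, function(x) x[i]))
}

parsed <- data.frame(
  affiliateID = piece(test, 1),
  sessionID   = piece(test, 2),
  etag        = piece(test, 3),
  stringsAsFactors = FALSE
)
```

Rows whose original string is empty split into zero pieces, so they come out as NA in every column rather than shifting the rows after them.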

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Joel Pulliam
> Sent: Monday, January 14, 2013 4:30 PM
> To: r-help at r-project.org
> Cc: pulliamjs at gmail.com
> Subject: [R] I'm trying to parse 1 column of a dataframe into 3
> separate columns
> 
> I have a factor called 'utm_medium' in the dataframe 'data'
> 
> > str(data$utm_medium)
>  Factor w/ 396925 levels "","affiliateID=&sessionID=0000821850667323ec6ae6cffd28f380&etag=",..: 366183 355880 357141 20908 357513 365348 368088 360827 31704 364767 ...
> 
> 
> The data in this factor is delimited with '&'. I basically want the
> affiliateID, sessionID and etag data separated. Ex.
> 
> > data$utm_medium[1:10]
>  [1] affiliateID=4f3ac4b695e7d&sessionID=993f4c447e68dfc36ed692223349f2e3&etag=
>  [2] affiliateID=4f3ac4b695e7d&sessionID=209dd9986ace55d50a450afeba62b78f&etag=
>  [3] affiliateID=4f3ac4b695e7d&sessionID=2efdb8e1e1f5ac9c0d5baec355c78f85&etag=
>  [4] affiliateID=&sessionID=5a6ca9d41148f30ce694628427af7991&etag=
>  [5] affiliateID=4f3ac4b695e7d&sessionID=331fbcdf1f3d5e7bac0d92c12e19f63d&etag=
>  [6] affiliateID=4f3ac4b695e7d&sessionID=8fc27c8478e9bd30043ea4d3c7ddb29c&etag=
>  [7] affiliateID=4f3ac4b695e7d&sessionID=af467d480addffca43ffbdbce1edfdb4&etag=
>  [8] affiliateID=4f3ac4b695e7d&sessionID=598645e05a187ee63ff922a36360f021&etag=
>  [9] affiliateID=&sessionID=8895e21d0842ed45063ba8328dc3bc61&etag=
> [10] affiliateID=4f3ac4b695e7d&sessionID=88ca2998c5a91b6efbece0c4f79caeb7&etag=
> 396925 Levels:  ... affiliateID=50bfbbbeed918&sessionID=5c49c142cbf1b149c6a4647d1a4fc97b&etag=
> 
> 
> 
> I've parsed it via:
> 
> test <- as.character(data$utm_medium)
> test <- strsplit(test, "&")
> 
> which results in a list, which I 'unlisted':
> 
> test2 <- unlist(test)
> 
> and then attempted to extract into separate vectors:
> 
> a <- vector(mode = "character", length = length(test2))
> s <- vector(mode = "character", length = length(test2))
> e <- vector(mode = "character", length = length(test2))
> i <- 1
> j <- 1
> 
>   for (i in 1:length(test2))
>   {
>     a[j] <- test2[i]
>     s[j] <- test2[i+1]
>     e[j] <- test2[i+2]
>     i <- i + 3
>     j <- j + 1
>   }
> 
> 
> 
> This code runs, but I'm indexing it incorrectly and I can't figure out
> why. I'll sleep on it tonight and probably figure it out, but I can't
> help thinking that there's a much easier way to parse this data. Help!
> Please!
> 
> 
> 
> joel
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
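
On the loop in the quoted message: in R, a for loop assigns the next value of its sequence to i at the start of every iteration, so the `i <- i + 3` in the body is overwritten and the loop still advances one element at a time. The sketch below demonstrates that and shows two ways to step through an unlisted vector in threes. The short test2 here is a hypothetical stand-in for the real one, and both approaches assume every row split into exactly three pieces, which an empty string in the data would break.

```r
# Hypothetical stand-in for the unlisted vector test2 from the question
test2 <- c("affiliateID=A1", "sessionID=S1", "etag=",
           "affiliateID=",   "sessionID=S2", "etag=")

# Assigning to i inside the body does not change the next iteration
visited <- integer(0)
for (i in 1:3) {
  visited <- c(visited, i)
  i <- i + 3   # overwritten when the loop takes the next value from 1:3
}

# Step through the starting positions explicitly instead
starts <- seq(1, length(test2), by = 3)
a <- test2[starts]
s <- test2[starts + 1]
e <- test2[starts + 2]

# Equivalent reshaping: one row per record, one column per field
m <- matrix(test2, ncol = 3, byrow = TRUE)
```

Working from the list returned by strsplit(), as in the reply at the top, avoids the fixed-width assumption because each row keeps its own pieces.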


