[R] matrix(unlist(strsplit(""))) 'missing value' issue

MaartenJacobs maart_jacobs at hotmail.com
Tue Mar 27 16:55:40 CEST 2012


I'm still an R noob; I've just had a couple of lectures about it in our research
master.

I have to write some code for a Deal or No Deal experiment.
Someone wrote a website that gathers the data and writes it to .xlsx files,
one file per participant, so I first have to import the separate data files.
I do that like this:
# Merge the xlsx files into one dataframe
alldata <- rbind(read.xlsx('experimentdata.xlsx',1), 
                 read.xlsx('experimentdata_1.xlsx',1),
                 read.xlsx('experimentdata_2.xlsx',1)
                #etc..#read.xlsx('filepath',1)
                 )
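
When the number of participant files grows, the rbind() call above can be built from a list of file names instead of being typed out. The following is only a sketch: read_all is a hypothetical helper name, and the file-name pattern in the comment is an assumption based on the three names shown above.

```r
# Hypothetical helper: read every file with `reader` and stack the rows.
# All files are assumed to have the same columns, as rbind() requires.
read_all <- function(files, reader) {
  do.call(rbind, lapply(files, reader))
}

# Intended use with the files from this post (read.xlsx as called above):
# files   <- list.files(pattern = "^experimentdata.*\\.xlsx$")
# alldata <- read_all(files, function(f) read.xlsx(f, 1))
```

Passing the reader as an argument keeps the helper independent of whichever package provides read.xlsx.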

The website is poorly written and some of the variables are inconvenient.
I have the variables 'bankoffer.1', 'bankoffer.3', 'bankoffer.5' etc.
These variables look like the following:
> alldata$bankoffer.1
[1] 246000:accepted    267000:notaccepted 200000:notaccepted
Levels: 246000:accepted 267000:notaccepted 200000:notaccepted

> alldata$bankoffer.3
[1] 9999999            429000:notaccepted 48000:notaccepted 
Levels: 9999999 429000:notaccepted 48000:notaccepted

The problem is that the values in the cells are awkward: they consist of, for
example, '246000:accepted'. I would like to split that so that 246000 is in
one variable and accepted in another.

When there are no missing values this is no problem, I just do this:
> as.data.frame(matrix(unlist(strsplit(as.character(alldata$bankoffer.1),":")),
+ ncol = 2, byrow = TRUE))
      V1          V2
1 246000    accepted
2 267000 notaccepted
3 200000 notaccepted

However, when there are missing values (coded as 9999999), as in bankoffer.3,
there is a problem:

> as.data.frame(matrix(unlist(strsplit(as.character(alldata$bankoffer.3),":")),
+ ncol = 2, byrow = TRUE))
           V1      V2
1     9999999  429000
2 notaccepted   48000
3 notaccepted 9999999
Warning message:
In matrix(unlist(strsplit(as.character(alldata$bankoffer.3), ":")),  :
  data length [5] is not a sub-multiple or multiple of the number of rows [3]

R does not encounter a ':' in 9999999 and therefore places 429000 in the
second column, whereas it should be in the first one. The result I want looks
like this:
       V1          V2
1 9999999     9999999
2  429000 notaccepted
3   48000 notaccepted
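
The shift can be made visible by counting how many pieces strsplit() returns per element. This is a small sketch on a character vector that mirrors the bankoffer.3 values shown above; the names x, parts, piece_counts and flat are only illustrative.

```r
# Values copied from the bankoffer.3 example above
x <- c("9999999", "429000:notaccepted", "48000:notaccepted")

parts <- strsplit(x, ":", fixed = TRUE)

# Number of pieces per element: the entry without ':' yields only one
piece_counts <- vapply(parts, length, integer(1))
piece_counts   # 1 2 2

# unlist() flattens to 5 values, which cannot fill a 3 x 2 matrix,
# so every value after the first one lands one cell too early
flat <- unlist(parts)
length(flat)   # 5
```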

How can I tell R to place 9999999 in both columns when it encounters a
9999999? Any other solution to my problem is also welcome. For example, I
thought about making R append ':9999999' whenever it encounters 9999999, as a
sort of workaround, but I have no idea how to do that.
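
One possible way to carry out the workaround described above is to find the entries that contain no ':' and paste the value onto itself before splitting. This is a minimal sketch, not necessarily the best approach; the vector x stands in for as.character(alldata$bankoffer.3), and the column names offer and decision are made up for illustration.

```r
# Stand-in for as.character(alldata$bankoffer.3)
x <- as.character(factor(c("9999999", "429000:notaccepted", "48000:notaccepted")))

# Entries without a ':' are the missing-value codes
no_colon <- !grepl(":", x, fixed = TRUE)

# Turn "9999999" into "9999999:9999999" so every entry splits into two pieces
x[no_colon] <- paste(x[no_colon], x[no_colon], sep = ":")

parts <- strsplit(x, ":", fixed = TRUE)
out <- as.data.frame(matrix(unlist(parts), ncol = 2, byrow = TRUE),
                     stringsAsFactors = FALSE)
names(out) <- c("offer", "decision")   # illustrative column names
out
```

Because every element now splits into exactly two pieces, the matrix fills row by row without the warning.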

I hope I have made clear what the problem is and what I eventually want.
If not, please ask.

Greetings Maarten
