[R] RODBC and Excel: Wrong Data Type Assumed on Import

Earl F. Glynn efg at stowers-institute.org
Fri Nov 4 20:44:12 CET 2005


"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in message
news:971536df0511031753y6eb4e271o797e862bc2740e1a at mail.gmail.com...
> You could try using the COM interface rather than the ODBC
> interface.  Try code such as this:
>
> library(RDCOMClient)
> xls <- COMCreate("Excel.Application")
> xls[["Workbooks"]]$Open("MySpreadsheet.xls")
> sheet <- xls[["ActiveSheet"]]
> mydata <- sheet[["UsedRange"]][["value"]]
> xls$Quit()
>
> # convert mydata to a character matrix
> mydata.char <- matrix(unlist(mydata), nc = length(xx))

Gabor,

Thank you for that suggestion.  I try to avoid COM, but it seems to work
well with this problem.

Because I have empty cells, which are treated as NULLS, the unlist didn't
quite work.

Here's what I did:

library(RDCOMClient)
xls <- COMCreate("Excel.Application")
xls[["Workbooks"]]$Open("U:/efg/lab/R/Plasmid/construct list.xls")
sheet <- xls[["ActiveSheet"]]
mydata <- sheet[["UsedRange"]][["value"]]
xls$Quit()

for (column in 1:length(mydata))
{
  cat(column, " ", length(mydata[[column]]), " ",
length(unlist(mydata[[column]])), "\n")
}

The results show that while mydata is a list of columns, if you unlist each
column you'll be short by the number of NULL values.

1   1251   1251
2   1251   1198
3   1251   870
4   1251   327
5   1251   1250

This seemed a bit crude to fix that problem (can someone suggest a more
elegant way?):

mymatrix <- NULL
for (column in 1:length(mydata))
{
 # Use lappy to replace NULLs with "" strings, column-by-column
  mymatrix <- cbind(mymatrix, lapply(mydata[[column]], function(cell) {
ifelse(is.null(cell), "", cell) } ))
}
# Fix column names
colnames(mymatrix) <- mymatrix[1,]
mymatrix <- mymatrix[-1,]

> mymatrix[273:276,]
     Plasmid Number Plasmid
Concentration Comments Lost
[1,] 274            "yxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy"         "1 ug/ul"
"4 mg"   ""
[2,] "275a"         "xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyx" "1 ug/2
ul"   ""       ""
[3,] "275b"         "xyxyxyxyxyxyxyxyxyxyxyxyx"                   "1 ug/5
ul"   ""       ""
[4,] 276            "xyxyxyxyxyxyxyxyxyxyxyxyxyxy"                "1 ug/5
ul"   ""       "Assumed Lost"

Thank you for preserving "275a" and "275b" as the names here.

So, I'd recommend RDCOMClient over RODBC with Excel files.  "Being lucky"
shouldn't be part of processing Excel files.

efg




More information about the R-help mailing list