[R] Specifying data type when creating a dataframe using RODBC
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed May 13 07:48:26 CEST 2009
On Tue, 12 May 2009, Mark Na wrote:
> H R-helpers,
> I am using the following code to make a dataframe from an Excel spreadsheet:
> channel <- odbcConnectExcel("Spreadsheet.xls")
> Data <- sqlFetch(channel, "Tab1")
> One column (several, actually) in the spreadsheet contains integers in
> its first few rows but later values in these columns contain a mixture
> of numbers, letters and symbols (it's an ID variable, containing e.g.,
> 12, 14, 19, 19B, 19C, 19/20)
> R creates this column as a numeric variable (I think because its first
> few variables are numbers) but as soon as R gets to the non-numeric
> values (e.g., 19/20) it replaces them with NA.
R doesn't do that: nor does RODBC. You can try reading the column
with 'as.is = TRUE' (see ?sqlFetch), but this looks like a well-known
Excel/ODBC driver bug where Excel decides the column is numeric based
on the first few entries and hence exports it as numbers (and NULLs,
the SQL version of NAs).
> So, my question is: how can I specify that certain columns are to be
> read as character variables BEFORE the dataframe is created?
You can try specifying the column type in Excel, so it is recorded on
the spreadsheet. I am not sure that works in all cases.
> I have tried using as.character() in the third line (above) but it
> creates a very long first column containing all of my data...
> Thanks for any help you might provide,
> Mark Na
> R-help at r-project.org mailing list
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help