[R] Data type problem when extract data from SQLite to R by using RSQLite

chen jia chen_1002 at fisher.osu.edu
Tue Mar 1 19:06:24 CET 2011


Hi Seth,

Thanks so much for identifying the problem and explaining everything.
I think the first solution that you suggest--make sure the schema has
well defined types--would work the best for me. But, I have one
question about how to implement it, which is more about sqlite itself.

First, I found out that the columns that don't have the expected data
types in the table annual_data3 are created by aggregate functions in
a separate table. These columns are later combined with other columns
that do.

I read the link that you provide,
http://www.sqlite.org/datatype3.html. One paragraph says "When
grouping values with the GROUP BY clause values with different storage
classes are considered distinct, except for INTEGER and REAL values
which are considered equal if they are numerically equal. No
affinities are applied to any values as the result of a GROUP by
clause."

If I understand it correctly, the columns created by aggregate
functions with a GROUP by clause do not have any expected data types.

My solution is to use CREATE TABLE clause to declare the expected
datatype and then insert the values of columns created by the
aggregate functions with the GROUP by clause. However, this solution
requires a CREATE TABLE cause every time the aggregate function and
the GROUP by clause is used.

My question is: Is this the best way to make sure that the columns as
a result of a GROUP by clause have the expected data types? Thanks.

Best,
Jia

On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon <seth at userprimary.net> wrote:
> Hi Jia,
>
> On Mon, Feb 28, 2011 at 6:57 PM, chen jia <chen_1002 at fisher.osu.edu> wrote:
>> The .schema of table annual_data3 is
>> sqlite> .schema annual_data3
>> CREATE TABLE "annual_data3"(
>>  PERMNO INT,
>>  DATE INT,
>>  CUSIP TEXT,
>>  EXCHCD INT,
>>  SICCD INT,
>>  SHROUT INT,
>>  PRC REAL,
>>  RET REAL,
>>  ...
>>  pret_var,
>>  pRET_sd,
>>  nmret,
>>  pya_var,
>
> [snip]
>
> Is there a reason that you've told SQLite the expected data type for
> only some of the columns?
>
>> Interestingly, I find that the problem I reported does not for columns
>> labeled real in the schema info. For example, the type of column RET
>> never changes no matter what the first observation is.
>
> Yes, that is expected and I think it is the solution to your problem:
> setup your schema so that all columns have a declared type.  For some
> details on SQLite's type system see
> http://www.sqlite.org/datatype3.html.
>
> RSQLite currently maps NA values to NULL in the database.  Pulling
> data out of a SELECT query, RSQLite uses the sqlite3_column_type
> SQLite API to determine the data type and map it to an R type.  If
> NULL is encountered, then the schema is inspected using
> sqlite3_column_decltype to attempt to obtain a type.  If that fails,
> the data is mapped to a character vector at the R level.  The type
> selection is done once after the first row has been fetched.
>
> To work around this you can:
>
> - make sure your schema has well defined
>  types (which will help SQLite perform its operations);
>
> - check whether the returned column has the expected type and convert
>  if needed at the R level.
>
> - remove NA/NULL values from the db or decide on a different way of
>  encoding them (e.g you might be able to use -1 in the db in some
>  situation to indicate missing).  Your R code would then need to map
>  these to proper NA.
>
> Hope that helps.
>
> + seth
>
>
>
> --
> Seth Falcon | @sfalcon | http://userprimary.net/
>



-- 
700 Fisher Hall
2100 Neil Ave.
Columbus, Ohio  43210
http://www.fisher.osu.edu/~chen_1002/



More information about the R-help mailing list