[R] Query about data manipulation

lalitha viswanath lalithaviswanath at yahoo.com
Thu Mar 1 21:35:46 CET 2007


Hi
Thanks much for that input. It was extremely helpful.

I am seeking some input on another stumbling block
with RODBC and sqlQuery when working with large BLOB values.

Although the following query

  dataFromDB <- sqlQuery(channel, "select
      uncompress(columnName) from tableName where Id = id")

returns just one row, dataFromDB[1,1] actually
contains 4000+ rows of the form
field1 \t field2 \t value \n ... described earlier
(4000+ rows compressed into one long string).

When I print dataFromDB[1,1], it does not print beyond
roughly 3600 such rows (printing in fact "field1 \t
field2 \t value \n ... field3600 \t field3601"),
abruptly missing the rest of the result.
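
A quick check that the string itself (and not just the
printed display) is truncated, using only base R; this is
a minimal sketch assuming the dataFromDB object above:

  # Actual length of the returned string, independent of print()
  blobString <- as.character(dataFromDB[1, 1])
  nchar(blobString)

  # Count the fields on each "line" packed into the string;
  # every line should have 3 tab-separated fields
  con <- textConnection(blobString)
  fields <- count.fields(con, sep = "\t")
  close(con)
  length(fields)   # number of lines actually received
  table(fields)    # any value other than 3 marks a cut-off line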

Hence read.table (used with textConnection as suggested)
throws an error that row xyz does not contain 3 values,
etc.

It seems to be missing about a quarter of the actual
result, which should contain 4000+ such entries.

The set of 4000+ rows occupies just 100KB if written out
to a file directly from MySQL.
Is there any way to increase the capacity of the returned
result in R so that it does not get cut off as above and
retrieves the ENTIRE result?
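
One workaround I can sketch, assuming the truncation
happens in the ODBC layer rather than in R, is to pull the
uncompressed BLOB out in fixed-size pieces with MySQL's
SUBSTRING() and reassemble them in R. The chunk size and
the column/table names below are placeholders:

  chunkSize <- 50000
  pieces <- character(0)
  start <- 1
  repeat {
      qry <- paste("select substring(uncompress(columnName), ",
                   start, ", ", chunkSize,
                   ") from tableName where Id = id", sep = "")
      piece <- as.character(sqlQuery(channel, qry)[1, 1])
      if (is.na(piece) || nchar(piece) == 0) break
      pieces <- c(pieces, piece)
      if (nchar(piece) < chunkSize) break
      start <- start + chunkSize
  }
  blobString <- paste(pieces, collapse = "")

  # parse the reassembled string as before
  con <- textConnection(blobString)
  DF <- read.table(con, sep = "\t")
  close(con)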

I tried increasing buffsize, but as I understand it,
since sqlQuery itself returns just one row in this case,
that is possibly not very relevant here?

Note that the above-mentioned problem does not arise when
the data returned from the SQL query contains fewer than
3500 such concatenated entries.

Your input is greatly appreciated.
Thanks
Lalitha
--- Marc Schwartz <marc_schwartz at comcast.net> wrote:

> On Thu, 2007-03-01 at 08:34 -0800, lalitha viswanath
> wrote:
> > Hi
> > Thanks much for the prompt response to my earlier
> > enquiry on packages for regression analyses.
> > Along the same topic(?), I have another question
> > about which I could use some input.
> > 
> > I am retrieving data from a MySQL database using
> > RODBC.
> > The table has many BLOB columns and each BLOB column
> > has data in the format
> > "id1 \t id2 \t measure \n id3 \t id4 \t measure ...."
> > (i.e. multiple rows compressed as one long string)
> > 
> > I am retrieving them as follows.
> > 
> > dataFromDB <- sqlQuery(channel, "select
> > uncompress(columnName) from tableName");
> > 
> > 
> > I am looking for ways to convert this long "string"
> > into a table/dataframe in R, making it easier for
> > further post-processing etc. without reading/writing
> > it to a file first.
> > 
> > Although by doing write.table and reading it in
> > again, I got the result in a data frame, with the \t
> > and \n interpreted correctly, I wish to sidestep this
> > as I need to carry out this analysis for over 4
> > million such entries.
> > I tried 
> > write.table(dataFromDB, file="FileName");
> > dataFromFile <- read.table(FileName, sep="\t") 
> > dataFromFile is of the form
> > 
> > 92_8_nmenA      993_7_mpul      1.042444
> > 92_8_nmenA      3_5_cpneuA      0.900939
> > 190_1_rpxx      34_4_ctraM      0.822532
> > 190_1_rpxx      781_6_pmul      0.870016
> > 
> > Your input on the above is greatly appreciated.
> > Thanks
> > Lalitha
> 
> The easiest way might be to use a textConnection().
> 
> Let's say that you have read in your data as above
> and you have a column
> called 'blob':
> 
> > dataFromDB
>                                             blob
> 1 id1 \t id2 \t measure \n id3 \t id4 \t measure
> 
> 
> # Open textConnection.  Note coercion to character
> BLOB <- textConnection(as.character(dataFromDB$blob))
> 
> # Read in the column
> DF <- read.table(BLOB, sep = "\t")
> 
> # Close the connection
> close(BLOB)
> 
> 
> > DF
>      V1    V2        V3
> 1  id1   id2   measure 
> 2  id3   id4   measure
> 
> 
> See ?textConnection
> 
> HTH,
> 
> Marc Schwartz