[R] Handling the windows clipboard/32KB limit

Ritter, Christian C MCIL-CTANL/S christian.ritter at shell.com
Thu Sep 9 14:47:59 CEST 2004


Thanks Brian for the hint. 

Indeed, writeClipboard does not suffer from the 32KB limit. Here is a line which can put large quantities of data on the clipboard:
writeClipboard(gsub("NA","\#N/A",apply(Alldata,1,paste,collapse="\t")))
this can then be pasted either manually or via a VBA macro into a spreadsheet. 

Here is some more info for R/Excel junkies:
===========================================
VBA example using RExcel to pass the command:
Sub FetchDataframe(Dfname As String, TargetRange As Range)
    Dim commandString As String
    commandString = "writeClipboard(c(paste(colnames(" & Dfname & "),collapse='\t'),gsub('NA','\#N/A',apply(" & Dfname & ",1,paste,collapse='\t'))))"
    Call RRun(commandString)
    TargetRange.PasteSpecial
End Sub
Again, the time for 
	FetchDataframe "Alldata",Range("Sheet1!$A$1") 
on about 120000 cells is on the order of two or three seconds. I looked at scaling. For 360000 (mixed type) cells, the required time was on the order of 15 seconds. 

I also looked at putting data from Excel to R: Here things scale even better. Using the code:
Sub PutDataframe(Dfname As String, TargetRange As Range)
    Dim commandString As String
    commandString = Dfname & "<-read.table(file=file(description='clipboard'),sep='\t',na.strings=c('#N/A',''),header=TRUE,comment.char=';')"
    TargetRange.Copy
    Call RRun(commandString)
End Sub
the transfer 
	PutDataframe "Alldata",Range("Alldata")
took 7 seconds for 360000 (mixed type) cells.  	

As Brian points out: Obviously, RODBC is more flexible for selecting, cleaning, etc. But in my experience it is a bit slow when dealing with Excel as a data source (via dsn<-odbcConnectExcel("Myfile.xls"). I redid my 120000 cell example using Alldata<-sqlFetch(dsn,"Alldata") and needed about 80 seconds. Now, this is not the fault of RODBC but probably of starting and using the jet data engine within Windows. 
Moreover, in particular, if the workbook Myfile.xls is actually open wile odbc is working on it, there may be memory or performance leaks (probably unrelated to RODBC). If I retrieve the same data frame (from Excel) several times, (either opening and closing the connection every time or opening it once and fetching several times) the retrieval times can become much slower as a go. At some time there may be "waiting for an OLE action to complete messages".

RExcel/R(D)COM gives a third possibility with probably much more flexibility/safety than my crude clipboard method and we are currently working hard at increasing its speed. Currently, we are at about 20 seconds for the 120000 cell example.

In summary, what strikes me about the clipboard approach is its speed and the size of objects which can be moved between Excel and R. Some fine tuning will be needed to deal with exceptions. 



-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Thursday, 09 September, 2004 12:41 PM
To: Ritter, Christian C MCIL-CTANL/S
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Handling the windows clipboard/32KB limit


On Thu, 9 Sep 2004, Ritter, Christian C MCIL-CTANL/S wrote:

> (R 1.9.1; Windows 2000;)
> 
> I'm just comparing ease of use, speed, etc for methods of transferring
> data frames in the Excel, MySQL, R triangle. It turns out that going
> from Excel to R (when doing this carefully). Using the clipboard is
> actually quite fast and efficient (2 seconds for transferring 120 000
> cells on a common desktop computer as compared to much longer for going
> the RODBC route, maybe also substantially longer using the R(D)COM
> route). Other advantage: Relatively flexible handling of missing values
> from Excel which may be "empty", "#N/A", etc. 

You clearly haven't looked at RODBC carefully as that has equally flexible 
handling.

> So I thought I would also look at ways of going back via the clipboard.
> There I'm hitting a funny snag. The documentation explains that there is
> a 32KB limit on writing to the clipboard. This may make sense as a
> default, but does it make sense as a hard restriction?

I guess you are talking about using file="clipboard" in a connection, but
there is also writeClipboard, whose documentation does not mention a 
limit.

My Windows documentation says there is a 32Kb limit on the size of an 
text object put on the clipboard, and I believe that is the case for
Windows 95/98/ME at least.  So a hard restriction makes sense.

> Any ideas of getting around this limitation (besides cumbersome R-code
> buffering/breaking up the frame to send into small pieces to be
> collected and assembled in Excel?

R is Open Source, so this is an opportunity for you to experiment, test on 
various versions of Windows and contribute a patch back to the R project.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list