[BioC] Fastest way to read CSV files

Stijn van Dongen stijn at ebi.ac.uk
Fri Aug 20 15:36:36 CEST 2010


Sorry, this:

>         <----integer 1 ---> <--- integer 2 --->
> 0000000 0014 0000 4268 0000 0000 0000 c000 4070

should have been (each 4-byte integer spans two of the 16-bit words in the dump):

        <-int 1-> <-int 2->
0000000 0014 0000 4268 0000 0000 0000 c000 4070


> Thanks Misha, that's very instructive.
> I'd like to add that this can be made more general: the dimensions of the
> object can be written to and read from the file as well. In fact, by writing
> some kind of 'cookie' (magic) number first, it would be possible to have code
> that recognizes what *type* of data it needs to read. In the example below,
> however, only the dimensions are written to the file first and then read back.
> When reading, the dimensions are no longer hardcoded but are read from the
> same connection.
> 
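>    x <- matrix(floor(runif(1.7e4 * 20) * 1000), nrow = 20)
> 
>    # write the dimensions first (two 4-byte integers), then the data (doubles)
>    cn <- file("test.bin", "wb")
>    writeBin(dim(x), cn)
>    writeBin(as.vector(x), cn)
>    close(cn)
> 
>    # read the dimensions back from the same connection, then the data
>    cn <- file("test.bin", "rb")
>    dims <- readBin(cn, integer(), 2)
>    x2 <- matrix(readBin(cn, numeric(), dims[1] * dims[2]), nrow = dims[1], ncol = dims[2])
>    close(cn)
> 
>    sum(x != x2)   # 0 if the round trip is exact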
> 
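The 'cookie' idea can be sketched in base R as below. Everything about the format is a hypothetical choice for illustration, not an existing convention: the header layout, the arbitrary magic value, the type codes (1 = integer, 2 = double) and the function names write_matrix_bin()/read_matrix_bin(). Fixing endian = "little" on both sides is an extra assumption that lets a file written on one machine be read on another with a different byte order.

```r
# Hypothetical file layout (illustrative only, not a standard format):
#   int32 magic cookie | int32 type code | int32 nrow | int32 ncol | data
MAGIC <- 0x4D415431L                     # arbitrary 4-byte value
TYPE_CODES <- c(integer = 1L, double = 2L)

write_matrix_bin <- function(x, path) {
  stopifnot(is.matrix(x), typeof(x) %in% names(TYPE_CODES))
  con <- file(path, "wb")
  on.exit(close(con))
  # header: cookie, type code and dimensions, all as 4-byte little-endian ints
  writeBin(c(MAGIC, TYPE_CODES[[typeof(x)]], dim(x)), con, endian = "little")
  writeBin(as.vector(x), con, endian = "little")
  invisible(path)
}

read_matrix_bin <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, integer(), n = 4, endian = "little")
  if (length(header) < 4 || !identical(header[1], MAGIC))
    stop("not a matrix file: missing or wrong magic cookie")
  # the type code tells the reader what kind of values follow the header
  what <- switch(as.character(header[2]),
                 "1" = integer(),
                 "2" = numeric(),
                 stop("unknown type code: ", header[2]))
  n <- prod(header[3:4])
  vals <- readBin(con, what, n = n, endian = "little")
  if (length(vals) != n) stop("file is truncated")
  matrix(vals, nrow = header[3], ncol = header[4])
}

# usage: a double matrix like the one above, then an integer matrix
x <- matrix(floor(runif(1.7e4 * 20) * 1000), nrow = 20)
f <- tempfile(fileext = ".bin")
write_matrix_bin(x, f)
identical(read_matrix_bin(f), x)   # TRUE
y <- matrix(1:6, nrow = 2)
write_matrix_bin(y, f)
identical(read_matrix_bin(f), y)   # TRUE, read back as integers
```

The reader never needs to be told the type or the size in advance; a file without the expected cookie is rejected instead of being silently misread.
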
> A hex dump of the file test.bin gives this for the first line:
> 
>         <----integer 1 ---> <--- integer 2 --->
> 0000000 0014 0000 4268 0000 0000 0000 c000 4070
> 
> Indeed, hexadecimal 0x14 == 20 and hexadecimal 0x4268 == 17000; this is on a
> little-endian machine, so the least significant bytes come first.
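
To check the corrected grouping without a hex-dump tool, a minimal base-R sketch can rebuild the same bytes in memory (writeBin() to a raw() vector instead of test.bin) and regroup them. It assumes, as the dump itself suggests, that the dump lists the file as 16-bit words in the machine's little-endian order; the variable names are illustrative only.

```r
# writeBin() to a raw() vector returns the bytes instead of writing a file
hdr <- writeBin(c(20L, 17000L), raw(), endian = "little")
print(hdr)
# [1] 14 00 00 00 68 42 00 00

# a word-oriented dump shows little-endian 16-bit words, two bytes each
words <- readBin(hdr, integer(), n = 4, size = 2, signed = FALSE, endian = "little")
print(sprintf("%04x", words))
# [1] "0014" "0000" "4268" "0000"

# read as 4-byte integers: each integer spans two of those words
readBin(hdr, integer(), n = 2, endian = "little")   # 20 17000
readBin(hdr, integer(), n = 2, endian = "big")      # 335544320 1749155840

# the last four words of the dump line, 0000 0000 c000 4070, are the first
# double of the matrix; in the run shown above it decodes to 268
first <- as.raw(c(0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x70, 0x40))
readBin(first, numeric(), n = 1, endian = "little") # 268
```

The big-endian reading shows why the dump only makes sense on a little-endian machine; passing endian = "little" (or "big") explicitly to both writeBin() and readBin() would make test.bin portable between machines.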

-- 
Stijn van Dongen         >8<        -o)   O<  forename pronunciation: [Stan]
EMBL-EBI                            /\\   Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK   _\_/   http://micans.org/stijn