[BioC] modify colClasses in read.columns?

Henrik Parn henrik.parn at bio.ntnu.no
Fri Apr 25 22:21:23 CEST 2008


Dear Herve,

Thanks for your rapid answer!

Sorry, I forgot to paste the sessionInfo into my previous mail:

 > sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United 
Kingdom.1252;LC_MONETARY=English_United 
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] coda_0.13-1       limma_2.13.8      lme4_0.99875-9    Matrix_0.999375-9 lattice_0.17-6

loaded via a namespace (and not attached):
[1] grid_2.7.0  tools_2.7.0


The read.columns function is a part of the limma package in Bioconductor:
source("http://bioconductor.org/biocLite.R")
biocLite("limma")

I would like to use the read.columns function to read a subset of 
columns from several data files. Here are some example columns (out of 
many) and rows of the data:

ID i          ID j          Ni   Nj   S    A    R1        B    R2        C    R3         D    R4
8414341.20    8414342.20    1    2    -1   1    0.425183  1    0.758413  1    0.551275   1    0.543045
8414341.20    8414343.20    1    3    -1   1    0.128981  1    0.034859  1    -0.001998  1    0.002093

In this example, there are 13 tab-delimited columns of which I want to 
use only ID i, ID j, R1, R2, R3 and R4. The problem with the data in its 
current form is the unfortunate format of the ID i and ID j columns: I 
need ID i and ID j to be treated as character although they look 
numeric (if they are read as numeric, the .20 becomes .2). When I 
have used read.table(), I have first read all columns, preserving the 
format of ID i and ID j with the argument 
colClasses = c("character", "character", ...). In the next step I have 
selected only the relevant columns.
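
A minimal sketch of that two-step read.table() approach, using the two 
example rows above. The temporary file and the object names (tmp, 
all.cols, dat, num) are assumptions made for illustration; the real 
files have more columns and rows.

```r
# Recreate the two example rows as a temporary tab-delimited file
# (the file itself is an assumption, for illustration only).
hdr  <- c("ID i", "ID j", "Ni", "Nj", "S", "A", "R1", "B", "R2",
          "C", "R3", "D", "R4")
row1 <- c("8414341.20", "8414342.20", "1", "2", "-1", "1",
          "0.425183", "1", "0.758413", "1", "0.551275", "1", "0.543045")
row2 <- c("8414341.20", "8414343.20", "1", "3", "-1", "1",
          "0.128981", "1", "0.034859", "1", "-0.001998", "1", "0.002093")
tmp <- tempfile(fileext = ".txt")
writeLines(sapply(list(hdr, row1, row2), paste, collapse = "\t"), tmp)

# Step 1: read all 13 columns. The two ID columns are forced to
# character; NA tells read.table to guess the class of the others.
cc <- c("character", "character", rep(NA, 11))
all.cols <- read.table(tmp, header = TRUE, sep = "\t", colClasses = cc,
                       check.names = FALSE, quote = "", comment.char = "")

# Step 2: keep only the relevant columns.
dat <- all.cols[, c("ID i", "ID j", "R1", "R2", "R3", "R4")]

# For comparison: without colClasses the IDs are parsed as numbers,
# so the trailing zero of ".20" is not retained.
num <- read.table(tmp, header = TRUE, sep = "\t", check.names = FALSE)
```

Here dat[["ID i"]] stays a character vector, while num[["ID i"]] is 
numeric. check.names = FALSE keeps the column names with spaces as they 
appear in the file.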

I thought read.columns could be a convenient alternative to select only 
the relevant columns when reading the data, by using e.g. required.col = 
c("ID i", "ID j"), text.to.search = "R". However, in read.columns I 
cannot specify colClasses. As it says in the help text: "It uses 
required.col and text.to.search to set up the colClasses argument 
of read.table." So, I wonder if anyone could advise me on how to modify 
the read.columns code to be able to specify colClasses, if it is not too 
complicated.
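
One possible direction, sketched under assumptions: instead of changing 
read.columns itself, a small wrapper around read.table() can build the 
colClasses vector from the header line, using "NULL" to skip unwanted 
columns and "character" for the ID columns. The function read_selected 
and its arguments keep and char.col are hypothetical names, not part of 
limma, and this does not reproduce limma's actual read.columns code.

```r
# Hypothetical helper (not part of limma): read only selected columns
# and force some of them to be read as character.
read_selected <- function(file, keep, char.col = character(0), sep = "\t") {
  # Read the header line to get the column names as they appear in the file.
  hdr <- strsplit(readLines(file, n = 1), sep, fixed = TRUE)[[1]]
  missing <- setdiff(c(keep, char.col), hdr)
  if (length(missing) > 0)
    stop("columns not found: ", paste(missing, collapse = ", "))
  # "NULL" makes read.table skip a column entirely.
  cc <- rep("NULL", length(hdr))
  cc[hdr %in% keep] <- NA                # keep; class guessed by read.table
  cc[hdr %in% char.col] <- "character"   # keep; read as character
  read.table(file, header = TRUE, sep = sep, colClasses = cc,
             check.names = FALSE, quote = "", comment.char = "")
}

# Usage on a temporary file holding the two example rows
# (the file is an assumption, for illustration only).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  paste(c("ID i", "ID j", "Ni", "Nj", "S", "A", "R1", "B", "R2",
          "C", "R3", "D", "R4"), collapse = "\t"),
  paste(c("8414341.20", "8414342.20", "1", "2", "-1", "1", "0.425183",
          "1", "0.758413", "1", "0.551275", "1", "0.543045"),
        collapse = "\t"),
  paste(c("8414341.20", "8414343.20", "1", "3", "-1", "1", "0.128981",
          "1", "0.034859", "1", "-0.001998", "1", "0.002093"),
        collapse = "\t")
), tmp)

dat <- read_selected(tmp,
                     keep     = c("R1", "R2", "R3", "R4"),
                     char.col = c("ID i", "ID j"))
```

The columns come back in file order (ID i, ID j, R1, R2, R3, R4), with 
the two ID columns as character and the R columns as numeric.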

Thanks in advance!


Henrik    



Herve Pages wrote:

> Hi Henrik,
>
> I don't have read.columns() when I start a fresh R session so it looks
> like it's not part of the default R installation. Which package does it
> belong to? Providing your sessionInfo() is always a good idea as it would
> at least give us a clue of where to look for the read.columns() function.
> Also a small example (with code) of what you are trying to do would be
> very useful.
>
> Thanks!
> H.
>
>
> Henrik Parn wrote:
>
>> Dear all,
>>
>> I have received some data sets with some variables that certainly 
>> look numeric: they are individual IDs that are composed of some 
>> numbers separated by ".", e.g. 6534231.18, 8783234.20. Not 
>> surprisingly they are treated as numeric by read.columns, and 
>> 8783234.20 ends up like 8783234.2 when read into R. When I used 
>> read.table I specified in colClasses that these variables should be 
>> read as character. However, in read.columns, required.col and 
>> text.to.search are used to set up the colClasses argument of 
>> read.table. Does anyone have a suggestion of how I can modify the 
>> read.columns function so I can specify the colClasses myself?
>>
>> Thanks in advance!
>>
>

-- 
Henrik Pärn
Centre for Conservation Biology
Department of Biology
Norwegian University of Science and Technology
NO-7491 Trondheim
Norway

Office: +47 73596285
Fax: +47 73596100
Mobile: +47 90989255

E-mail: henrik.parn at bio.ntnu.no


