[BioC] modify colClasses in read.columns?

Wolfgang Huber huber at ebi.ac.uk
Sun Apr 27 23:20:48 CEST 2008


Dear Henrik,

with a file test.txt as follows:

A	B	C
1	4711	34.50
2	ZAZA	01.40

and the call

z=read.table("test.txt", colClasses=c("integer", "NULL", "character"),
            header=TRUE, sep="\t")

I get

 > str(z)
'data.frame':   2 obs. of  2 variables:
  $ A: int  1 2
  $ C: chr  "34.50" "01.40"


so maybe the functionality you wish is already provided by read.table?
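
For the 13-column layout described in your mail, a minimal sketch along the same lines might look as follows. It assumes tab-delimited files whose first line holds the column names. The temporary file, the variable names (infile, hdr, cc) and the pattern "^R[0-9]+$" for picking R1 to R4 are only illustrative and not part of limma or read.columns.

```r
# Write a small sample file in the layout from the original question
# (13 tab-delimited columns; the values are the two example rows).
rows <- list(
  c("ID i", "ID j", "Ni", "Nj", "S", "A", "R1", "B", "R2", "C", "R3", "D", "R4"),
  c("8414341.20", "8414342.20", "1", "2", "-1", "1", "0.425183",
    "1", "0.758413", "1", "0.551275", "1", "0.543045"),
  c("8414341.20", "8414343.20", "1", "3", "-1", "1", "0.128981",
    "1", "0.034859", "1", "-0.001998", "1", "0.002093")
)
infile <- tempfile(fileext = ".txt")
writeLines(vapply(rows, paste, character(1), collapse = "\t"), infile)

# Read only the header line to learn the column names.
hdr <- strsplit(readLines(infile, n = 1), "\t", fixed = TRUE)[[1]]

# Skip every column by default ("NULL"), then switch on the wanted ones.
cc <- rep("NULL", length(hdr))
cc[hdr %in% c("ID i", "ID j")] <- "character"  # keeps the trailing 0 of ".20"
cc[grepl("^R[0-9]+$", hdr)] <- "numeric"       # R1, R2, R3, R4

# check.names = FALSE keeps "ID i" from being turned into "ID.i".
x <- read.table(infile, header = TRUE, sep = "\t",
                colClasses = cc, check.names = FALSE)
str(x)
```

Building colClasses from the header line avoids counting column positions by hand, so the same code should carry over to files in which the columns are ordered differently.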

 From looking at its code and man page, I don't think read.columns is 
designed to accept user input for what it takes as colClasses. In fact, 
when I try to supply colClasses to read.columns, I get:

Error in read.table(file = file, header = TRUE, col.names = allcnames:
   formal argument "colClasses" matched by multiple actual arguments
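
This message says that colClasses reached read.table twice in the same call. A plausible reading is that read.columns sets colClasses itself and also forwards extra arguments, but I have not traced it line by line. The sketch below reproduces the same kind of error with a made-up wrapper; read_some_columns is hypothetical and is not limma's code.

```r
# Hypothetical wrapper: it fixes colClasses itself and also forwards "...".
read_some_columns <- function(file, ...) {
  read.table(file = file, header = TRUE, sep = "\t",
             colClasses = "character", ...)
}

infile <- tempfile(fileext = ".txt")
writeLines(c("A\tB", "1\t2"), infile)

# Supplying colClasses again through "..." gives read.table two values
# for the same formal argument, so the call fails before any data is read.
msg <- tryCatch(
  read_some_columns(infile, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

A wrapper of this shape would need its own argument for user-supplied classes, merged into the vector it builds, before it could accept colClasses from the caller.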

   Best wishes
	Wolfgang



 > sessionInfo()
R version 2.8.0 Under development (unstable) (2008-04-27 r45517)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] fortunes_1.3-4


------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber


Henrik Parn wrote on 25/04/2008 21:21:
> Dear Herve,
> 
> Thanks for your rapid answer!
> 
> Sorry, I forgot to paste the sessionInfo into my previous mail:
> 
>  > sessionInfo()
> R version 2.7.0 (2008-04-22)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United 
> Kingdom.1252;LC_MONETARY=English_United 
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base    
> 
> other attached packages:
> [1] coda_0.13-1       limma_2.13.8      lme4_0.99875-9    
> Matrix_0.999375-9 lattice_0.17-6  
> 
> loaded via a namespace (and not attached):
> [1] grid_2.7.0  tools_2.7.0
> 
> 
> The read.columns function is a part of the limma package in Bioconductor:
> source("http://bioconductor.org/biocLite.R")
> biocLite("limma")
> 
> I would like to use the read.columns function to read a subset of 
> columns from several data files. Here are some example columns (out of 
> many) and rows of the data:
> 
> ID i        ID j        Ni  Nj  S   A  R1        B  R2        C  R3         D  R4
> 8414341.20  8414342.20  1   2   -1  1  0.425183  1  0.758413  1  0.551275   1  0.543045
> 8414341.20  8414343.20  1   3   -1  1  0.128981  1  0.034859  1  -0.001998  1  0.002093
> 
> In this example, there are 13 tab-delimited columns, of which I want to 
> use only ID i, ID j, R1, R2, R3 and R4. The problem with the data in its 
> current form is the unfortunate format of the ID i and ID j columns: I 
> need ID i and ID j to be treated as character although they look 
> numeric (if they are read as numeric, the .20 becomes .2). When I 
> have used read.table(), I have first read all columns, preserving the 
> format of ID i and ID j with the argument 
> colClasses = c("character", "character",...). In the next step I have 
> selected only the relevant columns.
> 
> I thought read.columns could be a convenient alternative for selecting 
> only the relevant columns when reading the data, e.g. by using 
> required.col = c("ID i", "ID j"), text.to.search = "R". However, in 
> read.columns I cannot specify colClasses. As the help text says, "It 
> uses required.col and text.to.search to set up the colClasses argument 
> of read.table." So I wonder whether anyone could advise me on how to 
> modify the read.columns code so that I can specify colClasses, if it 
> is not too complicated.
> 
> Thanks in advance!
> 
> 
> Henrik    
> 
> 
> 
> Herve Pages wrote:
> 
>> Hi Henrik,
>>
>> I don't have read.columns() when I start a fresh R session so it looks 
>> like it's not part of the default R installation. Which package does it 
>> belong to? Providing your sessionInfo() is always a good idea as it would 
>> at least give us a clue of where to look for the read.columns() function. 
>> Also a small example (with code) of what you are trying to do would be 
>> very useful.
>>
>> Thanks!
>> H.
>>
>>
>> Henrik Parn wrote:
>>
>>> Dear all,
>>>
>>> I have received some data sets with some variables that certainly 
>>> look numeric: they are individual IDs that are composed of some 
>>> numbers separated by ".", e.g. 6534231.18, 8783234.20. Not 
>>> surprisingly, they are treated as numeric by read.columns, and 
>>> 8783234.20 ends up as 8783234.2 when read into R. When I used 
>>> read.table, I specified in colClasses that these variables should be 
>>> read as character. However, in read.columns, required.col and 
>>> text.to.search are used to set up the colClasses argument of 
>>> read.table. Does anyone have a suggestion of how I can modify the 
>>> read.columns function so I can specify the colClasses myself?
>>>
>>> Thanks in advance!
>>>
>


