[R] How to read only specified columns from a data file

Sarah Goslee sarah.goslee at gmail.com
Wed Mar 16 14:19:39 CET 2011


On Wed, Mar 16, 2011 at 9:07 AM, Luis Ridao <luridao at gmail.com> wrote:
> This is my code:
>
> mycols <- rep(NULL, 430) ; mycols[c(1,3:5)] <- rep("numeric", 4) ;
> mycols[c(2)] <- rep("character",1)

rep(NULL, 430) does not give you a vector of length 430; it gives you a NULL
vector, and at the end of this process mycols is of length 5.
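
You can check this at the console. This is only a minimal sketch of your own assignments, run on their own without any file:

```r
# rep() has nothing to repeat in NULL, so the result is still NULL
mycols <- rep(NULL, 430)
is.null(mycols)   # TRUE
length(mycols)    # 0

# Assigning into positions 1, 3, 4 and 5 grows the object to length 5;
# position 2 holds NA until it is assigned in the next step
mycols[c(1, 3:5)] <- rep("numeric", 4)
mycols[2] <- "character"
length(mycols)    # 5
mycols
```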

So read.table() does exactly what you've told it, and reads in the columns as
calculated from the first five rows, and gives the first five columns the
classes specified in mycols.
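
Here is a rough sketch of that five-row behaviour. It uses a small made-up file, not your data; the temporary file and its contents are assumptions for illustration only. count.fields() looks at every line, so its maximum can be used to tell read.table() the real width through col.names:

```r
# Hypothetical ragged file: five 3-field lines, then one 5-field line
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 a 0.1", "2 b 0.2", "3 c 0.3", "4 d 0.4", "5 e 0.5",
             "6 f 0.6 0.7 0.8"), tmp)

# The column count is inferred from the first five lines, so only 3 columns
# are created and the extra fields of line 6 spill into a further row
narrow <- read.table(tmp, fill = TRUE)
ncol(narrow)   # 3

# count.fields() scans every line, so the true width can be passed explicitly
n <- max(count.fields(tmp))
wide <- read.table(tmp, fill = TRUE, col.names = paste0("V", seq_len(n)))
dim(wide)      # 6 rows, 5 columns

unlink(tmp)
```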

According to the documentation for read.table(), you want "NULL" rather
than NULL anyway, and rep("NULL", 430) should work as expected.
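
A minimal sketch of that approach, again on a made-up file (8 columns here rather than your 430; the file and its values are assumptions for illustration only):

```r
# Hypothetical file with 8 columns; only the first 5 are wanted
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 x 0.1 0.2 0.3 9 9 9",
             "2 y 0.4 0.5 0.6 9 9 9"), tmp)

# Start from the string "NULL", which tells read.table() to skip a column,
# then fill in the classes of the columns to keep
mycols <- rep("NULL", 8)
mycols[c(1, 3:5)] <- "numeric"
mycols[2] <- "character"

inp <- read.table(tmp, colClasses = mycols)
dim(inp)             # 2 rows, 5 columns
sapply(inp, class)

unlink(tmp)
```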

Sarah

> inp <- read.table(myfile, skip=2, colClasses=mycols,fill=T)
> head(inp)
>
> Best,
> Luis
>
> On Wed, Mar 16, 2011 at 1:03 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Mar 16, 2011, at 8:13 AM, Sarah Goslee wrote:
>>
>>> read.table() looks at the first five rows when determining how many
>>> columns there are. If there are more columns in row 7 and you do not
>>> specify that in the read.table() command directly, they will be wrapped
>>> to the next row.
>>>
>>> This was discussed on the list within the last couple weeks.
>>
>> In addition to Sarah's comments, I also note that you did not include your
>> code. I don't think it could have been identical to the code I suggested,
>> which was in turn based on the code you had proposed. So ... what did you do
>> to get that result?
>>
>>
>> --
>> David.
>>
>>>
>>> Sarah
>>>
>>> On Wed, Mar 16, 2011 at 7:54 AM, Luis Ridao <luridao at gmail.com> wrote:
>>>>
>>>> David,
>>>>
>>>> Thanks for your tip, but it seems I'm having problems with the number
>>>> of columns R manages to read in. Below is an example of the data read
>>>> in:
>>>>
>>>>> inp[1:20,]
>>>>
>>>>        V1          V2        V3       V4     V5     V6     V7     V8     V9
>>>> 1  1.0000 log_fy_coff -1.007600 0.119520 1.0000     NA            NA     NA
>>>> 2  2.0000 log_fy_coff -0.935010 0.112840 0.8896 1.0000            NA     NA
>>>> 3  3.0000 log_fy_coff -0.876260 0.107500 0.8219 0.8847 1.0000     NA     NA
>>>> 4  4.0000 log_fy_coff -0.683090 0.103030 0.7656 0.8143 0.8747 1.0000     NA
>>>> 5  5.0000 log_fy_coff -0.623500 0.100980 0.7206 0.7636 0.8086 0.8764 1.0000
>>>> 6  6.0000 log_fy_coff -0.583330 0.098978 0.6819 0.7214 0.7615 0.8150 0.8762
>>>> 7  1.0000                    NA       NA     NA     NA            NA     NA
>>>> 8  7.0000 log_fy_coff -0.676790 0.096608 0.6521 0.6892 0.7254 0.7719 0.8148
>>>> 9  0.8717      1.0000        NA       NA     NA     NA            NA     NA
>>>> 10 8.0000 log_fy_coff -0.696060 0.093761 0.6297 0.6654 0.6988 0.7405 0.7750
>>>> 11 0.8116      0.8643  1.000000       NA     NA     NA            NA     NA
>>>> 12 9.0000 log_fy_coff -0.527060 0.089949 0.6003 0.6347 0.6667 0.7060 0.7367
>>>>
>>>> As you can see, there are only 9 columns in inp, and the rest are read
>>>> into the following row (see row 7).
>>>> I just don't understand why this is happening (using fill=T does not
>>>> help either).
>>>>
>>>> Best,
>>>> Luis
>>>>
>>>> On Tue, Mar 15, 2011 at 5:15 PM, David Winsemius <dwinsemius at comcast.net>
>>>> wrote:
>>>>>
>>>>> On Mar 15, 2011, at 1:11 PM, <rex.dwyer at syngenta.com> wrote:
>>>>>
>>>>>> I think you need to read an introduction to R.
>>>>>> For starters, read.table returns its results as a value, which you
>>>>>> are not saving.
>>>>>> The probable answer to your question:
>>>>>> Read the whole file with read.table, and select columns you need, e.g.:
>>>>>> tab <- read.table(myfile, skip=2)[,1:5]
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-bounces at r-project.org
>>>>>> [mailto:r-help-bounces at r-project.org]
>>>>>> On Behalf Of Luis Ridao
>>>>>> Sent: Tuesday, March 15, 2011 11:53 AM
>>>>>> To: r-help at r-project.org
>>>>>> Subject: [R] How to read only specified columns from a data file
>>>>>>
>>>>>> R-help,
>>>>>>
>>>>>> I'm trying to read a data file with plenty of columns.
>>>>>> I just need the first 5, but it does not work by doing something like:
>>>>>>
>>>>>>> mycols <- rep(NULL, 430) ; mycols[c(1:4)] <- NA
>>>>>>> read.table(myfile, skip=2, colClasses=mycols)
>>>>>
>>>>> I would have suggested:
>>>>>
>>>>> mycols <- rep(NULL, 430) ; mycols[1:5] <- rep("numeric", 5)
>>>>> inp <- read.table(myfile, skip=2, colClasses=mycols)
>>>>> head(inp)
>>>>>
>>>>> --
>>>>> David.
>>>>>
>>>>>>
>>>>>> Any suggestions?
>>>>>>
>>>
>>> --

-- 
Sarah Goslee
http://www.functionaldiversity.org


