[R] How to read only specified columns from a data file

Luis Ridao luridao at gmail.com
Wed Mar 16 14:07:20 CET 2011


This is my code:

mycols <- rep(NULL, 430)
mycols[c(1, 3:5)] <- rep("numeric", 4)
mycols[2] <- "character"
inp <- read.table(myfile, skip = 2, colClasses = mycols, fill = TRUE)
head(inp)

Best,
Luis
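
A minimal sketch of one way to keep only the first five columns, going by
Sarah's explanation quoted below. It rests on two points about base R:
rep(NULL, 430) evaluates to NULL, so mycols in the code above ends up with
only five entries, which read.table() recycles across the columns; and the
string "NULL" in colClasses is what marks a column to be skipped. Passing
col.names tells read.table() the full column count up front, so long rows
are not wrapped. The temporary file and its contents are invented for
illustration, and count.fields() is only one assumed way to find the widest
row; if the real file has 430 columns, that number could be used directly.

```r
# Build a small ragged file that mimics the situation in this thread:
# two header lines, then rows whose field count grows past the first five.
# (The file and its contents are invented for illustration.)
myfile <- tempfile(fileext = ".txt")
rows <- vapply(1:8, function(i) {
  paste(c(i, "log_fy_coff", -i / 10, i / 100, rep(1, i)), collapse = " ")
}, character(1))
writeLines(c("header line 1", "header line 2", rows), myfile)

# Count the fields on every data line so the widest row sets the column count.
ncols <- max(count.fields(myfile, skip = 2))

# The string "NULL" tells read.table() to skip a column.
# rep(NULL, n) returns NULL, so it cannot be used to pre-fill the vector.
mycols <- rep("NULL", ncols)
mycols[c(1, 3:5)] <- "numeric"
mycols[2] <- "character"

# Supplying col.names fixes the number of columns in advance, and
# fill = TRUE pads the shorter rows instead of raising an error.
inp <- read.table(myfile, skip = 2, fill = TRUE,
                  col.names = paste0("V", seq_len(ncols)),
                  colClasses = mycols)
head(inp)
```

With this approach inp should hold one row per data line and only the
columns V1 to V5; the skipped columns are never stored.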

On Wed, Mar 16, 2011 at 1:03 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Mar 16, 2011, at 8:13 AM, Sarah Goslee wrote:
>
>> read.table() looks at the first five rows when determining how many columns
>> there are. If there are more columns in row 7 and you do not specify that in
>> the read.table() command directly, they will be wrapped to the next row.
>>
>> This was discussed on the list within the last couple weeks.
>
> In addition to Sarah's comments, I also note that you did not include your
> code. I don't think it could have been identical to the code I suggested,
> which was in turn based on the code you had proposed. So ... what did you do
> to get that result?
>
>
> --
> David.
>
>>
>> Sarah
>>
>> On Wed, Mar 16, 2011 at 7:54 AM, Luis Ridao <luridao at gmail.com> wrote:
>>>
>>> David,
>>>
>>> Thanks for your tip but it seems I'm having problems with the number
>>> of columns R manages to read in. Below is an example of the data read
>>> in:
>>>
>>>> inp[1:20,]
>>>
>>>        V1          V2        V3       V4     V5     V6     V7     V8     V9
>>> 1  1.0000 log_fy_coff -1.007600 0.119520 1.0000     NA            NA     NA
>>> 2  2.0000 log_fy_coff -0.935010 0.112840 0.8896 1.0000            NA     NA
>>> 3  3.0000 log_fy_coff -0.876260 0.107500 0.8219 0.8847 1.0000     NA     NA
>>> 4  4.0000 log_fy_coff -0.683090 0.103030 0.7656 0.8143 0.8747 1.0000     NA
>>> 5  5.0000 log_fy_coff -0.623500 0.100980 0.7206 0.7636 0.8086 0.8764 1.0000
>>> 6  6.0000 log_fy_coff -0.583330 0.098978 0.6819 0.7214 0.7615 0.8150 0.8762
>>> 7  1.0000                    NA       NA     NA     NA            NA     NA
>>> 8  7.0000 log_fy_coff -0.676790 0.096608 0.6521 0.6892 0.7254 0.7719 0.8148
>>> 9  0.8717      1.0000        NA       NA     NA     NA            NA     NA
>>> 10 8.0000 log_fy_coff -0.696060 0.093761 0.6297 0.6654 0.6988 0.7405 0.7750
>>> 11 0.8116      0.8643  1.000000       NA     NA     NA            NA     NA
>>> 12 9.0000 log_fy_coff -0.527060 0.089949 0.6003 0.6347 0.6667 0.7060 0.7367
>>>
>>> As you can see, there are only 9 columns in inp, and the rest is read
>>> into the following row (see row 7).
>>> I just don't understand why this is happening (using fill=T does not
>>> help either).
>>>
>>> Best,
>>> Luis
>>>
>>> On Tue, Mar 15, 2011 at 5:15 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>>>> On Mar 15, 2011, at 1:11 PM, <rex.dwyer at syngenta.com> wrote:
>>>>
>>>>> I think you need to read an introduction to R.
>>>>> For starters, read.table returns its results as a value, which you are
>>>>> not saving.
>>>>> The probable answer to your question:
>>>>> Read the whole file with read.table, and select columns you need, e.g.:
>>>>> tab <- read.table(myfile, skip=2)[,1:5]
>>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org
>>>>> [mailto:r-help-bounces at r-project.org]
>>>>> On Behalf Of Luis Ridao
>>>>> Sent: Tuesday, March 15, 2011 11:53 AM
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] How to read only specified columns from a data file
>>>>>
>>>>> R-help,
>>>>>
>>>>> I'm trying to read a data file with plenty of columns.
>>>>> I just need the first 5 but it does not work by doing something like:
>>>>>
>>>>>> mycols <- rep(NULL, 430) ; mycols[c(1:4)] <- NA
>>>>>> read.table(myfile, skip=2, colClasses=mycols)
>>>>
>>>> I would have suggested:
>>>>
>>>> mycols <- rep(NULL, 430) ; mycols[1:5] <- rep("numeric", 5)
>>>> inp <- read.table(myfile, skip=2, colClasses=mycols)
>>>> head(inp)
>>>>
>>>> --
>>>> David.
>>>>
>>>>>
>>>>> Any suggestions?
>>>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>
> David Winsemius, MD
> West Hartford, CT
>
>


