[R] Not all rows are being read-in

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Wed Mar 30 15:30:28 CEST 2011


Philipp, you are a savior!
That's exactly what has been happening - and it was driving me crazy.
quote="" fixed things.
Thank you very much!
Dimitri
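
P.S. For the archives, a minimal sketch of the effect on a toy file. The
rows are adapted from Philipp's example quoted below; the temporary file
and the object names (path, with_quotes, no_quotes) are assumptions made
up for illustration, not the real data.

```r
# Toy tab-delimited file; two descriptions contain an inch mark (").
lines <- c(
  "ID\tx\tdescription",
  "1\t0.4\tmy first measurement",
  "2\t1.6\tNormal 5\" object",
  "3\t0.4\tSome measurement",
  "4\t0.7\tA 4\" long sample"
)
path <- tempfile(fileext = ".txt")   # hypothetical stand-in for FileName.TXT
writeLines(lines, path)

# read.delim() defaults to quote = "\"", so the inch mark in row 2 is taken
# as an opening quote and the text up to the next " becomes one long field.
with_quotes <- read.delim(path)

# quote = "" switches quote handling off, so every line becomes one row.
no_quotes <- read.delim(path, quote = "")

# Compare the row counts with the number of data lines in the file.
nrow(with_quotes)
nrow(no_quotes)
length(readLines(path)) - 1L         # data lines, excluding the header
```

If the first count is smaller than the last one, stray quotes are a likely
cause. On a file as large as mine readLines() may be slow, so treat the
last line as a sanity check for small samples only.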

On Wed, Mar 30, 2011 at 5:01 AM, Philipp Pagel <p.pagel at wzw.tum.de> wrote:
> On Tue, Mar 29, 2011 at 06:58:59PM -0400, Dimitri Liakhovitski wrote:
>> I have a tab-delimited .txt file (size 800MB) with about 3.4 million
>> rows and 41 columns. About 15 columns contain strings.
>> I tried to read it into R 2.12.2 on a Windows XP laptop:
>> mydata<-read.delim(file="FileName.TXT",sep="\t")
>> R did not complain (!), but dim(mydata) returned 1692063 41.
>
> My guess would be that there are unexpected single or double quotes in
> your file, so R thinks that rather large blocks of the file are actually
> very long strings. This routinely happens in situations like this:
>
> ID      x   description
> 1     0.4   my first measurement
> 2     1.6   Normal 5" object
> 3     0.4   Some measurement
> 4     0.7   A 4" long sample
>
> R thinks that the description in row 2 ends in row 4, and you lose
> data.
>
> Try read.delim(..., quote="").
>
> cu
>        Philipp
>
> --
> Dr. Philipp Pagel
> Lehrstuhl für Genomorientierte Bioinformatik
> Technische Universität München
> Wissenschaftszentrum Weihenstephan
> Maximus-von-Imhof-Forum 3
> 85354 Freising, Germany
> http://webclu.bio.wzw.tum.de/~pagel/
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com


