[R] read.table truncated data?

Petr PIKAL petr.pikal at precheza.cz
Fri Aug 26 10:22:13 CEST 2011


Hi

> 
> Thanks, Jim. quote='' works. And then I found a single quote in each of
> these lines:
> 3262
> 10403
> 17544
> 24685
> 31826
> 38967
> 
> None of them is near the position where the table got truncated. Why is that?
> 
> And read.table is a great function. Is it possible for it to give a warning
> message when the data gets truncated? In my case I almost overlooked the
> truncation...
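
One possible explanation, sketched here with made-up data rather than the poster's file: with the default quote = "\"'", read.table treats an apostrophe as the start of a quoted string, and everything up to the next apostrophe, line breaks included, ends up in a single field. The line numbers quoted above fit this picture. Each pair (3262/10403, 17544/24685, 31826/38967) is 7141 lines apart, so three pairs would swallow 3 * 7141 = 21423 lines, and 42847 lines minus the header minus 21423 leaves exactly the 21423 rows reported by nrow(a). If that is what happened, the rows went missing in the middle of the table, not at its end. Because the quotes pair up, the end of the file is never reached inside a quoted string, which is presumably why no warning appeared. The column names and values below are invented for illustration.

```r
# Invented tab-separated data: two gene names contain an apostrophe.
lines <- c("id\tgene\tvalue",
           "a\tSUP7\t1",
           "b\t5'ETS\t2",
           "c\tSUP4\t3",
           "d\t3'ETS\t4",
           "e\tSUP5\t5")
txt <- paste(lines, collapse = "\n")

# Default quoting: the text between the two apostrophes is read as one field.
bad <- read.table(text = txt, header = TRUE, sep = "\t")

# Quoting and comment handling switched off, as suggested earlier in the thread.
good <- read.table(text = txt, header = TRUE, sep = "\t",
                   quote = "", comment.char = "")

nrow(bad)    # fewer rows than data lines
nrow(good)   # one row per data line

# The merged field still contains the line breaks it swallowed.
has_newline <- grepl("\n", as.character(bad$gene), fixed = TRUE)
bad[has_newline, ]
```

An odd number of stray quotes would normally end with an "EOF within quoted string" warning from scan(); an even number, as in this thread, passes silently.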

When I read in some big data I usually do

str(data)

which tells me whether there is a problem with the data types (for example,
a numeric column converted to a factor because of a problematic item)

and/or

dim(data)

to check that the size is as expected.
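
A minimal sketch of making the size check complain on its own. rows_match is a hypothetical helper, not a base R function, and the temporary file holds invented data. It assumes one record per line and no blank or comment lines, because read.table skips those by default and the counts would then differ for harmless reasons.

```r
# Hypothetical helper: compare the rows read with the lines in the file.
rows_match <- function(file, data, header = TRUE) {
  n_lines  <- length(readLines(file))
  expected <- n_lines - if (header) 1L else 0L
  if (nrow(data) != expected) {
    warning(sprintf("read %d rows but the file has %d data lines",
                    nrow(data), expected))
    return(FALSE)
  }
  TRUE
}

# Invented example file with two stray apostrophes.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tgene", "a\tSUP7", "b\t5'ETS",
             "c\tSUP4", "d\t3'ETS", "e\tSUP5"), tmp)

bad  <- read.table(tmp, header = TRUE, sep = "\t")
good <- read.table(tmp, header = TRUE, sep = "\t",
                   quote = "", comment.char = "")

ok_bad  <- suppressWarnings(rows_match(tmp, bad))   # mismatch detected
ok_good <- rows_match(tmp, good)                    # sizes agree

# count.fields() follows the same quoting rules; according to its help
# page, lines that start a multi-line quoted field are reported as NA.
n_fields <- count.fields(tmp, sep = "\t")
which(is.na(n_fields))

unlink(tmp)
```

Running which(is.na(count.fields(...))) on the real file is one way to locate the offending lines without opening it in a text editor.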

Regards
Petr

> 
> On Thu, Aug 25, 2011 at 11:57 AM, jim holtman <jholtman at gmail.com> wrote:
> 
> > But did you try the following:
> >
> > x <- read.table(...., comment.char = '', quote = '')
> >
> > In most cases there is a missing quote somewhere in your data.
> > Use a text editor and search for single and double quotes.
> >
> > On Thu, Aug 25, 2011 at 11:49 AM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> > > Thanks for your replies. I looked at those lines and didn't spot anything
> > > unusual.
> > >
> > >> tail(a)
> > >         test_id gene_id gene               locus sample_1 sample_2 status
> > > 21418 tY(GUA)J1       - SUP7 chr10:354243-354332 air1rrp6 air2rrp6     OK
> > > 21419 tY(GUA)J2       - SUP4 chr10:542955-543044 air1rrp6 air2rrp6     OK
> > > 21420 tY(GUA)M1       - SUP5 chr13:168794-168883 air1rrp6 air2rrp6     OK
> > > 21421 tY(GUA)M2       - SUP8 chr13:837927-838016 air1rrp6 air2rrp6     OK
> > > 21422  tY(GUA)O       - SUP3 chr15:288191-288280 air1rrp6 air2rrp6     OK
> > > 21423  tY(GUA)Q       -    -   chrmt:70823-70907 air1rrp6 air2rrp6     OK
> > >       value_1 value_2 ln.fold_change. test_stat  p_value  q_value significant
> > > 21418 0.00000  0.0000        0.000000   0.00000 1.000000 1.011650          no
> > > 21419 0.00000  0.0000        0.000000   0.00000 1.000000 1.011480          no
> > > 21420 0.00000  0.0000        0.000000   0.00000 1.000000 1.011500          no
> > > 21421 0.00000  0.0000        0.000000   0.00000 1.000000 1.011520          no
> > > 21422 0.00000  0.0000        0.000000   0.00000 1.000000 1.011550          no
> > > 21423 6.68356 10.7397        0.474301  -1.08614 0.277417 0.455917          no
> > >
> > >
> > > tY(GUA)J1  -  SUP7      chr10:354243-354332  rrp6  air1rrp6  OK  0        0        0         0          1           1.00404    no
> > > tY(GUA)J2  -  SUP4      chr10:542955-543044  rrp6  air1rrp6  OK  0        0        0         0          1           1.00497    no
> > > tY(GUA)M1  -  SUP5      chr13:168794-168883  rrp6  air1rrp6  OK  0        0        0         0          1           1.00492    no
> > > tY(GUA)M2  -  SUP8      chr13:837927-838016  rrp6  air1rrp6  OK  0        0        0         0          1           1.00488    no
> > > tY(GUA)O   -  SUP3      chr15:288191-288280  rrp6  air1rrp6  OK  0        0        0         0          1           1.00485    no
> > > tY(GUA)Q   -  -         chrmt:70823-70907    rrp6  air1rrp6  OK  4.49644  6.68356  0.396365  -0.766052  0.443645    0.634724   no
> > > 15S_rRNA   -  15S_RRNA  chrmt:6545-8194      WT    air2rrp6  OK  2288.88  711.697  -1.16817  2.78772    0.00530801  0.0167772  yes
> > > 21S_rRNA   -  21S_RRNA  chrmt:58008-62447    WT    air2rrp6  OK  4134.59  1927.04  -0.7634   1.58991    0.111855    0.22339    no
> > > ETS1-1     -  ETS1-1    chr12:457732-458432  WT    air2rrp6  OK  3258.97  1114.76  -1.07277  2.91211    0.00359     0.0121587  yes
> > > ETS1-2     -  ETS1-2    chr12:466869-467569  WT    air2rrp6  OK  3258.97  1114.76  -1.07277  2.91211    0.00359     0.0121597  yes
> > >
> > >
> > > On Wed, Aug 24, 2011 at 2:34 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> On Wed, Aug 24, 2011 at 2:18 PM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> > >> > Hi R users,
> > >> >
> > >> > I was using read.table to read a file. The data.frame looked alright, but I
> > >> > found that not all rows were read by read.table. What's wrong with it? It
> > >> > didn't give me any warning or error messages. Why are the data truncated?
> > >> > Thanks.
> > >> >
> > >> > $ wc -l all/isoform_exp.diff
> > >> > 42847 all/isoform_exp.diff
> > >> >
> > >> >> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
> > >> >> nrow(a)
> > >> > [1] 21423
> > >>
> > >> This is a common problem. You need to take a look at the last row that
> > >> was imported, and the rows around 21423 in the original file.
> > >>
> > >> Common causes include stray single or double quotation marks, and
> > >> other special characters in your file like the default comment.char #
> > >>
> > >> Sarah
> > >> --
> > >> Sarah Goslee
> > >> http://www.functionaldiversity.org
> > >>
> > >
> > >
> > >
> > > --
> > > Best,
> > > Zhenjiang
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Data Munger Guru
> >
> > What is the problem that you are trying to solve?
> >
> 
> 
> 
> -- 
> Best,
> Zhenjiang
> 


