[R] expected behavior when parsing lines with special characters

jim holtman jholtman at gmail.com
Tue Feb 15 18:28:39 CET 2011


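Check the arguments to read.table, especially 'quote'. By default both
" and ' are treated as quote characters, so everything between two
apostrophes is read as a single field. That is the documented behavior,
not a bug.

You probably want quote='' to turn off the special meaning of quotes.
You might also need comment.char='' in the future, since '#' starts a
comment by default.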
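
A minimal sketch of the difference, assuming three rows of your sample
data are written to a temporary file (the names tmp, bad and good are
just for illustration, and stringsAsFactors = FALSE is my addition so
that V4 comes back as plain character):

```r
# Three rows copied from the sample file, two of them containing an apostrophe.
rows <- c(
  "3499\t9031\t424823\tCOP'B2\t118094989\tXP_422637.2",
  "3499\t7955\t114454\tcopb2\t50080158\tNP_001001940.1",
  "3499\t4896\t2540050\tsec'27\t19113604\tNP_596811.1"
)
tmp <- tempfile(fileext = ".txt")  # illustrative temporary file
writeLines(rows, tmp)

# Default quote = "\"'": the text between the two apostrophes is
# swallowed into one field, as in the output you posted.
bad <- read.table(tmp, sep = "\t")

# quote = "" disables quoting; comment.char = "" also stops '#'
# from being treated as the start of a comment.
good <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

print(dim(bad))
print(dim(good))
print(good$V4)
```

With quoting turned off, each input line should come back as its own
row and the apostrophes stay in the V4 values.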

On Tue, Feb 15, 2011 at 12:21 PM, Robert M. Flight <rflight79 at gmail.com> wrote:
> Say I have a tab-delimited table I want to read into R. What should I
> expect to happen if some of the entries contain the character " ' "? I
> thought it would read the file fine, but that is not what happens.
> Instead, all the values in between two " ' "s get read into one field,
> and things are just seriously messed up. Is this a bug, and besides
> removing the offending characters, is there a fix?
>
> Example Input file:
>
> testFile.txt:
> 3499    9031    424823  COP'B2  118094989       XP_422637.2
> 3499    7955    114454  copb2   50080158        NP_001001940.1
> 3499    7227    45757   betaCop 24584107        NP_524836.2
> 3499    7165    1278426 AgaP_AGAP004798 158297839       XP_318012.4
> 3499    6239    177779  F38E11.5        17540286        NP_501671.1
> 3499    4896    2540050 sec'27  19113604        NP_596811.1
> 3499    4932    852740  SEC27   6321301 NP_011378.1
> 3499    28985   2897447 KLLA0B01958g    50303353        XP_451618.1
> 3499    33169   4621659 AGOS_AFL118W    45198403        NP_985432.1
> 3499    148305  2682116 MGG_10504       145615762       XP_366285.2
> 3499    5141    2709504 NCU07319.1      32414251        XP_327605.1
> 3499    3702    820842  AT3G15980       30683862        NP_850592.1
> 3499    3702    841666  AT1G52360       15218215        NP_175645.1
> 3499    3702    844339  AT1G79990       30699476        NP_178116.2
> 3499    4530    4340097 Os06g0143900    115466360       NP_001056779.1
>
> testDat <- read.table('testFile.txt',sep='\t')
> testDat
>
>     V1     V2      V3
> 1  3499   9031  424823
> 2  3499   4932  852740
> 3  3499  28985 2897447
> 4  3499  33169 4621659
> 5  3499 148305 2682116
> 6  3499   5141 2709504
> 7  3499   3702  820842
> 8  3499   3702  841666
> 9  3499   3702  844339
> 10 3499   4530 4340097
>
>
>
>                                       V4
> 1  COPB2\t118094989\tXP_422637.2\n3499\t7955\t114454\tcopb2\t50080158\tNP_001001940.1\n3499\t7227\t45757\tbetaCop\t24584107\tNP_524836.2\n3499\t7165\t1278426\tAgaP_AGAP004798\t158297839\tXP_318012.4\n3499\t6239\t177779\tF38E11.5\t17540286\tNP_501671.1\n3499\t4896\t2540050\tsec27
> 2
>
>
>                                    SEC27
> 3
>
>
>                             KLLA0B01958g
> 4
>
>
>                             AGOS_AFL118W
> 5
>
>
>                                MGG_10504
> 6
>
>
>                               NCU07319.1
> 7
>
>
>                                AT3G15980
> 8
>
>
>                                AT1G52360
> 9
>
>
>                                AT1G79990
> 10
>
>
>                             Os06g0143900
>          V5             V6
> 1   19113604    NP_596811.1
> 2    6321301    NP_011378.1
> 3   50303353    XP_451618.1
> 4   45198403    NP_985432.1
> 5  145615762    XP_366285.2
> 6   32414251    XP_327605.1
> 7   30683862    NP_850592.1
> 8   15218215    NP_175645.1
> 9   30699476    NP_178116.2
> 10 115466360 NP_001056779.1
>
> I would appreciate any feedback.
>
> Thanks,
>
> -Robert
>
>> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.1
>
>
> Robert M. Flight, Ph.D.
> University of Louisville Bioinformatics Laboratory
> University of Louisville
> Louisville, KY
>
> PH 502-852-1809 (HSC)
> PH 502-852-0467 (Belknap)
> EM robert.flight at louisville.edu
> EM rflight79 at gmail.com
>
> Williams and Holland's Law:
>        If enough data is collected, anything may be proven by
> statistical methods.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


