[R] write.table: strange output has been produced

jim holtman jholtman at gmail.com
Wed Sep 19 19:36:12 CEST 2012


It would also be helpful if you could provide the output of 'str' for
all the objects that you are using.

e.g.,  str(statdata)    str(extra)
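
As a rough illustration of what that output can reveal, here is a minimal sketch. The data below are made-up stand-ins (the real 'statdata' and 'annot' were not posted), and 'col_classes' is a name introduced only for this example:

```r
# Hypothetical stand-ins for the poster's objects; the real columns are unknown.
statdata <- data.frame(id   = c(101L, 102L, 103L),
                       pval = c(0.01, 0.02, 0.03))
annot <- data.frame(id         = c(102L, 101L),
                    kogdefline = c("defline B", "defline A"),
                    stringsAsFactors = TRUE)  # mimics the default before R 4.0.0

str(statdata)  # one line per column: name, type, first few values
str(annot)     # a "Factor w/ ..." entry flags a column stored as a factor

# The same check done programmatically: the class of every column
col_classes <- sapply(annot, class)
print(col_classes)
```

A column reported as a factor where text was expected is usually the first thing to look for.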


Also, when creating your data.frame, use "stringsAsFactors = FALSE":

extra = data.frame(kogdefline=rep(NA,n)
    , kogClass = rep(NA,n)
    , kogGroup = rep(NA,n)
    , stringsAsFactors = FALSE
)
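
To make the suggestion concrete, here is a small sketch of the fill-by-match step with character columns throughout. All data are invented for illustration. Using NA_character_ instead of NA and looping over column names are my own variations, not something taken from the original code. Since R 4.0.0, stringsAsFactors = FALSE is the default, so the explicit argument mainly matters on older versions.

```r
# Hypothetical ids and annotation table; the real data were not posted.
statdata_id <- c(101L, 102L, 103L, 104L)
n <- length(statdata_id)
annot <- data.frame(id         = c(103L, 101L),
                    kogdefline = c("defline C", "defline A"),
                    kogClass   = c("class C", "class A"),
                    kogGroup   = c("group C", "group A"),
                    stringsAsFactors = TRUE)  # worst case: factor columns

# NA_character_ makes each column character from the start
extra <- data.frame(kogdefline = rep(NA_character_, n)
    , kogClass = rep(NA_character_, n)
    , kogGroup = rep(NA_character_, n)
    , stringsAsFactors = FALSE
)

common <- intersect(statdata_id, annot$id)  # ids present in both
MR <- match(common, annot$id)               # rows in annot
ML <- match(common, statdata_id)            # rows in statdata / extra

# Convert one column at a time; as.character() on a single factor
# returns its labels, not the underlying integer codes.
for (col in names(extra)) {
    extra[ML, col] <- as.character(annot[MR, col])
}

str(extra)
```

Converting column by column avoids calling as.character() on a whole data.frame slice, which works on the list of columns rather than on each column's values.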

On Wed, Sep 19, 2012 at 12:12 PM, Igor <igorc at essex.ac.uk> wrote:
> Good afternoon all -
>
> While making steady progress in learning R after Matlab, I encountered
> a problem that I need some extra help to get past.
> Basically, I want to merge data from a biological statistical dataset
> with annotation data extracted from another dataset using an 'id'
> cross-reference, and write the result to a report file. The first part
> goes fine: I have merged both into a data.frame. But when I try to
> write it to a csv file using 'write.table', it does write the
> 'data.frame' object but also inserts some parts of the annotation
> data that are not supposed to be there...
> A small snapshot of the file output is below to illustrate. The
> upper half is fine; that's how it should be. The lower half, which
> appears to be space-separated rather than comma-separated, was evidently
> taken from the annotation dataset and is not supposed to be here.
>
> --------------------------------8<--------------------------------------------
> "344","166128",126.44286392082,179.904700814932,72.9810270267088,0.40566492535281,-1.3016395254146,2.47449355237252e-07,4.2901159299567e-06,"Chitinas
> "18816","238247",92.5282508325735,135.981255262454,49.0752464026927,0.36089714209487,-1.47034037615176,2.5330054329543e-07,4.38862252337004e-06,"Prot
> "22072","222365",30.8191942806426,52.4262903365628,9.21209822472236,0.17571524068522,-2.50868876576414,2.54433836512085e-07,4.40531098485028e-06,NA,N
> "25062","226605",30.808007579908,50.3976662241578,11.2183489356581,0.22259659575825,-2.16749656564076,2.54934711860645e-07,4.41103467375713e-06,NA,NA
> "7539","247009",75.4175439970731,34.4643221134552,116.370765880691,3.37655751642533,1.75555313265164,2.60010673210741e-07,4.49585878338091e-06,NA,NA,
> "407","267139",425.559675915702,279.393013150954,571.72633868045,2.04631580522577,1.03302881149302,2.61074218843609e-07,4.51123710239304e-06,NA,NA,NA
> "26530","171300",146.80096060985,80.0063286553601,213.595592564339,2.66973370924738,1.4166958484644,2.68061220749976e-07,4.62888115991058e-06,NA,NA,N
> "3078","159013",34.3260176515511,52.4580790080106,16.1939562950917,0.308702808057816,-1.69570948866688,2.69104298652827e-07,4.64379716436078e-06,"40S
> "4657","159998",133.10761487064,185.450704462326,80.7645252789532,0.435504009074069,-1.19924209513405,2.75544399955331e-07,4.75176501174632e-06,"IMP-
>
> 171597  171597  KOG1347 Uncharacterized membrane protein, predicted
> efflux pump General function prediction only    POORLY CHARACTERIZED
> 171658  171658  KOG4290 Predicted membrane protein  Function unknown
> POORLY CHARACTERIZED
> 171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
> intracellular trafficking and secretion  Signal transduction mechanisms
> CELLULAR
> 171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
> intracellular trafficking and secretion  Intracellular trafficking,
> secretion, and
> 171703  171703  KOG2674 Cysteine protease required for autophagy -
> Apg4p/Aut2p  Cytoskeleton    CELLULAR PROCESSES AND SIGNALING
> 171703  171703  KOG2674 Cysteine protease required for autophagy -
> Apg4p/Aut2p  Intracellular trafficking, secretion, and vesicular
> transport   CELLU
> and metabolism     METABOLISM
> --------------------------------8<--------------------------------------------
> And this is a piece of code that produced this:
>
> --------------------------------8<--------------------------------------------
>>n = nrow(statdata)
>>extra = data.frame(kogdefline=rep(NA,n), kogClass = rep(NA,n), kogGroup = rep(NA,n))
>>subset = intersect(statdata$id, annot$id)
>>MR = match(subset, annot$id)
>>ML = match(subset, statdata$id)
>
>>extra[ML,1] = as.character(annot[MR,2])
>>extra[ML,2] = as.character(annot[MR,3])
>>extra[ML,3] = as.character(annot[MR,4])
> # strangely, if I do
> # extra[ML,] = as.character(annot[MR,2:4])
> # it produces digits (???) instead of a string value
>
>>mergedData = data.frame(statdata, extra)
>>write.table(mergedData, 'filename.csv', sep=',')
> --------------------------------8<--------------------------------------------
>
> Any ideas why this is happening?
>
> Many thanks
> -Igor



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



