[R] Get rid of space padding

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Thu Dec 23 09:46:17 CET 2004


On 23-Dec-04 Gene Cutler wrote:
> 
> On Dec 22, 2004, at 5:00 PM, (Ted Harding) wrote:
>>
>> so, for me, the tabs are coming through as such.
>>
>> (R-1.8.0, RH9 Linux)
>>
>> What gives you the information that "\t" has expanded to spaces?
>> Often, writing a file out to a display, or importing it into an
>> editor (though you should be able to turn this off) expands tabs
>>
> 
> 
> I am getting tabs, but the values are being padded out to the tabs.
> Here is one sample line with spaces replaced by '.' and tabs replaced
> by '^'.
> 
> 5...................^45597241............^16734145............^2.7128169
> 8016131e-06^0.622804755173039...^0.91743119266055....^GB-4858-1- 
> A.........^GB-4873-1-A.........
> 
> This is with R 2.0.1 both on Mandrake linux and Mac OS X.
> 
> Also, I know it's not an issue of a text editor altering the data, as
> these files get read in by perl scripts and R as well as my text editor
> of choice.

OK Gene, getting a clue here. I can reproduce similar behaviour
using your version of write.matrix by setting some elements to
character strings:

  x<-matrix(rnorm(30),ncol=3)
  x[1,1]<-"A"
  x[2,2]<-"B"
  x[3,3]<-"C"
  write.matrix(x,file="temp.write")

A                       0.0855822398994265      1.02493287358937  
2.17769486851001        B                       -0.310203876654049
-1.46891720382270       -0.756931913255919      C                 
0.177454935470461       -1.06532248163526       -0.413338129170855

and 'od -c temp.write' gives:

0000000   A                                                            
0000020          \t   0   .   0   8   5   5   8   2   2   3   9   8   9
0000040   9   4   2   6   5  \t   1   .   0   2   4   9   3   2   8   7
0000060   3   5   8   9   3   7          \n   2   .   1   7   7   6   9
0000100   4   8   6   8   5   1   0   0   1          \t   B            
0000120                                                          \t   -
0000140   0   .   3   1   0   2   0   3   8   7   6   6   5   4   0   4
0000160   9  \n   -   1   .   4   6   8   9   1   7   2   0   3   8   2
0000200   2   7   0      \t   -   0   .   7   5   6   9   3   1   9   1
0000220   3   2   5   5   9   1   9  \t   C                            
0000240                                          \n   0   .   1   7   7
0000260   4   5   4   9   3   5   4   7   0   4   6   1      \t   -   1
0000300   .   0   6   5   3   2   2   4   8   1   6   3   5   2   6    
0000320  \t   -   0   .   4   1   3   3   3   8   1   2   9   1   7   0
0000340   8   5   5  \n   -   0   .   8   2   4   0   5   9   9   6   4

where, as in your example, "short" results are padded out to
the position of the next tab with spaces.
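
The padding is consistent with what format() does to a character
vector: every element is padded with trailing spaces to the width
of the longest one. A minimal sketch of that behaviour, where the
vector 'v' is purely illustrative and not taken from your data:

```r
# Illustrative values only: one short string and one long number-as-string
v <- c("A", "0.0855822398994265")

# format() on a character vector left-justifies and pads to a common width
padded <- format(v)

nchar(v)        # widths before formatting: 1 and 18
nchar(padded)   # both elements now have width 18

# The padding consists of trailing spaces on the short element
padded[1] == paste0("A", strrep(" ", 17))
```

Once a matrix holds even one character element, all of it is
character, so every field gets padded in this way before the tab
separator is written.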

However, when (as I suggested last time) I modify your function
'write.matrix' so as to replace each occurrence of "format(...)"
with just the "..." inside it, the output seems to be OK.

Now the first few lines of 'cat temp.write' are

A       0.0855822398994265      1.02493287358937
2.17769486851001        B       -0.310203876654049
-1.46891720382270       -0.756931913255919      C
0.177454935470461       -1.06532248163526       -0.413338129170855

and 'od -c temp.write' gives

0000000   A  \t   0   .   0   8   5   5   8   2   2   3   9   8   9   9
0000020   4   2   6   5  \t   1   .   0   2   4   9   3   2   8   7   3
0000040   5   8   9   3   7  \n   2   .   1   7   7   6   9   4   8   6
0000060   8   5   1   0   0   1  \t   B  \t   -   0   .   3   1   0   2
0000100   0   3   8   7   6   6   5   4   0   4   9  \n   -   1   .   4
0000120   6   8   9   1   7   2   0   3   8   2   2   7   0  \t   -   0
0000140   .   7   5   6   9   3   1   9   1   3   2   5   5   9   1   9
0000160  \t   C  \n   0   .   1   7   7   4   5   4   9   3   5   4   7
0000200   0   4   6   1  \t   -   1   .   0   6   5   3   2   2   4   8
0000220   1   6   3   5   2   6  \t   -   0   .   4   1   3   3   3   8

so that all the spaces have now disappeared, leaving only tabs.
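
The same check can be made from within R, without 'od'. This is only
a sketch: it does not call write.matrix itself but uses cat() with
the same kind of 'sep' vector, writing made-up values to a temporary
file.

```r
# Small character matrix, similar in spirit to the example above
x <- matrix(c("A", "1.5", "-2.25", "B"), ncol = 2, byrow = TRUE)
p <- ncol(x)

tmp <- tempfile()
# Tabs between fields, newline after every p-th field; no format() involved
cat(t(x), file = tmp, sep = c(rep("\t", p - 1), "\n"))

lines <- readLines(tmp)
any(grepl(" ", lines, fixed = TRUE))   # FALSE if no space padding was written
strsplit(lines, "\t", fixed = TRUE)    # fields recovered by splitting on tabs
unlink(tmp)
```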

The revised definition of "write.matrix" is:

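write.matrix <- function (x, file = "", sep = "\t", blocksize=2000)
{
     x <- as.matrix(x)
     p <- ncol(x)
     cn <- colnames(x)
     # Block-wise output is used only when blocksize is given explicitly
     if (!missing(blocksize) && blocksize > 0) {
         # Header line: column names, 'sep' between them, newline at the end
         cat(cn, file = file, sep = c(rep(sep, p - 1), "\n"))
         nlines <- 0
         nr <- nrow(x)
         while (nlines < nr) {
             nb <- min(blocksize, nr - nlines)
             # t() puts the block in row order; no format(), so no padding
             cat(t(x[nlines + (1:nb), ]), file = file,
                 append = TRUE, sep = c(rep(sep, p - 1), "\n"))
             nlines <- nlines + nb
         }
     }
     # Otherwise write the column names and the whole matrix in one call
     else cat(c(cn, t(x)), file = file,
              sep = c(rep(sep, p - 1), "\n"))
}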

Hence, back to my earlier question: Why do you need "format" in
your function? This is what is generating the effect which you
don't want!
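
If all you want is a tab-delimited file with no padding, base R's
write.table() may also do the job. I offer this only as an
alternative sketch, not as what your function was meant to do; the
matrix 'm' and the temporary file are illustrative.

```r
# Illustrative matrix; assigning a string makes the whole matrix character
m <- matrix(c(1.5, -2.25, 3, 4), ncol = 2)
m[1, 1] <- "A"

tmp <- tempfile()
# quote = FALSE keeps strings bare; no row or column names are written
write.table(m, file = tmp, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

out <- readLines(tmp)
out
unlink(tmp)
```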

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 23-Dec-04                                       Time: 08:46:17
------------------------------ XFMail ------------------------------



