[R] help using scan on large matrix (caveats to what has been discussed before)

Thu Aug 12 16:52:07 CEST 2010

Hi Peter,
apologies, I was too fast with the copying and pasting.
So, here is the explanation:
f<-"C:/test/mytab.txt";
R<-readLines(con=f);

where mytab.txt is a table formatted as noted in previous post (space 
delimited, with header, rownames, containing integers).

Now, my understanding of scan was that I have to specify the FULL number 
of values in it (examples specify things like 200*2000 for a matrix, 
etc.). That's why I thought that I needed to do cols*rows as well. Skipping 
the first line with headers is simple; skipping the first column is not, 
hence my questions.
Sorry, the corrected version with matching parentheses is here. Why the 
previous one executed at all is a wonder...
c<-scan(file=f,what=rep(c(list(NULL),rep(list(0L),cols-1)),rows-1), skip=1)
here, my reasoning was:

* c(list(NULL),rep(list(0L),cols-1)) specifies a template for any line 
(the first element is to be ignored => NULL, since it is a string in the 
table specified, followed by a repetition of integers. I am still not sure 
how you derived 0L, what it means, or where to find the documentation for it);
* the previous needs to be repeated rows-1 times, hence 
what=rep(c(list(NULL),rep(list(0L),cols-1)),rows-1)
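
For reference, a minimal sketch of how the what= template behaves, assuming a small stand-in file (the file contents and the name ncols are made up for illustration and are not taken from mytab.txt). As far as ?scan describes it, the list gives the fields of a single record and is reused for every line, so it should not need repeating per row. 0L is just the integer constant zero (see ?NumericConstants); scan only looks at its type.

```r
# Hypothetical stand-in for mytab.txt: a header line, then row name + integers.
f <- tempfile(fileext = ".txt")
writeLines(c("id a b c",
             "r1 1 2 3",
             "r2 4 5 6"), f)

ncols <- 3L  # number of integer columns (assumed known for this sketch)

# Template for ONE record: NULL skips the row name, 0L marks an integer field.
tmpl <- c(list(NULL), rep(list(0L), ncols))

# skip = 1 drops the header line; the template is reused for each data line.
x <- scan(file = f, what = tmpl, skip = 1, quiet = TRUE)

typeof(0L)   # "integer"; a plain 0 would be "double"
length(x)    # one list element per template entry, including the skipped one
x[[2]]       # values of the first integer column, one per data row
```

The result is a list with one vector per column, which is why it still has to be assembled into a matrix afterwards.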

I do not understand the following:

  You need an unlist(c). And more than likely NOT byrow=TRUE. However, I think do.call(cbind,c) should do the trick more easily.

What will unlist(c) do, why should it not be byrow=TRUE, and how would 
you go about integrating do.call(cbind,c) with matrix? Apologies for the 
naive questions; I am a newbie, in principle.
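
To make these three suggestions concrete, here is a small sketch with a hand-made list standing in for the scan() result (the values and sizes are hypothetical). It assumes the list holds one vector per column, with a leading NULL for the skipped row names.

```r
# Hypothetical scan() result for 2 data rows and 3 integer columns:
# first element NULL (skipped row names), then one vector per column.
x <- list(NULL, c(1L, 4L), c(2L, 5L), c(3L, 6L))

# matrix() on the list itself keeps list cells; these print as
# "NULL" / "Integer,2" rather than as numbers.
bad <- matrix(x, nrow = 2)
typeof(bad)  # "list"

# unlist() drops the NULL and concatenates the columns: 1 4 2 5 3 6.
v <- unlist(x)

# The values are already in column order, so fill column-wise
# (the default byrow = FALSE); byrow = TRUE would scramble them.
m1 <- matrix(v, nrow = 2)

# cbind() ignores NULL and binds the column vectors directly,
# so no matrix() call is needed at all.
m2 <- do.call(cbind, x)
```

Under these assumptions m1 and m2 hold the same values, and do.call(cbind, x) replaces the matrix() call rather than being combined with it.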

Cheers
Martin

On 8/12/2010 4:29 PM, peter dalgaard wrote:
> On Aug 12, 2010, at 1:34 PM, Martin Tomko wrote:
>
>    
>> Hi Peter,
>> thank you for your reply. I still cannot get it to work.
>> I have modified your code as follows:
>> rows<-length(R)
>> cols<- max(unlist(lapply(R,function(x) length(unlist(gregexpr(" ",x,fixed=TRUE,useBytes=TRUE))))))
>>      
> Notice that the above is completely useless to the reader unless you tell us what R is (except for a statistical programming language ;-))
>
>    
>> c<-scan(file=f,what=rep(c(list(NULL),rep(list(0L),cols-1),rows-1)), skip=1)
>>      
> What's the outer rep() and rows-1 doing in there???! Notice that the parentheses don't match up as I think you think they do, so there's really only one argument to rep(), making it a no-op. The rows-1 is going inside the c, which might be causing the apparent extra column. And the number of rows should not affect 'what=' anyway. Now if you had done what I wrote...
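
A short sketch of the parenthesis problem described above, using made-up sizes (cols and rows here are hypothetical values, not those of the real table). It only checks list lengths and does not read any file.

```r
cols <- 5L
rows <- 16L  # hypothetical sizes for illustration

# As posted: rows - 1 sits inside c(), so it becomes one extra template
# entry, and rep() receives a single argument and repeats nothing.
posted <- rep(c(list(NULL), rep(list(0L), cols - 1), rows - 1))
length(posted)   # cols + 1, the unexpected extra column

# Template for one record, without the outer rep():
tmpl <- c(list(NULL), rep(list(0L), cols - 1))
length(tmpl)     # cols
```
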
>
>    
>> m<-matrix(c, nrow = rows-1, ncol=cols+1,byrow=TRUE);
>>      
> If you make a matrix from a list, odd things will happen. You need an unlist(c). And more than likely NOT byrow=TRUE. However, I think do.call(cbind,c) should do the trick more easily.
>
>    
>> The list c seems ok, with all the values I would expect. Still, length(c) gives me a value = cols+1, which I find odd (I would expect =cols).
>> I then repeated it rows-1 times (to account for the header row). The values seem ok.
>> Anyway, I tried to construct the matrix, but when I print it, the values are odd:
>>      
>>> m[1:10,1:10]
>>>        
>>       [,1] [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
>> [1,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [2,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [3,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [4,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [5,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [6,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [7,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [8,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [9,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> [10,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
>> ....
>>
>> Any idea where the values are gone?
>> Thanks
>> Martin
>>
>> Hence, I filled it into the matrix of dimensions
>>
>> On 8/12/2010 12:24 PM, peter dalgaard wrote:
>>      
>>> On Aug 12, 2010, at 11:30 AM, Martin Tomko wrote:
>>>
>>>
>>>        
>>>> c<-scan(file=f,what=list(c("",(rep(integer(0),cols)))), skip=1)
>>>> m<-matrix(c, nrow = rows, ncol=cols,byrow=TRUE);
>>>>
>>>> For some reason I end up with a character matrix, which I don't want. Is this the proper way to skip the first column (this is not documented anywhere - how does one skip the first column in scan???). Is my way of specifying "integer(0)" correct?
>>>>
>>>>          
>>> No. Well, integer(0) is just superfluous where 0L would do, since scan only looks at the types not the contents, but more importantly, what= wants a list of as many elements as there are columns and you gave it
>>>
>>>
>>>        
>>>> list(c("",(rep(integer(0),5))))
>>>>
>>>>          
>>> [[1]]
>>> [1] ""
>>>
>>> I think what you actually meant was
>>>
>>> c(list(NULL),rep(list(0L),5))
>>>
>>>
>>>
>>>
>>>        
>>>> And finally - would any sparse matrix package be more appropriate, and can I use a sparse matrix for the image() function producing typical heatmaps? I have seen that some sparse matrix packages produce different looking outputs, which would not be appropriate.
>>>>
>>>> Thanks
>>>> Martin
>>>>
>>>>
>>>>          
>>>
>>>        
>>
>> -- 
>> Martin Tomko
>> Postdoctoral Research Assistant
>>
>> Geographic Information Systems Division
>> Department of Geography
>> University of Zurich - Irchel
>> Winterthurerstr. 190
>> CH-8057 Zurich, Switzerland
>>
>> email: 	martin.tomko at geo.uzh.ch
>> site:	http://www.geo.uzh.ch/~mtomko
>> mob: 	+41-788 629 558
>> tel: 	+41-44-6355256
>> fax: 	+41-44-6356848
>>
>>      
>    
