[R] replacing ugly for loops

Bert Gunter gunter.berton at gene.com
Thu Oct 11 07:59:29 CEST 2012


I am not sure you have expressed what you want to do correctly. See inline:

On Wed, Oct 10, 2012 at 9:10 PM, andrewH <ahoerner at rprogress.org> wrote:
> I have a couple of hundred American Community Survey Summary Files files
> containing rectangular arrays of data, mainly though not exclusively
> numeric.  Each file is referred to as a sequence (henceforth "seq").
-- so 1 "seq" (terrible identifier -- see below for why) = 1 file

> From
> these files I am trying to extract particular subsets (tables) consisting of
> sets of columns.  These tables are defined by three numbers (now in
> columns in a data frame):
> 1.      a file identifier (seq)
> 2.      first column position numbers (startNo)
> 3.      length of table (len)

So your data frame, call it yourframe, has columns named:

seq      startNo       len


> so the columns to select for one triple would consist of
> startNo:(startNo+len-1).   I am trying to create for each sequence a
> vector of all the column numbers for tables in that sequence.

So for each seq id you want to find all the column numbers, right?

sq.n <- seq_len(nrow(yourframe)) ## Just to make it easier to read
colms <- tapply(sq.n, yourframe$seq, function(x) with(yourframe[x, ],
   sort(unique(do.call(c, mapply(seq, from = startNo,
      length.out = len, SIMPLIFY = FALSE))))))

## Comments
In the mapply call, seq is the R function, ?seq.  That's why using it
as a name for a file id is terrible -- it causes confusion.
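
A small sketch of the naming problem, for illustration only (the vector values are made up and `cols` and `masked` are hypothetical names). R still finds the function when `seq` is called, because a call skips non-function objects during lookup, but anything that treats `seq` as a value sees the data vector instead:

```r
# Hypothetical data vector that happens to share its name with base::seq
seq <- c(1, 1, 2, 2)

# As a value, `seq` now refers to the vector, not the function
masked <- !is.function(seq)

# As a call, R skips the vector and still finds the function
cols <- seq(from = 3, length.out = 4)   # 3 4 5 6

# Remove the masking object so `seq` means the function again
rm(seq)
```

So the code may well run, but a reader has to work out which `seq` is meant on every line.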

In the absence of data, this is untested -- and probably not quite
right. But it should be close, I hope. The key idea is the use of
mapply to get the sequence of columns for each row in all the rows for
each seq id. The SIMPLIFY = FALSE guarantees that this yields a list
of vectors of column indices, which are then glopped together and
cleaned up by the sort(unique(do.call(  ...  stuff.
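
To show why SIMPLIFY = FALSE matters, here is a minimal sketch with made-up inputs (two tables of equal length; `simplified`, `kept_list` and `cols` are hypothetical names). When every row yields a vector of the same length, mapply's default simplification returns a matrix rather than a list, which is not what do.call(c, ...) expects:

```r
startNo <- c(3, 10)
len     <- c(2, 2)   # equal lengths: the case where simplification kicks in

# Default behaviour: results of equal length are combined into a matrix
simplified <- mapply(seq, from = startNo, length.out = len)

# SIMPLIFY = FALSE always returns a list, one element per row
kept_list <- mapply(seq, from = startNo, length.out = len, SIMPLIFY = FALSE)

is.matrix(simplified)   # TRUE
is.list(kept_list)      # TRUE

# Concatenate the list elements, drop duplicates, and sort
cols <- sort(unique(do.call(c, kept_list)))   # 3 4 10 11
```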

colms should then be a list giving the sorted column numbers to choose
for each "seq" id.
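
As a rough check, here is the same approach run on the small example from the original post, with the parentheses balanced. The column name `fileId` is my own substitute for `seq`, chosen only to avoid the name clash; treat this as a sketch rather than tested production code:

```r
# Example values from the original post; `fileId` is a hypothetical
# replacement for the column the poster called `seq`
yourframe <- data.frame(fileId  = c(1, 1, 2, 2),
                        startNo = c(3, 10, 3, 15),
                        len     = c(4, 2, 5, 3))

sq.n <- seq_len(nrow(yourframe))

# For each file id: build startNo:(startNo + len - 1) per row,
# then concatenate, de-duplicate and sort the column numbers
colms <- tapply(sq.n, yourframe$fileId, function(x)
  with(yourframe[x, ],
       sort(unique(do.call(c, mapply(seq, from = startNo,
                                     length.out = len,
                                     SIMPLIFY = FALSE))))))

colms[[1]]   # 3 4 5 6 10 11
colms[[2]]   # 3 4 5 6 7 15 16 17
```

These match the selectColsList output of the nested loops in the original post.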

I do not know whether (once cleaned up) this is either more elegant
or more efficient than what you proposed. And I wouldn't be surprised
if someone like Bill Dunlap comes up with a much better way, either.
But it is different -- and perhaps amusing.

... If I have properly understood what you wanted. If not, ignore all.

Cheers,
Bert

>
> Obviously I could do this with nested for loops, e.g.:
>
>> seq <- c(1,1,2,2)
>> startNo  <- c(3, 10, 3, 15)
>> len <- c(4, 2, 5, 3)
>> data.df <- data.frame(seq, startNo, len)
>>
>> seq.f <- factor(data.df$seq)
>> data.l <- split(data.df, seq.f)
>> selectColsList<- vector("list", length(levels(seq.f)))
>> for (i in seq_along(levels(seq.f))){
>    selectCols <- numeric()
>        for (j in seq_along(data.l[[i]]$startNo)){
>            selectCols <- c(selectCols,
>                data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] +
>                data.l[[i]]$len[j]-1))
>         }
>     selectColsList[[i]] <- selectCols
> }
>> selectColsList
> [[1]]
> [1]  3  4  5  6 10 11
> [[2]]
> [1]  3  4  5  6  7 15 16 17
>
> But this code strikes me as inelegant and verbose. It seems to me that there
> ought to be a way to make the outer loop, (indexed with i) into a tapply
> function (which is why I started with a split()), and the inner loop
> (indexed with j) into some cute recursive function, but I was not able to do
> so. If anyone could suggest some nicer (e.g. shorter, or faster, or just
> more sophisticated) way to do this instead, I would be most grateful.
>
> Sincerely, andrewH



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



