[R] Sliding window over irregular intervals

David Winsemius dwinsemius at comcast.net
Mon Mar 30 16:49:28 CEST 2009


The window you describe is not one I would call sliding: the intervals
are regular, with an irregular number of events within each window. One
way to handle this is to use the result of trunc(pos/10000) as a
grouping factor with tapply().

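(A related function is floor(), which gives the same result as trunc()
for positive values; the two differ only below 0. Your pos values appear
to be positive, so that difference should not matter. round() would also
bin the values, but it shifts every window boundary by half a width.)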
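The difference between these functions only shows up below zero and at window edges. A small sketch comparing the three as binning rules for a width of 10000; the positions are illustrative, and the negative one is hypothetical (genomic positions are positive) and is included only to show where trunc() and floor() part ways.

```r
# Illustrative positions only; the negative value is hypothetical.
width <- 10000
pos <- c(15795572, 15804999, 15806401, -15795572)

trunc(pos / width)  # toward zero:  1579  1580  1580 -1579
floor(pos / width)  # toward -Inf:  1579  1580  1580 -1580
round(pos / width)  # to nearest:   1580  1580  1581 -1580
```

With round(), 15795572 and 15804999 land in the same window (15795000 to 15805000), whereas trunc() and floor() put them in windows 1579 and 1580.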

After reading the data into a data frame, dta, try something like:

 > tapply(dta$xpehh, as.factor(trunc(dta$pos/10000)), min)
      1579      1580      1581      1582
-0.153413 -0.367296  0.302555  0.090302
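A minimal, self-contained sketch extending the tapply() call above to the other two parts of the question: evaluating each chr separately and counting the data points in each window. It assumes the real file reads cleanly with read.table(header = TRUE); since no file name is given, the posted excerpt is embedded as text here. Passing a list of two factors to tapply() cross-classifies by chromosome and window, and table() gives the counts.

```r
# The excerpt from the question, embedded as text (the real file name is unknown).
dta <- read.table(header = TRUE, text = "
id chr pos ihh1 ihh2 xpehh
rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
rs1981707 22 15809384 0.0299685 0.0176768 0.527892
rs1981708 22 15809434 0.0305465 0.0187227 0.489512
rs11914222 22 15810040 0.0307183 0.0172399 0.577633
rs4819923 22 15813210 0.02707 0.0159736 0.527491
rs5994105 22 15813888 0.025202 0.0141296 0.578651
rs5748760 22 15814084 0.0242894 0.0146486 0.505691
rs2385786 22 15816846 0.0173057 0.0107816 0.473199
rs1990483 22 15817310 0.0176641 0.0130525 0.302555
rs5994110 22 15821524 0.0178411 0.0129001 0.324267
rs17733785 22 15822154 0.0201797 0.0182093 0.102746
rs7287116 22 15823131 0.0201993 0.0179028 0.12069
rs5748765 22 15825502 0.0193195 0.0176513 0.090302
")

# Non-overlapping windows of width 10000; bin k covers pos k*10000 to k*10000 + 9999
dta$bin <- trunc(dta$pos / 10000)

# Cross-classify by chromosome and window so each chr is handled separately
grp <- list(chr = dta$chr, bin = dta$bin)
xmin <- tapply(dta$xpehh, grp, min)
xmax <- tapply(dta$xpehh, grp, max)

# Number of data points per chromosome and window
counts <- table(chr = dta$chr, bin = dta$bin)

# Side-by-side view for one chromosome
cbind(min = xmin["22", ], max = xmax["22", ], n = counts["22", ])
```

On the full file each result has one row per chromosome and one column per window that holds data on at least one chromosome; where a window has data on some chromosomes but not others, the missing cells are NA in xmin and xmax and 0 in counts.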

-- 
David Winsemius

On Mar 30, 2009, at 9:01 AM, Irene Gallego Romero wrote:

> Dear all,
>
> I have some very big data files that look something like this:
>
> id chr pos ihh1 ihh2 xpehh
> rs5748748 22 15795572 0.0230222 0.0268394 -0.153413
> rs5748755 22 15806401 0.0186084 0.0268672 -0.367296
> rs2385785 22 15807037 0.0198204 0.0186616 0.0602451
> rs1981707 22 15809384 0.0299685 0.0176768 0.527892
> rs1981708 22 15809434 0.0305465 0.0187227 0.489512
> rs11914222 22 15810040 0.0307183 0.0172399 0.577633
> rs4819923 22 15813210 0.02707 0.0159736 0.527491
> rs5994105 22 15813888 0.025202 0.0141296 0.578651
> rs5748760 22 15814084 0.0242894 0.0146486 0.505691
> rs2385786 22 15816846 0.0173057 0.0107816 0.473199
> rs1990483 22 15817310 0.0176641 0.0130525 0.302555
> rs5994110 22 15821524 0.0178411 0.0129001 0.324267
> rs17733785 22 15822154 0.0201797 0.0182093 0.102746
> rs7287116 22 15823131 0.0201993 0.0179028 0.12069
> rs5748765 22 15825502 0.0193195 0.0176513 0.090302
>
> I'm trying to extract the maximum and minimum xpehh (last column)  
> values within a sliding window (non-overlapping) of width 10000
> (calculated relative to pos (third column)). However, as you can  
> tell from the brief excerpt here, although all possible intervals  
> will probably be covered by at least one data point, the number of  
> data points will be variable (incidentally, if anyone knows of a way  
> to obtain this number, that would be lovely), as will the spacing  
> between them. Furthermore, values of chr (second column) will range  
> from 1 to 22, and values of pos will be overlapping across them; I  
> want to evaluate the window separately for each value of chr.
>
> I've looked at the help and FAQ on sliding windows, but I'm a  
> relative newcomer to R and cannot find a way to do what I need to  
> do. Everything I've managed to unearth so far seems geared towards  
> smoother time series. Any help on this problem would be vastly  
> appreciated.
>
> Thanks,
> Irene
>
> -- 
> Irene Gallego Romero
> Leverhulme Centre for Human Evolutionary Studies
> University of Cambridge
> Fitzwilliam St
> Cambridge
> CB2 1QH
> UK
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



