[R] Algorythmic Question on Array Filtration

jim holtman jholtman at gmail.com
Sun Jul 15 07:55:35 CEST 2007


This will determine where the overlaps are and delete every row that is
involved in one.  You can add some more code to choose which of the
overlapping rows to keep.

> # add the 5ppm to the dataframe
> x$lower <- x$Mass * (1 - 5e-6)
> x$upper <- x$Mass * (1 + 5e-6)
> # create a matrix for determining overlap by adding 1 at the lower value of a row
> # and subtracting 1 at the upper value.
> overlap <- rbind(
+     cbind(index=seq(nrow(x)), value=x$lower, oper=1),
+     cbind(index=seq(nrow(x)), value=x$upper, oper=-1))
> # sort in 'value' order to determine overlap
> overlap[] <- overlap[order(overlap[,'value'], overlap[, 'oper']),]
> # 'qsize' should be 0 or 1 if there is no overlap
> overlap <- cbind(overlap, qsize=cumsum(overlap[, 'oper']))
> # find the qsize > 1 indicating overlap and use the index of that one and the one
> # after as the ones to delete.  You could add code to determine which one to keep
> o.index <- which(overlap[,'qsize'] > 1)
> # determine the indices to delete
> i.delete <- unique(c(overlap[o.index,'index'], overlap[o.index+1, 'index']))
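> # create the new data frame with overlaps deleted; guard against an empty
> # i.delete, because x[-integer(0), ] would drop every row
> new.x <- if (length(i.delete) > 0) x[-i.delete, ] else x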
>
>
>
> head(new.x,10)
       Mass Intensity    lower    upper
1  304.9117 35595.780 304.9102 304.9132
2  305.1726 18760.413 305.1711 305.1741
3  311.0636 24047.307 311.0620 311.0652
4  312.9303 12886.216 312.9287 312.9319
9  316.9118  5908.166 316.9102 316.9134
13 318.0114 37929.855 318.0098 318.0130
14 318.9274 27883.295 318.9258 318.9290
15 318.9889  4496.716 318.9873 318.9905
16 321.2784  3893.165 321.2768 321.2800
17 326.1166 23745.851 326.1150 326.1182
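
The code above drops every row that takes part in an overlap.  If the
goal is instead to keep the most intense row of each overlapping group,
something along these lines might work.  This is only a sketch: the
function name keep_most_intense and its ppm argument are made up for
illustration, and it assumes that windows linked through a chain of
overlaps (A overlaps B, B overlaps C) belong to one group, which may or
may not be what you want.  Windows that merely touch are treated as not
overlapping, as in the code above.

```r
# Sketch: keep only the most intense peak in each group of overlapping windows.
# Assumes a data frame with numeric columns Mass and Intensity, as in the thread.
keep_most_intense <- function(x, ppm = 5) {
  if (nrow(x) == 0) return(x)
  lower <- x$Mass * (1 - ppm * 1e-6)
  upper <- x$Mass * (1 + ppm * 1e-6)
  ord <- order(lower)          # process windows from left to right
  lower <- lower[ord]
  upper <- upper[ord]
  n <- length(ord)
  # furthest upper bound reached by any window seen so far
  reach <- cummax(upper)
  # a new group starts when a window begins at or beyond that bound
  new_group <- c(TRUE, lower[-1] >= reach[-n])
  group <- cumsum(new_group)
  # within each group, pick the original row with the highest intensity
  keep <- tapply(ord, group, function(i) i[which.max(x$Intensity[i])])
  x[sort(as.integer(keep)), , drop = FALSE]
}

# rows 4 to 6 and 9 to 12 of the example data posted in this thread
peaks <- data.frame(
  Mass      = c(312.9303, 312.9435, 312.9438, 316.9118,
                317.2805, 317.2833, 317.2837),
  Intensity = c(12886.216, 20236.181, 14404.502, 5908.166,
                14084.841, 25603.689, 22866.578))
filtered <- keep_most_intense(peaks)
print(filtered)
```

Because it needs only one sort and cumulative operations, this should
scale reasonably to much larger data.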


On 7/14/07, Johannes Graumann <johannes_graumann at web.de> wrote:
> John Kane wrote:
> Thanks for your time.
>
> Please find a small example below - the real data is MUCH bigger.
> If you look at rows 5 and 6 of this and calculate the mass precision window
> I have to deal with (5 ppm), you'll find the following:
>
> Row     Lower 5ppm      Mass            Higher 5ppm     Intensity
> 5       312.9419        312.9435        312.9451        20236.181
> 6       312.9422        312.9438        312.9454        14404.502
>
> The precision windows here obviously overlap and I need to get rid of one of
> them, which in this case should be row6, since it has the lower intensity
> associated with it.
>
> For now I resort to sorting by intensity and, descending through the list,
> populating a fresh data.frame with entries that do not overlap any entry
> already kept, skipping those that do. If somebody has any sounder ideas,
> I'd appreciate hearing about them.
>
> Thanks, Joh
>
> Mass    Intensity
> 304.9117 35595.780
> 305.1726 18760.413
> 311.0636 24047.307
> 312.9303 12886.216
> 312.9435 20236.181
> 312.9438 14404.502
> 313.1763 61033.830
> 313.1766 50788.418
> 316.9118 5908.166
> 317.2805 14084.841
> 317.2833 25603.689
> 317.2837 22866.578
> 318.0114 37929.855
> 318.9274 27883.295
> 318.9889 4496.716
> 321.2784 3893.165
> 326.1166 23745.851
> 327.2894 5318.226
> 328.8852 60934.030
> 329.1517 31985.486
> 331.0426 14883.231
> 332.0268 55126.078
> 332.2798 47364.519
> 333.2813 11423.807
> 337.1990 5330.360
> 339.2144 38450.804
> 339.2867 4065.709
> 340.9561 54101.844
> 340.9770 28172.160
> 345.0583 17945.025
> 345.0583 17877.900
> 347.1742 7359.428
> 347.2407 204792.999
> 353.2302 87864.153
> 353.2302 129691.696
> 363.0161 20453.771
> 363.0943 19481.234
> 363.2142 9238.244
> 363.2315 23323.527
> 363.2533 20039.607
> 363.2534 22068.718
> 364.8918 16857.488
> 364.9368 9527.642
> 366.9029 18174.233
> 373.2197 7730.009
> 385.1147 27907.070
> 385.1148 19383.655
> 393.2913 11860.719
> 396.9074 10793.823
> 400.8792 10750.249
> 402.8729 12411.966
> 407.2771 11270.566
> 442.8689 18101.972
> 442.8697 10671.199
> 447.3470 35927.046
> 449.2347 6959.247
> 456.9339 50402.820
> 461.1670 8636.998
> 461.1670 8151.706
> 473.2985 13782.291
> 490.9224 18510.760
>
> > I think we need a bit more information and perhaps a
> > small example data set to see what you want.
> >
> > I am not familiar with the term mass window. Is this a
> > confidence interval around the mass value?
> >
> >
> > --- Johannes Graumann <johannes_graumann at web.de>
> > wrote:
> >
> >> Dear All,
> >>
> >> I have a data frame with the columns "Mass" and
> >> "Intensity" (this is mass
> >> spectrometry stuff). Each of the mass values gives
> >> rise to a mass window of
> >> 5 ppm around the individual mass (from mass -
> >> mass/1E6*5 to mass +
> >> mass/1E6*5). I need to filter the array such that in
> >> case these mass
> >> windows overlap I retain the mass/intensity pair
> >> with the highest
> >> intensity.
> >> I apologize for this question, but I have no formal
> >> IT education and would
> >> value any nudges toward favorable algorithmic
> >> solutions highly.
> >>
> >> Thanks for any help,
> >>
> >> Joh
> >>
> >> ______________________________________________
> >> R-help at stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained,
> >> reproducible code.
> >>
> >
>
>
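
For comparison, the intensity-sorted approach described in the quoted
message (walk down the list from most to least intense and keep an entry
only if its window does not overlap one already kept) could be sketched
roughly as follows.  The function name greedy_filter and its ppm argument
are made up for illustration.  Each candidate is checked against all
entries kept so far, so this is quadratic in the worst case and may be
slow on very large data.

```r
# Sketch of the greedy, intensity-sorted filter described in the quoted message.
# Assumes a data frame with numeric columns Mass and Intensity.
greedy_filter <- function(x, ppm = 5) {
  lower <- x$Mass * (1 - ppm * 1e-6)
  upper <- x$Mass * (1 + ppm * 1e-6)
  kept <- integer(0)
  # visit rows from the most intense to the least intense
  for (i in order(x$Intensity, decreasing = TRUE)) {
    # two windows overlap when each starts before the other ends
    clash <- any(lower[i] < upper[kept] & lower[kept] < upper[i])
    if (!clash) kept <- c(kept, i)
  }
  # return the survivors in their original row order
  x[sort(kept), , drop = FALSE]
}

# rows 4 to 6 and 9 to 12 of the example data posted in this thread
peaks <- data.frame(
  Mass      = c(312.9303, 312.9435, 312.9438, 316.9118,
                317.2805, 317.2833, 317.2837),
  Intensity = c(12886.216, 20236.181, 14404.502, 5908.166,
                14084.841, 25603.689, 22866.578))
greedy <- greedy_filter(peaks)
print(greedy)
```

Unlike a filter that deletes every row involved in an overlap, this keeps
one row per clash, and it can keep two rows that both overlap a third,
less intense row as long as they do not overlap each other.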


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


