[BioC] Problem with removing duplicated probes of datasets without annotation
whuber at embl.de
Sun Jun 22 10:49:31 CEST 2014
This is a general R question; code like the following would do the job:
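# x: a data.frame with columns 'probeid' and 'pvalue'
# group the row indices of x by probe ID
s = split( seq_len(nrow(x)), x$probeid )
# for each probe ID, keep the row index with the smallest p-value
uniqueids = sapply( s, function(i) i[which.min(x$pvalue[i])] )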
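The result, ‘uniqueids’, holds one row index of ‘x’ per probe ID. You can replace what’s inside the ‘which.min(…)’ expression with any other per-row score, and use ‘which.max’ instead if the largest value should win (for example, for variance).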
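To make the idiom concrete, here is a minimal, self-contained sketch on a toy table. The data values and the names 'keep' and 'xu' are made up for illustration and are not from the original post; only the split/sapply pattern is.

```r
# Toy data, invented purely for illustration
x = data.frame(
  probeid = c("A", "A", "B", "C", "C", "C"),
  pvalue  = c(0.04, 0.01, 0.03, 0.20, 0.05, 0.10),
  stringsAsFactors = FALSE
)

# Row indices of x, grouped by probe ID
s = split(seq_len(nrow(x)), x$probeid)

# Within each group, pick the row index with the smallest p-value
keep = sapply(s, function(i) i[which.min(x$pvalue[i])])

# Subset to one row per probe ID
xu = x[keep, ]
print(xu)
```

Note that which.min skips NA values; if every p-value in a group is NA it returns a zero-length result and sapply will then return a list rather than a vector, so it may be worth dropping such rows first.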
There are plenty of places in vignettes etc. where this type of operation is done. One I happen to be aware of right now is inside the function ‘myHeatmap’ of the ‘Hiiragi2013’ package.
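The same pattern should also cover the other criterion raised in the original question, keeping the duplicate with the largest variance. The following is only a sketch under stated assumptions: 'm' stands for an expression matrix with one row per probe, and 'id' for a vector giving the (possibly repeated) probe ID of each row. Both names and all values are hypothetical, and this is not the code used in 'myHeatmap'.

```r
# Hypothetical expression matrix: 6 rows (probes) x 4 columns (samples)
m = rbind(c(1, 2, 3, 4),
          c(1, 5, 1, 5),
          c(2, 2, 3, 3),
          c(0, 0, 0, 1),
          c(0, 4, 0, 4),
          c(1, 1, 2, 2))

# Hypothetical probe ID for each row of m; duplicates share an ID
id = c("A", "A", "B", "C", "C", "C")

# Per-row variance across samples
v = apply(m, 1, var)

# Row indices grouped by ID, then the highest-variance row in each group
s = split(seq_len(nrow(m)), id)
keep = sapply(s, function(i) i[which.max(v[i])])

# One row per ID; drop = FALSE keeps a matrix even if only one row remains
mu = m[keep, , drop = FALSE]
rownames(mu) = names(keep)
```

When an ExpressionSet is used, the matrix would typically come from exprs() and the row subset can be applied to the ExpressionSet itself.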
On 22 Jun 2014, at 10:02, Kaj Chokeshaiusaha [guest] <guest at bioconductor.org> wrote:
> Dear R helpers,
> I'm working with a goat dataset for which no annotation db is available. For this reason, I use the 'genefilter' function instead of 'nsFilter' (both available in the 'genefilter' package), with ANOVA (p<0.05). The problem is that the filtered data contain 500 duplicated probes that I want to remove.
> Due to my limited ability, I cannot figure out how to do this. It would be great if, for each set of duplicates, I could select the probe with either the lowest p-value or the highest variance.
> Would you please help me with some examples?
> Best Regards,
> -- output of sessionInfo():
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>  LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>  LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>  LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
>  LC_PAPER=en_GB.UTF-8 LC_NAME=C
>  LC_ADDRESS=C LC_TELEPHONE=C
>  LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
>  parallel stats graphics grDevices utils datasets methods
>  base
> other attached packages:
>  Biobase_2.24.0 BiocGenerics_0.10.0 genefilter_1.46.1
> loaded via a namespace (and not attached):
>  annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7
>  GenomeInfoDb_1.0.2 IRanges_1.22.9 RSQLite_0.11.4
>  splines_3.1.0 stats4_3.1.0 survival_2.37-7
>  tcltk_3.1.0 tools_3.1.0 XML_3.98-1.1
>  xtable_1.7-3
> Sent via the guest posting facility at bioconductor.org.