[BioC] Filtering pmSequence based on probe target level for HTA 2.0 arrays

Wed Aug 20 18:48:26 CEST 2014

Hi Steve,

It looks like pmSequence() for HTAFeatureSet objects dispatches on the
FeatureSet class:

> showMethods(pmSequence, class="FeatureSet", includeDefs = TRUE)
Function: pmSequence (package oligo)
object="FeatureSet"
function (object, ...)
{
    .local <- function (object)
    {
        pmSequence(getPlatformDesign(object))
    }
    .local(object, ...)
}

which doesn't accept a target argument. I haven't looked closely enough to see
why the dispatch goes there, but it appears it should use the stArrayDBPDInfo method:

> showMethods(pmSequence)
Function: pmSequence (package oligo)
object="AffyGenePDInfo"
object="AffyHTAPDInfo"
    (inherited from: object="stArrayDBPDInfo")
object="AffySNPPDInfo"
object="DBPDInfo"
object="ExonFeatureSet"
object="FeatureSet"
object="GeneFeatureSet"
object="HTAFeatureSet"
    (inherited from: object="FeatureSet")
object="stArrayDBPDInfo"

We can force that by calling pmSequence() on the platform design object directly:

z <- pmSequence(getPD(dat), target="probeset")

where 'dat' is an HTAFeatureSet. But that still gives us more probe sequences
than the 'core' level uses:

> pmid1 <- pmindex(dat, target="core")
> pmid2 <- pmindex(dat, target="probeset")
> length(pmid1)
[1] 6058440
> length(pmid2)
[1] 7576209

But since both pmid1 and pmid2 are sorted, and z follows the order of pmid2, I
think you should be able to get the sequences for just the probes that will be
summarized at the 'core' level by subsetting:

> z.core <- z[pmid2 %in% pmid1,]
> z.core
  A DNAStringSet instance of length 6056075
          width seq
      [1]    25 GATTAATCTTAAATCAGGATGATCC
      [2]    25 CAAAATCTAAACCCGGACTGTACCT
      [3]    25 CACACTATTCACACCCGCACCGAAG
      [4]    25 CCGTACCTTTCAAGGTCGGCCAAGC
      [5]    25 ACCCCTTGACTAAGGACGGTTGTTG
      ...   ... ...
[6056071]    25 TCACCGTGTGTCGACGCCGGACACA
[6056072]    25 AGGTTCCTGGGACCTCGTGAGTACA
[6056073]    25 GACCCAGAGTGTAGCTCGACGACCT
[6056074]    25 ACCACAGGTACGACACTACTAAGGA
[6056075]    25 TGGCCTTCCGTGCATATCTGCACCT
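Because the subsetting above depends entirely on how %in% behaves, here is a small base-R sketch using made-up toy indices (not HTA 2.0 data) that shows what z[pmid2 %in% pmid1] keeps. A plain character vector stands in for the DNAStringSet, and z is assumed to be in the same order as pmid2, which is the assumption the subsetting relies on. The match() variant at the end is only a suggestion, not part of the original approach.

```r
# Toy stand-ins (hypothetical values, not HTA 2.0 data).
# pmid2 mimics pmindex(dat, target = "probeset"); pmid1 mimics pmindex(dat, target = "core").
pmid2 <- c(3L, 5L, 8L, 13L, 21L, 34L)
pmid1 <- c(5L, 13L, 13L, 34L)  # 13 appears twice to show how repeats are handled

# Sequences assumed to be aligned one-to-one with pmid2 (same length, same order).
# A character vector stands in for the DNAStringSet returned by pmSequence().
z <- c("AAAA", "CCCC", "GGGG", "TTTT", "ACGT", "TGCA")
stopifnot(length(z) == length(pmid2))

# %in% gives one TRUE/FALSE per element of pmid2, so the result follows pmid2's
# order and keeps each matching probe once, however often it appears in pmid1.
z.core <- z[pmid2 %in% pmid1]
print(z.core)

# Alternative: match() returns one position per element of pmid1, so the result
# is aligned with pmid1 (repeats included); an NA would flag an index absent from pmid2.
z.core.aligned <- z[match(pmid1, pmid2)]
print(z.core.aligned)
```

The %in% version can come out shorter than length(pmid1) if pmid1 repeats an index or contains one that is not in pmid2; whether either applies to the HTA 2.0 annotation is not checked here. The match() variant may be handier if you need one sequence per row of pm(dat, target="core"), assuming those rows follow the order of pmindex(dat, target="core").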

Best,

Jim

On Wed, Aug 20, 2014 at 10:55 AM, Steve Piccolo <stephen.piccolo at hsc.utah.edu> wrote:

> List members,
>
> I am working with some Affymetrix HTA 2.0 arrays. I have installed the
> draft annotation package described here:
> http://grokbase.com/t/r/bioconductor/1428394w2d/bioc-draft-support-for-hta-2-0-with-oligo
>
> I am using the following oligo commands to extract intensity values and
> PM sequences. However, I am running into a problem because
> oligo::pmSequence() doesn't allow me to specify a target probe type for
> these arrays. By default oligo::pm() uses the "core" probes, whereas
> oligo::pmSequence() only lets me use the "probeset" probes. For the ST
> arrays, by contrast, I am able to do this.
>
> affyExpressionFS <- read.celfiles(celFilePath)
> pint = oligo::pm(affyExpressionFS, target="core")
>
> pmSeq = oligo::pmSequence(affyExpressionFS, target="core")
>
>
>
> Below is the error message I get.
>
> Loading required package: pd.hta.2.0
> Loading required package: RSQLite
> Loading required package: DBI
> Platform design info loaded.
> Reading in : testInputData/HTA2.CEL.gz
> Error in { : task 1 failed - "unused argument (target = "probeset")"
>
> Below is my session info. Any help would be appreciated.
>
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  methods   stats     graphics  grDevices utils     datasets
> [8] base
>
> other attached packages:
>  [1] pd.hta.2.0_3.8.0    RSQLite_0.11.4      DBI_0.2-7
>  [4] GEOquery_2.30.1     sva_3.10.0          mgcv_1.8-2
>  [7] nlme_3.1-117        corpcor_1.6.6       foreach_1.4.2
> [10] oligo_1.28.2        Biostrings_2.32.1   XVector_0.4.0
> [13] IRanges_1.22.10     Biobase_2.24.0      oligoClasses_1.26.0
> [16] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>  [1] affxparser_1.36.0     affyio_1.32.0         BiocInstaller_1.14.2
>  [4] bit_1.1-12            codetools_0.2-8       compiler_3.1.0
>  [7] ff_2.2-13             GenomeInfoDb_1.0.2    GenomicRanges_1.16.4
> [10] grid_3.1.0            iterators_1.0.7       lattice_0.20-29
> [13] Matrix_1.1-4          preprocessCore_1.26.1 RCurl_1.95-4.3
> [16] splines_3.1.0         stats4_3.1.0          XML_3.98-1.1
> [19] zlibbioc_1.10.0
>
>
>
>
> Regards,
> -Steve
>
> ----------------------------------
> Stephen Piccolo, Ph.D.
> Postdoctoral Research Associate
>
> Affiliations:
>   Department of Pharmacology and Toxicology, University of Utah
>   Division of Computational Biomedicine, Boston University School of Medicine
> ----------------------------------
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
