[BioC] deleting probes in expression set

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 12 18:34:50 CEST 2007


Hi Vanessa -- Sounds like you want to 1) subset using character,
numeric, or logical vectors to select or reorder; 2) have some way to
access features as 'groups', e.g., because of duplicate probe set
names. I'd encourage you to think carefully about part 2, as
ExpressionSets are designed the way they are (unique featureNames)
because this is what makes most biological and statistical sense for
the type of data they are designed to represent.

Some details:

I think your second question is easier:

> In another stage, when combining from different platforms with
> different genes, I would like to extract just the information for a
> specific gene probe list. Is this possible?

Do you want

> library(Biobase)
> data(sample.ExpressionSet)
> sample.ExpressionSet
ExpressionSet (storageMode: lockedEnvironment)
assayData: 500 features, 26 samples 
  element names: exprs, se.exprs 
phenoData
  sampleNames: A, B, ..., Z  (26 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
featureData
  featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at  (500 total)
  fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2 
> sample.ExpressionSet[c("AFFX-MurIL2_at", "31739_at"),]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 2 features, 26 samples 
  element names: exprs, se.exprs 
phenoData
  sampleNames: A, B, ..., Z  (26 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
featureData
  featureNames: AFFX-MurIL2_at, 31739_at
  fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2 
 
i.e., provide the vector of featureNames as the first argument to
the subset operator, '['?
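
When the probe list comes from another platform, some of its names may not be present in this ExpressionSet, and asking for a name that is absent should fail. A minimal sketch, assuming a made-up list `wanted` (the name "not_on_this_array" is hypothetical), filters the list first:

```r
library(Biobase)
data(sample.ExpressionSet)

# Hypothetical probe list; "not_on_this_array" stands for a probe
# that exists only on some other platform.
wanted <- c("31739_at", "not_on_this_array", "AFFX-MurIL2_at")

# Keep only the names this ExpressionSet has, in the order of 'wanted'.
present <- wanted[wanted %in% featureNames(sample.ExpressionSet)]
missing <- setdiff(wanted, present)

# Character subsetting returns the features in the order requested.
sub <- sample.ExpressionSet[present, ]
featureNames(sub)
```

Keeping `missing` around makes it easy to report which probes were dropped for each platform.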

The first sounds more complicated. The following might get you
going, but proceed with some thought!

> I have created eSets from RGLists for cDNA microarrays. I would like
> to combine in the end data from several different platforms. As a
> special case, I would like to combine 2 eSets with the same gene
> probes, but in a different order on the array (so 2 different array
> platforms).

'combine' *might* help (see ?combine and class?eSet or ?"eSet-class")

You could subset one of the sets using the indices (i.e., featureNames)
of the other (this will reorder its expression values to match the
order of the index vector), and then manipulate.
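
As a rough sketch of that reordering, the example below fakes two "platforms" by splitting sample.ExpressionSet into two sets of samples and shuffling the rows of one of them. The objects `a`, `b`, `b_sorted` and `joint` are illustrative names only, and the sketch assumes both sets carry exactly the same unique featureNames:

```r
library(Biobase)
data(sample.ExpressionSet)

# Two toy "platforms": same probes, different samples, and 'b' has
# its rows in a shuffled order.
set.seed(1)
n_feat <- length(featureNames(sample.ExpressionSet))
a <- sample.ExpressionSet[, 1:13]
b <- sample.ExpressionSet[sample(n_feat), 14:26]

# Reorder 'b' so that its features follow the order used in 'a'.
b_sorted <- b[featureNames(a), ]
stopifnot(identical(featureNames(a), featureNames(b_sorted)))

# With the rows aligned, the expression matrices can sit side by side.
joint <- cbind(exprs(a), exprs(b_sorted))
dim(joint)
```

Binding the plain matrices sidesteps the question of how the phenoData of the two sets should be merged; that part still needs some thought.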

> The IDs of my probes are not unique, so I cannot use them as
> FeatureNames...some have a duplicate in there (extension #2 after
> its name) and the control probes are not uniquely named
> e.g. luciferase (10 x).  Is there a way to delete the duplicates or
> integrate their information in the original (taking the average)?

I think first you want to clarify what you're doing here, and whether
it has statistical & biological meaning.

You can leave featureNames unspecified, and they will then be
provided for you. You might then add a column to featureData to keep
track of which probes map to which (non-unique) identifiers (though
how are you going to interpret multiple expression values for the same
identifier?). Subsetting by these features then becomes more awkward,
e.g.,

> obj <- sample.ExpressionSet
> featureData(obj)[["my_ids"]] <- paste("id", seq(1, nrow(obj)))
> qids <- c("id 10", "id 100")
> idx <- featureData(obj)[["my_ids"]] %in% qids
> obj[idx,]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 2 features, 26 samples 
  element names: exprs, se.exprs 
phenoData
  sampleNames: A, B, ..., Z  (26 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
featureData
  featureNames: AFFX-BioDn-5_at, 31339_at
  fvarLabels and fvarMetadata description:
    my_ids: NA
experimentData: use 'experimentData(object)'
Annotation: hgu95av2

These types of operations would allow you to select the features that
share an identifier and then average them, or do other operations on
them.
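
If averaging does turn out to be meaningful, one possible sketch uses base R's rowsum() on the expression matrix. The identifiers below are invented (every two consecutive probes are assumed to share one), and the result keeps only the averaged 'exprs' element, not 'se.exprs':

```r
library(Biobase)
data(sample.ExpressionSet)

obj <- sample.ExpressionSet
# Hypothetical non-unique identifiers: every two consecutive probes share one.
ids <- paste0("gene", rep(seq_len(250), each = 2))
featureData(obj)[["my_ids"]] <- ids

m <- exprs(obj)
sums   <- rowsum(m, group = ids)                    # per-identifier sums, by sample
counts <- rowsum(rep(1, length(ids)), group = ids)  # number of probes per identifier
avg    <- sums / as.vector(counts)                  # per-identifier means

# One row per identifier, so the identifiers can now serve as featureNames.
avg_set <- new("ExpressionSet", exprs = avg, phenoData = phenoData(obj))
```

Note that rowsum() sorts the groups by default, so the rows of `avg` follow the sorted identifiers rather than the original probe order.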

> How to delete the control probes?  This would enable me to end up
> with unique IDs, so I could use them as feature names and then it is
> fairly easy to combine the two expression sets.

This is subsetting again, probably most easily done using a logical
index along the lines of

> ctrl_ids <- qids  # stand-in for your own vector of control identifiers
> not_ctrls <- !(featureData(obj)[["my_ids"]] %in% ctrl_ids)
> obj[not_ctrls,]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 498 features, 26 samples 
  element names: exprs, se.exprs 
phenoData
  sampleNames: A, B, ..., Z  (26 total)
  varLabels and varMetadata description:
    sex: Female/Male
    type: Case/Control
    score: Testing Score
featureData
  featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at  (498 total)
  fvarLabels and fvarMetadata description:
    my_ids: NA
experimentData: use 'experimentData(object)'
Annotation: hgu95av2 

You can use similar ideas with other R objects, including the RGList
of limma, and with basic structures like a matrix or data frame.
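
For instance, the same logical-index idea on a plain matrix and a data frame might look like the sketch below. The data, the identifiers and the name "luciferase" as the only control are made up for illustration:

```r
# Toy data: four probes by three samples, with non-unique probe identifiers.
ids <- c("geneA", "luciferase", "geneB", "luciferase")
mat <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("s1", "s2", "s3")))
df  <- data.frame(id = ids, mat)

ctrl_ids  <- "luciferase"            # hypothetical control identifiers
not_ctrls <- !(ids %in% ctrl_ids)    # TRUE for rows to keep

mat_clean <- mat[not_ctrls, , drop = FALSE]  # drop = FALSE keeps a matrix
df_clean  <- df[not_ctrls, ]

# The remaining identifiers are unique, so they can become row names.
rownames(mat_clean) <- ids[not_ctrls]
```

Once the controls are gone and the identifiers are unique, they can be used as row names (or featureNames) to line up data from the two platforms.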

Hope that helps,

Martin

Vanessa Vermeirssen <vanessa.vermeirssen at psb.ugent.be> writes:

> Hi,
>
> How to delete the control probes? 
> This would enable me to end
> up with unique IDs, so I could use them as feature names and then it is 
> fairly easy to combine the two expression sets.
>
> In another stage, when combining from different platforms with different 
> genes, I would like to extract just the information for a specific gene 
> probe list. Is this possible?
>
> I am new to Bioconductor, but learning a lot every day... I hope that 
> somebody can help me.
>
> Thanks so much already,
> Vanessa Vermeirssen
>
> -- 
> ==================================================================
> Vanessa Vermeirssen, PhD
>
> Tel:+32 (0)9 331 38 23                        fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> vamei at psb.ugent.be                         http://www.psb.ugent.be
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


