[BioC] merge SMD data from different print batches

Sun Apr 2 17:14:33 CEST 2006

Adaikalavan Ramasamy wrote:
> 
> 1) First we preprocess each array with LOESS followed by scale
> normalisation before combining it with other arrays. This is done in the
> first half of the function my.SMD.expr().

Generally reasonable for an expression experiment in SMD, I think,
although all of the usual concerns about normalization apply.  There are
some very small arrays which might benefit from different treatment, but
at a guess you're probably dealing with human cancer data...?
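For readers less familiar with step 1, here is a minimal sketch of what within-array LOESS followed by between-array scale normalisation usually amounts to. It is not the code of my.SMD.expr(). It assumes two-colour data given as positive raw channel intensities R and G, and uses a simplified hand-rolled local linear fit with tricube weights instead of R's loess(). It also assumes one common choice for the scale step: every array is rescaled to the same median absolute log ratio, namely the geometric mean of the per-array values. The span of 0.4 is an arbitrary illustrative value.

```python
import math
from statistics import median


def loess_fit(x, y, span=0.4):
    """Local linear fit with tricube weights, evaluated at every x.

    Simplified stand-in for R's loess() (an assumption for illustration):
    no robustness iterations, no interpolation, O(n^2) cost.
    """
    n = len(x)
    k = min(max(3, math.ceil(span * n)), n)  # neighbours used in each local fit
    fitted = []
    for xi in x:
        # Bandwidth = distance to the k-th nearest point (tiny epsilon if zero).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for xj, yj in zip(x, y):
            u = abs(xj - xi) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swx += w * xj
            swy += w * yj
            swxx += w * xj * xj
            swxy += w * xj * yj
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:  # weighted points share one x: use weighted mean
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * xi)
    return fitted


def loess_normalize(R, G, span=0.4):
    """Within-array step: remove the intensity-dependent trend in M = log2(R/G)."""
    M = [math.log2(r / g) for r, g in zip(R, G)]
    A = [0.5 * math.log2(r * g) for r, g in zip(R, G)]
    trend = loess_fit(A, M, span)
    return [m - t for m, t in zip(M, trend)]


def scale_normalize(arrays):
    """Between-array step (assumed variant): rescale each array's M values so
    all arrays share the same median |M|, the geometric mean of the originals."""
    scales = [median(abs(m) for m in ms) for ms in arrays]
    if any(s == 0 for s in scales):
        raise ValueError("an array has median |M| of zero; cannot rescale it")
    target = math.exp(sum(math.log(s) for s in scales) / len(scales))
    return [[m * target / s for m in ms] for ms, s in zip(arrays, scales)]
```

In use, loess_normalize() would be applied to each array separately, and the resulting list of M vectors passed to scale_normalize() once.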

> 2) Next, we average the log ratios over the LUID (or SUID for old SMD
> datasets) and remove redundant gene annotations. This is done in
> get.SMD.expr(). This is potentially a contentious issue.

This is generally the way it's done by SMD users.  I think some of the
regulars on this list prefer different approaches.
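As a rough illustration of step 2, the averaging can be sketched as below. This is not get.SMD.expr(); the input layout (one tuple per spot, with None or NaN marking flagged or missing spots) is an assumption, and taking the mean is just one choice. Swapping in statistics.median is a one-line change.

```python
import math
from collections import defaultdict
from statistics import mean


def average_by_id(rows):
    """Collapse replicate spots to one value per reporter ID.

    rows: iterable of (reporter_id, annotation, log_ratio) tuples, one per
    spot; log_ratio is None or NaN for flagged/missing spots (assumed
    convention). Returns {reporter_id: (annotation, mean_log_ratio)}.
    The first annotation seen for an ID is kept and later duplicates are
    dropped; IDs with no usable spot are omitted.
    """
    values = defaultdict(list)
    annotation = {}
    for rid, ann, m in rows:
        annotation.setdefault(rid, ann)  # keep only the first annotation
        if m is not None and not math.isnan(m):
            values[rid].append(m)
    return {rid: (annotation[rid], mean(ms)) for rid, ms in values.items()}
```

For example, two spots with the same SUID and log ratios 1.0 and 3.0 collapse to a single entry with value 2.0.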

I'm not certain why you'd use LUID rather than SUID for newer data?

SUID = Sequence Unique ID.  An SUID should (and almost always does) refer
to a single "reporter" DNA sequence (oligo, PCR product, or cDNA clone
spotted on an array).  The same cDNA, on multiple arrays, should always
have the same SUID within SMD, regardless of the fabrication history.
Identification is by public accession (e.g., GenBank for human ESTs)
whenever possible.

LUID = Laboratory Unique ID.  This refers to the product of a distinct
preparation, e.g. a single microtiter plate well for cDNA clones, or a
single synthesis of a particular oligo.  When it's available at all,
LUID should be entirely nested within SUID; i.e., a single oligo in
SMD should have a single SUID, but might have several LUIDs if it was
synthesized multiple times or in different places.  LUID is generally
used to track fabrication process quality issues, such as discovery and
isolation of a contaminated or mislabeled set of spots.  If you're not
concerned about or interested in QA issues, you probably don't want to
work in LUID space.
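The nesting relationship described above is easy to check on your own data before choosing which ID to average over. A minimal sketch, assuming you can extract one (LUID, SUID) pair per spot; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def check_luid_nesting(pairs):
    """pairs: iterable of (luid, suid), one per spot (assumed layout).

    Returns (violations, luids_per_suid):
      violations     - {luid: sorted SUIDs} for LUIDs mapped to more than one
                       SUID, which should not happen if LUID nests in SUID;
      luids_per_suid - {suid: number of distinct LUIDs}; a count above 1 means
                       the same sequence came from several preparations.
    """
    suids_for_luid = defaultdict(set)
    luids_for_suid = defaultdict(set)
    for luid, suid in pairs:
        suids_for_luid[luid].add(suid)
        luids_for_suid[suid].add(luid)
    violations = {l: sorted(s) for l, s in suids_for_luid.items() if len(s) > 1}
    luids_per_suid = {s: len(ls) for s, ls in luids_for_suid.items()}
    return violations, luids_per_suid
```

An empty violations dict is what the SMD conventions above would lead you to expect; anything else is worth investigating before merging.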

> 3) Finally we merge the different arrays by using the LUID and the
> average gene expression for that LUID that was calculated in step 2. 

Subject to the note above about the interpretation of LUIDs, this is
standard practice in SMD and probably pretty reasonable.
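To make step 3 concrete, merging the per-array averages on a shared ID can be sketched as an outer join. The dict-of-dicts input layout is an assumption, and this is not the poster's code.

```python
def merge_arrays(per_array):
    """Merge per-array averages on reporter ID (outer join).

    per_array: {array_name: {reporter_id: averaged_log_ratio}} (assumed layout).
    Returns (array_names, rows), where rows maps every reporter ID seen on any
    array to a list of values in array_names order, with None where the ID
    was absent from that array (e.g. not printed in that batch).
    """
    names = list(per_array)
    ids = sorted(set().union(*per_array.values()))  # union of all dict keys
    rows = {rid: [per_array[a].get(rid) for a in names] for rid in ids}
    return names, rows
```

An outer join keeps reporters printed on only one batch; dropping every row that contains None gives the stricter inner-join alternative.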

- Jeremy (former SMD developer)

-- 
Jeremy Gollub
jeremy at gollub.net