[BioC] spot filtering

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Mon Jul 25 10:24:58 CEST 2005

Yes - a colleague of mine filtered out all bad spots prior to normalisation (loess) and found that, after a lot of hard work, the end result differed only very slightly.  Sure, if you have a huge number of bad spots, leaving them in may be a bad idea, but if you have *that* many bad spots, perhaps analysing the data is not such a good idea anyway...

-----Original Message-----
From:	Brooke-Powell, Elizabeth [mailto:etbp2 at borcim.wustl.edu]
Sent:	Fri 22/07/2005 7:04 PM
To:	michael watson (IAH-C)
Cc:	bioconductor at stat.math.ethz.ch
Subject:	RE: spot filtering

Thank you for replying. That is very interesting. I am not a statistician,
but when I told some people I used a similar approach of leaving all data in
and filtering later, they heavily criticized it (mainly biologists). They
said that if you put junk into the system you'll get junk out. In my
opinion this would be more important if you have a lot of bad spots, but how
many is too many? Have you looked at the effect of leaving the "bad" data in,
particularly on the data and makeup of the lists you get out?

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] 
Sent: Friday, July 22, 2005 12:01 PM
To: Brooke-Powell, Elizabeth; bioconductor at stat.math.ethz.ch
Subject: RE: spot filtering

Actually, I don't use any of the Bioconductor functions for reading in flags
or weighting values depending on the quality of the spot etc.

Generally, what I do is create a table of flags - with spots as the rows and
arrays as the columns.  These flags are sometimes GenePix flags, sometimes
composite flags I made up.

Then I do all of my analysis in limma, using all data; I don't weight
anything, and I don't convert anything into NAs.

At the end, I output the data from topTable() into a text file, load it into
MySQL or MS Access, link it to the flags data and decide which, out of my
list from topTable, I believe according to the flags.

Note you *could* do this linking in R using the merge() function too.
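
The linking step described above can be sketched as a plain join on the spot ID. This is only a minimal illustration in Python using the standard library, not the workflow's actual code: the two tables, the column names (ID, logFC, adj.P.Val, array1..array3) and the rule that a negative flag marks a bad spot are all assumptions made up for the example. In R, merge() on the shared ID column does the equivalent join.

```python
import csv
import io

# Hypothetical stand-in for topTable() output saved as a tab-delimited text file.
TOP_TABLE = """ID\tlogFC\tadj.P.Val
spot_0003\t2.10\t0.001
spot_0017\t-1.85\t0.004
spot_0042\t1.40\t0.030
"""

# Hypothetical flag table: spots as rows, arrays as columns.
# Assumption for this sketch: 0 = unflagged, -1 = flagged bad.
FLAGS = """ID\tarray1\tarray2\tarray3
spot_0003\t0\t0\t0
spot_0017\t-1\t-1\t0
spot_0042\t0\t-1\t0
"""


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def link_flags(top_rows, flag_rows, key="ID"):
    """Attach per-spot flag counts to each row of the results table."""
    flags_by_id = {row[key]: row for row in flag_rows}
    linked = []
    for row in top_rows:
        flags = flags_by_id.get(row[key], {})
        arrays = [name for name in flags if name != key]
        # Count the arrays on which this spot carries a negative (bad) flag.
        n_bad = sum(1 for name in arrays if int(flags[name]) < 0)
        linked.append({**row, "n_arrays": len(arrays), "n_bad": n_bad})
    return linked


linked = link_flags(read_tsv(TOP_TABLE), read_tsv(FLAGS))
for row in linked:
    print(row["ID"], row["logFC"], row["adj.P.Val"],
          f'{row["n_bad"]}/{row["n_arrays"]} arrays flagged')
```

With the counts sitting next to each gene, deciding which results to believe stays a manual judgement, for example distrusting any spot flagged on most of its arrays.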

-----Original Message-----
From:	Brooke-Powell, Elizabeth [mailto:etbp2 at borcim.wustl.edu]
Sent:	Fri 22/07/2005 5:41 PM
To:	bioconductor at stat.math.ethz.ch
Cc:	michael watson (IAH-C)
Subject:	spot filtering


I was interested in how you flag your data. When you load your files, do you
read in your flag column as part of a standard GenePix-type output file, so
that limma uses it when the linear model is fit? I use BlueFuse, and its flag
column is quite different from GenePix and the like, so at present it cannot
be used in limma. I am wondering how to mark (flag) the bad data and either
leave it in, or what to put in the data file to get the data ignored,
i.e. can you put NA in place of the data point and have it ignored? Is it as
simple as creating a new flag column, converting the BlueFuse flags into
GenePix-like flags? If I load the data file using the other file type option
in limmaGUI, it doesn't allow me to tell it where there is a flag column. Is
this something that could be fixed, assuming the flag column conforms to the
GenePix style of 0, +1 and -1 calls?
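
To make the two options in the question concrete, here is a rough sketch in Python of recoding a flag column into 0, +1 and -1 calls and, separately, blanking out the flagged values. The flag labels in FLAG_MAP are placeholders invented for the example (the actual BlueFuse codes are not given here), None stands in for NA, and the choice to treat -1 as "bad" is an assumption. It shows only the data preparation, not how limma would treat the result.

```python
# Hypothetical mapping from a vendor flag label to a 0/+1/-1 call.
# The labels are placeholders, not real BlueFuse codes.
FLAG_MAP = {"good": 1, "unflagged": 0, "bad": -1}


def to_calls(raw_flags, flag_map=FLAG_MAP):
    """Recode each raw flag label as a 0, +1 or -1 call."""
    return [flag_map[flag] for flag in raw_flags]


def mask_bad(values, calls):
    """Replace values whose call is negative with None (the analogue of NA)."""
    return [None if call < 0 else value for value, call in zip(values, calls)]


raw_flags = ["good", "bad", "unflagged", "bad"]
log_ratios = [0.8, 3.2, -0.1, -2.7]

calls = to_calls(raw_flags)            # option 1: a new GenePix-like flag column
masked = mask_bad(log_ratios, calls)   # option 2: blank out the flagged values
print(calls)
print(masked)
```

Keeping the recoded flag column alongside the untouched values leaves both routes open, since the masking can always be applied later but cannot be undone once the values are overwritten.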

Thanks for the help and insight,

