[BioC] annotation files for agilent: a bit off-topic

Weiwei Shi helprhelp at gmail.com
Wed Jul 18 01:04:59 CEST 2007


Hi, Francois:
First, thanks for the detailed reply.

The matching is done, and only ~7700 of the ~10,100 probes matched
(and I assume the matched ones are those starting with A_).

However, some probe IDs look like this:
> tail(x0, 10)
 [1] "A_24_P913609" "Hs345093.1"   "A_23_P144999" "A_23_P399001"
 [5] "A_23_P340617" "A_32_P104088" "A_32_P34372"  "A_23_P62764"
 [9] "Hs132898.3"   "A_32_P370539"

Since it is a custom array, I think they might use UniGene IDs (?),
but what's the ".3"? Should it be Hs.132898? Confused!
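Before matching further, it can help to separate the Agilent-style IDs (prefix A_) from the Hs-style ones. Below is a minimal sketch that assumes the ID shapes in the tail() output above are representative. The regular expressions and the function name classify_probe_ids are hypothetical and are not part of any Bioconductor package. The code also makes no claim about what the ".3" suffix means.

```python
import re

# Assumption: Agilent probe names look like "A_<2 digits>_P<digits>", inferred
# only from the tail(x0, 10) output above, not from an Agilent specification.
AGILENT_PROBE = re.compile(r"^A_\d{2}_P\d+$")
# Assumption: IDs like "Hs132898.3" are "Hs", digits, a dot, then digits.
# The meaning of the numeric suffix is unknown from the data alone.
HS_LIKE = re.compile(r"^Hs\d+\.\d+$")


def classify_probe_ids(ids):
    """Sort probe IDs into 'agilent', 'hs_like', and 'other' groups."""
    groups = {"agilent": [], "hs_like": [], "other": []}
    for pid in ids:
        if AGILENT_PROBE.match(pid):
            groups["agilent"].append(pid)
        elif HS_LIKE.match(pid):
            groups["hs_like"].append(pid)
        else:
            groups["other"].append(pid)
    return groups


if __name__ == "__main__":
    tail_ids = ["A_24_P913609", "Hs345093.1", "A_23_P144999", "A_23_P399001",
                "A_23_P340617", "A_32_P104088", "A_32_P34372", "A_23_P62764",
                "Hs132898.3", "A_32_P370539"]
    for name, members in classify_probe_ids(tail_ids).items():
        print(name, len(members))
```

Run on the ten IDs above, this prints agilent 8, hs_like 2, and other 0. The "other" group is there so that unexpected ID shapes show up instead of being silently lumped in with one of the known patterns.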

FeatureExtractor_DesignFileName gives
D:\Array_Data\Kinder-Onko\Design Files
KinderOnko\Custom_Final_280904\012714_d_20040819.xml

Is that right?
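Francois's quoted reply below says the design file name starts with the design ID and ends with the annotation release date. Below is a minimal sketch of pulling both out of a FeatureExtractor_DesignFileName value. It assumes the pattern "<digits>_..._<YYYYMMDD>.xml" taken from his example 014868_D_F_20060807.xml. The function name parse_design_file_name is hypothetical, and the middle part ("D_F", "d") is ignored because its meaning is not stated anywhere in the thread.

```python
import re
from datetime import date

# Assumption: the file name starts with a numeric design ID and ends with an
# 8-digit YYYYMMDD date before ".xml"; anything in between is ignored.
DESIGN_FILE = re.compile(r"^(\d+)_(?:.*_)?(\d{8})\.xml$", re.IGNORECASE)


def parse_design_file_name(path):
    """Return (design_id, release_date) from a design file name or path."""
    # Keep only the last path component; accept both "\" and "/" separators.
    name = re.split(r"[\\/]", path.strip())[-1]
    match = DESIGN_FILE.match(name)
    if match is None:
        raise ValueError(f"unrecognized design file name: {name!r}")
    design_id, stamp = match.groups()
    release = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
    return design_id, release


if __name__ == "__main__":
    print(parse_design_file_name("014868_D_F_20060807.xml"))
    print(parse_design_file_name("012714_d_20040819.xml"))
```

For the two names above, this yields design IDs 014868 and 012714 with dates 2006-08-07 and 2004-08-19. The design ID is what you would then look up in Agilent's array list, as Francois describes.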

To be honest, I hate it when people provide data w/o good annotation :(

It's kinda like asking us to play a guessing game.

Best,

Weiwei



On 7/17/07, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> Hi Weiwei,
>
> I'd assume the last one:
>
> 1st is a very old chip
> 2nd & 3rd are for CGH, not expression
> 4th is their basic human gene expression chip.
>
> Keep in mind that they now have 4x44 arrays that use the same
> non-control probes but place them in different positions. If the chips were
> purchased recently, they are likely the 4x44 ones, as they end up being
> a lot cheaper.
>
> The quick and dirty way of finding out: look in the Feature Extraction
> output file; you'll see a column called FeatureExtractor_DesignFileName in
> the header. Its value should be a file name that looks like
> 014868_D_F_20060807.xml. The first part (014868) gives the chip type
> (the design ID, actually), while the date at the end (20060807) gives the
> annotation release date. Then go to
> http://www.chem.agilent.com/cag/bsp/array_list.asp and search for it in
> the list. In this case, you'd see that this is the 4x44 whole-genome mouse
> chip.
>
> There is a Bioconductor package for the human whole-genome chips:
> hgug4112a. It does not include any control probes, so it should
> work with both the 1x44 and 4x44 versions.
>
> Also, read.maimages should grab the gene annotation that the Feature
> Extraction software includes in its output files. It might be out of
> date, but it should help you keep going.
>
> Hope this helps,
>
> Francois
>
> On Tue, 2007-07-17 at 17:43 -0400, Weiwei Shi wrote:
> > I am doing the latter now b/c I don't know the answer to the first
> > question. The data provider is sloooooowwww to reply.
> >
> > On 7/17/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> > > Weiwei Shi wrote:
> > > > Hi, there:
> > > >
> > > > I know this is a bit off-topic, but I hope someone has some knowledge to share:
> > > >
> > > > I found 4 zipped annotation files from Agilent:
> > > >
> > > > Human 1A(v2)
> > > > Human Genome CGH 44A
> > > > Human Genome CGH 44B
> > > > Human Genome, Whole
> > > >
> > > > I assume I can use the last one for my arrays, but w/o knowing the
> > > > difference b/w them, I am not quite sure.
> > >
> > > You will need to find out what platform your arrays use or do some probe
> > > ID matching between your arrays and the annotation packages.  The former
> > > is preferred.
> > >
> > > Sean
> > >
> >
> >
>
>
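Sean's second option, probe-ID matching, can be sketched as scoring each candidate annotation by the fraction of the array's probe IDs it contains. This is a minimal sketch that assumes each annotation file has already been loaded into a collection of probe IDs; loading Agilent's zipped files is not shown. The function names match_rates and best_platform are hypothetical. The platform names and ID sets in the usage block are placeholders, not real annotation contents.

```python
def match_rates(array_ids, annotations):
    """Fraction of the array's probe IDs found in each candidate annotation."""
    array_ids = set(array_ids)
    if not array_ids:
        raise ValueError("no array probe IDs given")
    return {
        name: len(array_ids & set(probe_ids)) / len(array_ids)
        for name, probe_ids in annotations.items()
    }


def best_platform(array_ids, annotations):
    """Return (name, rate) for the annotation that covers the most array probes."""
    rates = match_rates(array_ids, annotations)
    name = max(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    # Placeholder data: "platform_X" and "platform_Y" stand in for probe-ID
    # sets read from the downloaded annotation files.
    array_ids = ["A_23_P144999", "A_23_P399001", "A_32_P34372", "Hs132898.3"]
    annotations = {
        "platform_X": {"A_23_P144999", "A_23_P399001", "A_32_P34372"},
        "platform_Y": {"A_23_P144999"},
    }
    print(match_rates(array_ids, annotations))  # {'platform_X': 0.75, 'platform_Y': 0.25}
    print(best_platform(array_ids, annotations))  # ('platform_X', 0.75)
```

A best rate well below 1.0 does not by itself rule a platform out. On a custom array, some IDs (such as the Hs-style ones Weiwei found) may not appear in any stock annotation file, which is consistent with a figure like ~7700 out of ~10,100.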


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


