[BioC] metadata for Affymetrix Poplar array

Nianhua Li nli at fhcrc.org
Fri Feb 23 19:13:02 CET 2007


> Hi, Dick,
> 
> Here is some additional information:
> 
> You can extract the probeset-to-EntrezGene mapping from Affymetrix's
> annotation file, pass it as "otherSrc", and feed it to ABPkgBuilder:
> 
>> ABPkgBuilder(baseName="affy_poplar_GeneBank_for_AnnBuilder.txt",
>>                  baseMapType="gbNRef",
>>                  pkgName="poplar",
>>                  pkgPath=".",
>>                  organism="Populus trichocarpa",
>>                  version="1.12.0",
>>                  otherSrc=c(
>>                    EG="affy_poplar_Entrez_for_AnnBuilder.txt"),
>>                  author=list(
>>                    authors="Dick Beyer",
>>                    maintainer="Dick Beyer..."
>>                    )
>>                  )
> 
> AnnBuilder will use the GenBank mapping as the primary source to find
> Entrez Gene mappings for the probesets. If a probeset has no mapping
> there, AnnBuilder falls back on the file given as "otherSrc", so you
> get better annotation coverage.
> 
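A minimal sketch of the extraction step in R, under stated assumptions: the helper name extract_entrez_map is hypothetical, the column names "Probe.Set.ID" and "Entrez.Gene" are what read.csv would typically produce from an Affymetrix annotation CSV, "---" is taken to mean "no mapping", and " /// " is taken to separate multiple IDs. Keeping only the first ID and writing a headerless two-column tab-delimited file are simplifications; check them against your annotation file and the AnnBuilder documentation. The probe and gene IDs below are toy values.

```r
# Hypothetical helper: build a two-column probeset-to-Entrez Gene table
# from an annotation data frame (column names are assumptions).
extract_entrez_map <- function(annot,
                               probe_col = "Probe.Set.ID",
                               entrez_col = "Entrez.Gene") {
  probes <- as.character(annot[[probe_col]])
  entrez <- trimws(as.character(annot[[entrez_col]]))
  # Drop probesets without an Entrez Gene ID ("---" assumed to mean missing).
  keep <- !is.na(entrez) & entrez != "" & entrez != "---"
  # Simplification: keep only the first ID when several are listed.
  first_id <- vapply(strsplit(entrez[keep], "///", fixed = TRUE),
                     function(ids) trimws(ids[1]), character(1))
  data.frame(probe = probes[keep], entrez = first_id,
             stringsAsFactors = FALSE)
}

# Toy input standing in for the real annotation file.
annot <- data.frame(
  Probe.Set.ID = c("probe_1_at", "probe_2_at", "probe_3_at"),
  Entrez.Gene  = c("1001", "---", "2002 /// 2003"),
  stringsAsFactors = FALSE
)
entrez_map <- extract_entrez_map(annot)

# Write a tab-delimited file without header or row names.
out_file <- file.path(tempdir(), "affy_poplar_Entrez_for_AnnBuilder.txt")
write.table(entrez_map, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

With a real annotation file, annot would come from read.csv() on the downloaded CSV instead of the toy data frame.
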
>> I am not sure if this whole approach will ultimately be correct as the
>> Affy poplar array has 13 different Populus species on it, with Populus
>> trichocarpa only one of them.
> 
> This won't be a big problem in your case. AnnBuilder extracts
> annotations from Entrez Gene by Entrez Gene ID, not by taxonomy ID.
> The organism argument only affects the following annotations:
> pathways from KEGG, PROSITE and PFAM cross-references from IPI, and
> chromosome locations from UCSC Genome. Neither IPI nor UCSC supports
> any Populus species. KEGG supports Populus tremula (aspen) (EST) (eptp)
> and Populus balsamifera (poplar) (EST) (epba), but only has
> gene-pathway mappings for epba. Those mappings are for ESTs, not for
> genes, so they may not match any Entrez Gene IDs at all. If you want
> to use them, give "Populus balsamifera (poplar) (EST)" as the
> organism. I am not sure whether you need the whole string or just the
> Latin name. But then it will conflict with UniGene, because UniGene
> only supports Populus_trichocarpa and
> Populus_tremula_x_Populus_tremuloides. UniGene is less important: it
> is only used as a supplemental source for the probeset-to-EntrezGene
> mapping. If you give the probeset-to-EntrezGene mapping as the
> baseName and set baseMapType to "ll", you can bypass UniGene.
> 
> To summarize, there are two options:
> 1. Use the above script to invoke AnnBuilder and add
> "Populus_trichocarpa=Pth" to the function "UGSciNames" in the file "getSrcUrl.R".
> 
> 2. Change organism to "Populus balsamifera", and use the
> probeset-to-EntrezGene mapping as baseName and "ll" as baseMapType.
> 
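Option 2 might look roughly like the sketch below. It reuses only the ABPkgBuilder arguments that appear in the script above; reusing "affy_poplar_Entrez_for_AnnBuilder.txt" as the base file and dropping otherSrc are assumptions, as is the variable name ab_args. The call is guarded so that it only runs when AnnBuilder is installed and the mapping file exists.

```r
# Arguments for option 2: the Entrez Gene mapping is the base file,
# so baseMapType is "ll" and UniGene is bypassed.
ab_args <- list(
  baseName    = "affy_poplar_Entrez_for_AnnBuilder.txt",  # assumed file name
  baseMapType = "ll",
  pkgName     = "poplar",
  pkgPath     = ".",
  organism    = "Populus balsamifera",
  version     = "1.12.0",
  author      = list(
    authors    = "Dick Beyer",
    maintainer = "Dick Beyer..."
  )
)

# Only attempt the build when AnnBuilder and the mapping file are available.
if (requireNamespace("AnnBuilder", quietly = TRUE) &&
    file.exists(ab_args$baseName)) {
  do.call(AnnBuilder::ABPkgBuilder, ab_args)
}
```
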
> The bottom line is that you can get gene name, gene symbol,
> chromosome, cytogenetic band, pubmed, unigene, refseq, and entrez gene
> for your probesets.
> 
> Let me know if you need any help, and good luck.
> 
> nianhua
> 