[BioC] Help with HyperGTest

Marc Carlson mcarlson at fhcrc.org
Thu Nov 12 22:21:55 CET 2009


Hi Yue,

I am afraid in that case we don't have an annotation package for you (we
only support the most popular strains).  But if you want to use GOstats,
there is STILL a workaround, provided you have access to some raw GO to
gene ID mappings.  This is not as hopeless as it may sound: you can
probably get such data from NCBI or, failing that, from the Blast2GO project.
 
Then you can read the "GOstatsForUnsupportedOrganisms.pdf" vignette in
the documentation for the GOstats package.  You can find this file here:

http://www.bioconductor.org/packages/devel/bioc/html/GOstats.html
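
As a rough sketch of what that workaround looks like (my reading of the
vignette, not a tested recipe for DH10B): you put the raw mappings into a
three-column data frame (GO ID, evidence code, gene ID), turn it into a gene
set collection, and run hyperGTest on that instead of on an annotation
package. The GO terms and evidence codes below are made-up placeholders, the
names go_map, targetids and universe are only illustrative, and the package
that provides GOFrame/GOAllFrame may differ between Bioconductor releases
(AnnotationDbi is assumed here).

```r
# Placeholder GO-to-gene mapping: the GO terms and evidence codes are invented
# for illustration and are NOT real annotations for these DH10B gene IDs.
# Column order matters: GO ID, evidence code, gene ID.
go_map <- data.frame(
  go_id    = c("GO:0006810", "GO:0008152", "GO:0006810"),
  Evidence = c("IEA", "IEA", "IEA"),
  gene_id  = c("6058204", "6058276", "6058499"),
  stringsAsFactors = FALSE
)

# Gene IDs must be character, and every target must occur in the universe.
targetids <- c("6058204", "6058276")
universe  <- unique(go_map$gene_id)
stopifnot(is.character(targetids), all(targetids %in% universe))

# The Bioconductor part only runs if the packages are installed.
if (requireNamespace("GOstats", quietly = TRUE) &&
    requireNamespace("GSEABase", quietly = TRUE)) {
  library(GOstats)
  library(GSEABase)

  # Assumption: GOFrame/GOAllFrame are exported by AnnotationDbi.
  go_frame     <- AnnotationDbi::GOFrame(go_map, organism = "Escherichia coli")
  go_all_frame <- AnnotationDbi::GOAllFrame(go_frame)  # adds ancestor terms
  gsc <- GeneSetCollection(go_all_frame, setType = GOCollection())

  params <- GSEAGOHyperGParams(
    name              = "Custom DH10B GO params",
    geneSetCollection = gsc,
    geneIds           = targetids,
    universeGeneIds   = universe,
    ontology          = "BP",
    pvalueCutoff      = 0.01,
    conditional       = FALSE,
    testDirection     = "over"
  )
  bp_over <- hyperGTest(params)
  # summary(bp_over) would then list any over-represented terms.
}
```

With real data you would replace go_map with the table you obtained from
NCBI or Blast2GO, and use your full gene list as the universe.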


Please let me know if you have more questions,


  Marc




James W. MacDonald wrote:
> Hi Yue,
>
> I see the problem now. The Entrez IDs used by this package are for the
> MG1655 substrain rather than the DH10B substrain.
>
> I don't think it would be a trivial exercise to make your own
> annotation package - this package is built off the ecoliK12.db0
> package that Marc Carlson creates, so it isn't as simple as using
> SQLForge to create a new package. Creating the db0 packages is
> non-trivial, so there are no facilities (that I know of) for end users
> to create one.
>
> I don't know the way around this problem, as I am not familiar with
> the different E. coli strains. If there is a one-to-one mapping of
> genes between strains then you might be able to map your Entrez Gene
> IDs to the MG1655 strain and go from there.
>
> Best,
>
> Jim
>
>
>
> Yue, Chen - BMD wrote:
>> Hi Jim,
>>  
>> Thanks for your answer. I tried to use your suggestion but it still
>> gave me the same error. How can I know what Entrez IDs are used by
>> "org.EcK12.eg.db"? Is there any way to make my own annotation package
>> for EcK12 substr DH10B? Thanks a lot!
>>  
>> Regards,
>> Chen, Yue
>>  
>>
>> ------------------------------------------------------------------------
>> *From:* James W. MacDonald [mailto:jmacdon at med.umich.edu]
>> *Sent:* Tue 11/10/2009 1:19 PM
>> *To:* Yue, Chen - BMD
>> *Cc:* Marc Carlson; bioconductor at stat.math.ethz.ch
>> *Subject:* Re: [BioC] Help with HyperGTest
>>
>> Most likely those values are read in as numeric, which won't work. You
>> need to convert them to character with as.character(), or use
>>
>> targetids <- scan("targetids.txt", what = "character")
>>
>> to read them in as character strings.
>>
>> Best,
>>
>> Jim
>>
>> Yue, Chen - BMD wrote:
>>  > Hi Marc and Jim,
>>  >
>>  > I'm sorry about the stripped attachment. I listed some targetid and
>>  > ecoliid I used. Can you take a look? Thanks!
>>  >
>>  > Regards,
>>  > Chen, Yue
>>  >
>>  > <<targetids.txt>>
>>  > 6058204
>>  > 6058276
>>  > 6058499
>>  > 6058576
>>  > 6058687
>>  > 6058820
>>  > 6058853
>>  > 6058937
>>  > 6058989
>>  > 6059024
>>  > 6059029
>>  > 6059123
>>  >
>>  > <<ecoliids.txt>>
>>  > 6061999
>>  > 6061998
>>  > 6061997
>>  > 6061996
>>  > 6061995
>>  > 6061994
>>  > 6061993
>>  > 6061992
>>  > 6061991
>>  > 6061990
>>  > 6061989
>>  > 6061988
>>  > 6061987
>>  > 6061986
>>  > 6061985
>>  > 6061984
>>  > 6061983
>>  > 6061982
>>  > 6061981
>>  > 6061980
>>  > 6061979
>>  > 6061978
>>  > 6061977
>>  > 6061976
>>  > 6061975
>>  > 6061974
>>  > 6061973
>>  > 6061972
>>  > 6061971
>>  > 6061970
>>  > 6061969
>>  > 6061968
>>  > 6061967
>>  >
>>  > ------------------------------------------------------------------------
>>  > *From:* Marc Carlson [mailto:mcarlson at fhcrc.org]
>>  > *Sent:* Tue 11/10/2009 11:39 AM
>>  > *To:* Yue, Chen - BMD
>>  > *Cc:* bioconductor at stat.math.ethz.ch
>>  > *Subject:* Re: [BioC] Help with HyperGTest
>>  >
>>  > Hi Yue,
>>  >
>>  > It's a good idea to always give us the output of sessionInfo() when you
>>  > post, but I can tell you that with this error, the problem is usually
>>  > caused by the input IDs.  If you are using the org.EcK12.eg.db then you
>>  > must use IDs that are Entrez Gene IDs.  What is the output from
>>  > head(targetids) and head(ecoliids)?
>>  >
>>  >   Marc
>>  >
>>  >
>>  >
>>  >
>>  > Yue, Chen - BMD wrote:
>>  >  > Dear All,
>>  >  >
>>  >  > I hope to get some help on the hyperGTest in GOstats. I want to do a
>>  >  > GO enrichment analysis on a set of E. coli K12 genes (substr DH10B). I
>>  >  > attached the target id file, partial ecoli id file (as universeGeneIds)
>>  >  > and sessionInfo to the email. The following are my commands and error. It
>>  >  > seems that my gene ids are not found in the annotation package, but I
>>  >  > don't know how to find out what gene ids are included in the package. I
>>  >  > used the "org.EcK12.eg.db" package, which uses Entrez ids, and my R
>>  >  > version is 2.9.2 on WinXP. Should I use a different annotation package?
>>  >  > Thank you very much!
>>  >  >
>>  >  > > targettable <- read.table("D:/RProjects/targetids.txt")
>>  >  > > ecolitable <- read.table("D:/RProjects/ecoliids.txt")
>>  >  > > targetids <- unique(targettable[,1])
>>  >  > > ecoliids <- unique(ecolitable[,1])
>>  >  > > params = new("GOHyperGParams", geneIds=targetids,
>>  >  > +     universeGeneIds=ecoliids, annotation="org.EcK12.eg.db",
>>  >  > +     ontology="BP", pvalueCutoff=0.01, conditional=FALSE,
>>  >  > +     testDirection="over")
>>  >  > > BPoverTest = hyperGTest(params)
>>  >  > Error in getUniverseHelper(probes, datPkg, entrezIds) :
>>  >  >   After filtering, there are no valid IDs that can be used as the
>>  >  >   Gene universe.
>>  >  >   Check input values to confirm they are the same type as the central
>>  >  >   ID used by your annotation package.
>>  >  >   For chip packages, this will still mean the central GENE identifier
>>  >  >   used by the package (NOT the probe IDs).
>>  >  >
>>  >  > Regards,
>>  >  >
>>  >  > Yue
>>  >  >
>>  >  >
>>  >  >
>>  >  > This email is intended only for the use of the individual or entity
>>  >  > to which it is addressed and may contain information that is privileged
>>  >  > and confidential. If the reader of this email message is not the
>>  >  > intended recipient, you are hereby notified that any dissemination,
>>  >  > distribution, or copying of this communication is prohibited. If you
>>  >  > have received this email in error, please notify the sender and
>>  >  > destroy/delete all copies of the transmittal. Thank you.
>>  >  >
>>  >  > ------------------------------------------------------------------------
>>  >  >
>>  >  > _______________________________________________
>>  >  > Bioconductor mailing list
>>  >  > Bioconductor at stat.math.ethz.ch
>>  >  > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>  >  > Search the archives:
>>  >  > http://news.gmane.org/gmane.science.biology.informatics.conductor
>>  >
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>>
>


