[BioC] RMAPPER and whole genome TFBS information

Vincent Carey stvjc at channing.harvard.edu
Sun Apr 17 05:29:18 CEST 2011


I am listed as the author of this package, and indeed some years ago I
wrote the R code that interfaces to the XML-RPC service of the MAPPER
database.  I don't know exactly why you are getting this error, and as
far as I can tell your inputs meet the requirements described in the
server-generated documentation returned by rmapperHelp().

I registered with the database, built a query by hand through its web
interface, and it was processed as

Gene: 	Trp53rk (transformation related protein 53 regulating kinase)
Gene ID: 	76367 	mRNA accession: 	NM_023815
Organism: 	Mus musculus
Scanned region: 	chr2:166617267-166626993
Models: 	JASPAR matrices, TRANSFAC matrices, M00789

This yielded over 2400 hits, for example:

Gene     GeneID  Transcript  Factor  Name(s)  Strand  Chrom  Start        End          Region    Score  E-value
Trp53rk  76367   NM_023815   M00791  HNF3     +       chr2   166,617,268  166,617,279  Promoter  4.6    14
Trp53rk  76367   NM_023815   MA0041  Foxd3    -       chr2   166,617,269  166,617,279  Promoter  2.9    11
Trp53rk  76367   NM_023815   MA0047  Foxa2    -       chr2   166,617,269  166,617,280  Promoter  3.9    4.3

with further details on the first hit:

Trp53rk  76367   NM_023815   M00791  HNF3     +       chr2   166,617,268  166,617,279  Promoter  4.6    14

Gene:         Trp53rk          Factor:          HNF3    Position (abs):  chr2:166,617,268-166,617,279
Gene ID:      76367            Model:           M00791  Position (tx):   -1999 to -1988
mRNA:         NM_023815        Position (cds):  -2045 to -2034
ENSEMBL:      ENSMUSG00000042  Score:           4.6     E-value:         14
Gene region:  Promoter         Strand:          +       Conserved:       -
Alignment:

   *->taaacaaAca.a<-*
      t+ acaaA+a +
      TGTACAAATAtT

In principle, RMAPPER should return all of this information.  However,
when I pass the corresponding query to the readMAPPER function, I get a
success code but only a header back; no hit data is returned, and the
call fails.  Specifically:

> readMAPPER(gene="Trp53rk", models="M00789",org = "Mm", pbases = 2000)
Error in seq.default(1, nh * 4, 4) : wrong sign in 'by' argument

Enter a frame number, or 0 to exit

1: readMAPPER(gene = "Trp53rk", models = "M00789", org = "Mm", pbases = 2000)
2: new("mapperHits", query = sett, hits = reshapeMapper(tmp))
3: initialize(value, ...)
4: initialize(value, ...)
5: reshapeMapper(tmp)
6: df[seq(1, nh * 4, 4), ]
7: `[.data.frame`(df, seq(1, nh * 4, 4), )
8: seq(1, nh * 4, 4)
9: seq.default(1, nh * 4, 4)
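The traceback ends in reshapeMapper(), which selects rows with seq(1, nh * 4, 4).  Assuming nh is the number of hits parsed from the server reply (an inference from the traceback, not from reading the RMAPPER source), a header-only reply gives nh = 0, and seq(1, 0, 4) stops with exactly this message.  A minimal base-R sketch of the failure and of one possible guard follows; hitRows() and safeReshape() are hypothetical helpers, not part of RMAPPER.

```r
# Reproduce the failure: with nh = 0 hits, seq(1, nh * 4, 4) is seq(1, 0, 4),
# and seq() rejects a positive step when 'to' is below 'from'.
nh <- 0
res <- tryCatch(seq(1, nh * 4, 4), error = function(e) conditionMessage(e))
print(res)  # "wrong sign in 'by' argument"

# Hypothetical guard (not RMAPPER code): seq_len(nh) * 4 - 3 gives the same
# row indices 1, 5, 9, ... for nh >= 1 and an empty vector when nh == 0.
hitRows <- function(nh) seq_len(nh) * 4 - 3

# Hypothetical reshape step: returns a zero-row data frame instead of an error
# when the server reply contained only a header.
safeReshape <- function(df, nh) {
  df[hitRows(nh), , drop = FALSE]
}

print(nrow(safeReshape(data.frame(x = 1:8), 0)))  # 0
print(nrow(safeReshape(data.frame(x = 1:8), 2)))  # 2
```

This only explains why a header-only reply surfaces as this particular error; it does not explain why the server returns no hits for a query that works through the web interface.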

So I suggest you contact the maintainers.  I will cc them on this note.

R version 2.13.0 Patched (2011-04-14 r55443)
Platform: x86_64-apple-darwin10.6.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
 [1] org.Mm.eg.db_2.5.0    RSQLite_0.9-4         DBI_0.2-5
 [4] AnnotationDbi_1.13.21 Biobase_2.11.10       biomaRt_2.8.0
 [7] RMAPPER_1.3.0         weaver_1.17.0         codetools_0.2-8
[10] digest_0.4.2

loaded via a namespace (and not attached):
[1] RCurl_1.5-0 XML_3.2-0



On Sat, Apr 16, 2011 at 10:41 PM, Ravi Karra <ravi.karra at gmail.com> wrote:
> Hello,
>
> I am trying to identify all putative GATA binding sites in the mouse genome.  Ideally, I want to get genomic coordinates for each "binding site" to enter into a GenomicRanges object (I know there will be a lot of hits) and to overlay this information with the results of a ChIP-Seq experiment. There seem to be multiple packages that could do this, but only RMAPPER provides an interface to the TRANSFAC and JASPAR TF binding site models.
> I have been getting multiple errors that I am not sure how to resolve.  Is this package the best way to get the information I want?  Is there a better alternative?  Is there an upper limit on the size of a MAPPER query?
>
> Thanks for your help,
> Ravi
>
> # load the necessary libraries
> library(RMAPPER)
> library(biomaRt)
>
> # get the mouse gene annotation from Ensembl
> # (identifiers to be passed to MAPPER)
> mm <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
> mmGenes <- getBM(attributes = c("ensembl_gene_id", "external_gene_id", "entrezgene", "external_transcript_id"), mart = mm)
> # get the unique Entrez gene ids, dropping the missing (NA) id
> egids <- unique(mmGenes$entrezgene)
> egids <- egids[!is.na(egids)]
>
> # make a comma-separated list of a subset of the gene ids (51 ids)
> eglist <- paste(egids[500:550], collapse = ",")
>
> # get the GATA factor models
> gata <- "M00789, T02689, T00311, T00306, T00305, T00267, T00305, T00267, T00306, T00311, M00632, M00462, MA0037"
>
> # Run MAPPER with 51 genes
> gatah <- readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)
>
> Error in file(con, "r") : cannot open the connection
> In addition: Warning message:
> In file(con, "r") : cannot open: HTTP status was '0 (null)'
>
> # Run MAPPER with 11 genes
> eglist <- paste(egids[500:510], collapse = ",")
> gatah <- readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)
>
> Error in seq.default(1, nh * 4, 4) : wrong sign in 'by' argument
>
>
>> traceback ()
> 10: stop("wrong sign in 'by' argument")
> 9: seq.default(1, nh * 4, 4)
> 8: seq(1, nh * 4, 4)
> 7: `[.data.frame`(df, seq(1, nh * 4, 4), )
> 6: df[seq(1, nh * 4, 4), ]
> 5: reshapeMapper(tmp)
> 4: initialize(value, ...)
> 3: initialize(value, ...)
> 2: new("mapperHits", query = sett, hits = reshapeMapper(tmp))
> 1: readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)
>
>> sessionInfo ()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] biomaRt_2.8.0 RMAPPER_1.2.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.5-0  tools_2.13.0 XML_3.2-0
>
>
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


