[BioC] Ringo/Starr getProfiles function

Mon Dec 14 12:25:10 CET 2009

Dear Noah,

the writeGFF function does not create a gff annotation, but writes all coloumns of the assayData (intensities in the ExpressionSet) to a gff file. 
If you do this, you can view the data in one of the established genome Browsers (which take this format).
I modified your script (my changes are commented), such that the transciprAnno object is compatible with the probeAnno and the ExpressionSet from the vignette. It should work
now with your data. The first part below shows the code from the vignette for normalization:

library(Starr)
dataPath <- system.file("extdata", package="Starr")
bpmapChr1 <- readBpmap(file.path(dataPath, "Scerevisiae_tlg_chr1.bpmap"))
cels <- c(file.path(dataPath,"Rpb3_IP_chr1.cel"), file.path(dataPath,"wt_IP_chr1.cel"), 
	file.path(dataPath,"Rpb3_IP2_chr1.cel"))
names <- c("rpb3_1", "wt_1","rpb3_2")
type <- c("IP", "CONTROL", "IP")
rpb3Chr1 <- readCelFile(bpmapChr1, cels, names, type, featureData=T, log.it=T)

rpb3_loess <- normalize.Probes(rpb3Chr1, method="loess")

ips <- rpb3Chr1$type == "IP"
controls <- rpb3Chr1$type == "CONTROL"
description <- c("Rpb3vsWT")
rpb3_loess_ratio <- getRatio(rpb3_loess, ips, controls, description, fkt=median, featureData=F)

probeAnnoChr1 <- bpmapToProbeAnno(bpmapChr1)

##############################
### Here is your biomaRt code:
##############################

library(biomaRt)

ensembl = useMart("fungal_mart_3")
ensembl = useDataset("scerevisiae_eg_gene", mart = ensembl)
chrom <- c("I")
transcriptAnno <- getBM(attributes=c("ensembl_gene_id", "chromosome_name",
 "strand", "transcript_start", "transcript_end"),
 filters = "chromosome_name", values = chrom, mart = ensembl)

transcriptAnno[,2][transcriptAnno[,2]=="I"] <- "Sc:Oct_2003;chr1"
names(transcriptAnno) = c("name", "chr", "strand", "start", "end")

###########################
### Rename $chr to "chr1", as in probeAnnoChr1 (you can se names with ls(probeAnnoChr1))
### See what the names are in your probeAnno object.
###########################

transcriptAnno$chr = sapply(strsplit(transcriptAnno$chr, ";"), function(x) {x[2]})

##########################
### For plotting along the TSS, the $start and $end are both set to the TSS position in the gff annotation, 
##########################

watson <- which(transcriptAnno$strand == 1)
crick <- which(transcriptAnno$strand == -1)

transcriptAnno[watson, ]$end = transcriptAnno[watson, ]$start
transcriptAnno[crick, ]$start = transcriptAnno[crick, ]$end

############################
### Getting and plotting the profiles
############################

profile <- getProfiles(rpb3_loess_ratio, probeAnnoChr1, transcriptAnno, 500, 500, feature="TSS", borderNames="TSS", method="basewise")
clust <- rep(1, dim(transcriptAnno)[1])
names(clust) <- transcriptAnno$name
x11()
plotProfiles(profile, cluster=clust, type="l")

## End

I hope you don't get errors now. I think I will include the biomaRt example in the vignette to make the usage of these functions
more clearly. 
Best regards 

Benedikt

Noah Dowell <noahd at ucla.edu> schrieb :

> Dear Wolfgang,
> 
> Thank you for your response.  Sorry about the &quot;sessionInfo()&quot;
> mistake... I think I included it earlier but I will include each time (I
> included it below).  Benedikt Zacher caught my NA error earlier since I had my
> probeAnnot object pointing to one name of chromosomes (Affymetrix nomenclature)
> and the tssAnno pointing to another (ensembl nomenclature).  I am still
> struggling somewhat with other issues if you should chose to read on.  Thank you
> much for your time and assistance.
> 
> Sincerely,
> 
> Noah
> 
> 
> Dear Benedikt,
> 
> Maybe it will come to me sending the R object to you, but I think I am close to
> understanding the workflow a little better.  I think I just missed some of the
> essential early steps necessary to using Starr (sorry I am new to R and trying
> to get up to speed quickly).  I think the following workflow would be helpful to
> include in the vignette (or maybe I am getting this wrong).  Here is what I am
> going to try (I excluded some of the commands to load or read files):
> 
> ### fire up R and create a raw Annotation file using Biomart
> 
> 	library(biomaRt)
> 
>  	ensembl = useMart(&quot;fungal_mart_3&quot;)
> 
>  	ensembl = useDataset(&quot;scerevisiae_eg_gene&quot;, mart = ensembl)
> 
>  	chrom &lt;- c(&quot;I&quot;)
> 			
> 	 transcriptAnno &lt;- getBM(attributes=c(&quot;ensembl_gene_id&quot;,
> &quot;chromosome_name&quot;, 
> 						+  &quot;strand&quot;, &quot;transcript_start&quot;,
> &quot;transcript_end&quot;), 
> 						+ filters = &quot;chromosome_name&quot;, values = chrom, mart =
> ensembl)
> 	
>          transAnnoChr = transcriptAnno
> 	
> 	transAnnoChr[,2][transAnnoChr[,2]==&quot;I&quot;] &lt;-
> &quot;Sc:Oct_2003;chr1&quot;
> 
> ###change the column names to match your file in the Starr vignette and then
> put them in the right order:
> 
> 	names(transAnnoChr) = c(&quot;name&quot;, &quot;chr&quot;, &quot;strand&quot;,
> &quot;start&quot;, &quot;end&quot;)
> 
> 				chrOnly   = transAnnoChr[,2]
> 				startOnly = transAnnoChr[,4]
> 				endOnly   = transAnnoChr[,5]
> 				strandOnly= transAnnoChr[,3]
> 				nameOnly  = transAnnoChr[,1]
> 				
> 	tssAnno = cbind(chrOnly,startOnly, endOnly, strandOnly, nameOnly)
> 
> ### make a minimal Expression set so that the writeGFF function works
> 
> 	library(Biobase)
> 
> 	exprs = as.matrix(tssAnno, header = TRUE, sep = &quot;\t&quot;, row.names = 1,
> as.is = TRUE)
> 
> 	minimalSet = new(&quot;ExpressionSet&quot;, exprs = exprs)
> 
> ### Now we can use the Starr package to write the gffAnno file
> 
> 	library(Starr)
> 
> 	bpmap = readBpmap(&quot;Sc03b_MR_v04.bpmap&quot;)
> 
> 	probeAnno = bpmapToProbeAnno(bpmap)
> 
> 	writeGFF(minimalSet, probeAnno, &quot;tssAnno.gff&quot;)
> 
> 
> 	transcriptAnno = read.gffAnno(&quot;tssAnno.gff&quot;, feature = ???)
> 
> ### I am not sure what to use as a feature for the read.gffAnno function.  
> ### I think this should get me ready to use the getProfiles function, your
> thoughts??
> 
> 
> Thank you so much for your help and for creating this package.  I have not run
> through these commands but wanted to see if the workflow of
> 1. Annotation file creation (biomaRt)
> 2. ExpressionSet Creation (BioBase)
> 3. gffAnno Creation (Starr)
> 
> Is this the way to go?
> 
> Best,
> 
> Noah
> 
> 
> &gt;&gt; sessionInfo()
> &gt; R version 2.10.0 (2009-10-26) 
> &gt; i386-apple-darwin9.8.0 
> &gt; 
> &gt; locale:
> &gt; [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> &gt; 
> &gt; attached base packages:
> &gt; [1] grid      stats     graphics  grDevices utils     datasets  methods  
> base     
> &gt; 
> &gt; other attached packages:
> &gt; [1] Starr_1.2.0        affxparser_1.18.0  affy_1.24.0        Ringo_1.10.0 
>      Matrix_0.999375-31 lattice_0.17-26   
> &gt; [7] limma_3.2.1        RColorBrewer_1.0-2 Biobase_2.6.0     
> &gt; 
> &gt; loaded via a namespace (and not attached):
> &gt; [1] affyio_1.14.0        annotate_1.24.0      AnnotationDbi_1.8.0 
> DBI_0.2-4            genefilter_1.28.0   
> &gt; [6] MASS_7.3-3           preprocessCore_1.8.0 pspline_1.0-13      
> RSQLite_0.7-3        splines_2.10.0      
> &gt; [11] survival_2.35-7      xtable_1.5-5