[BioC] Agi4x44PreProcess - Replicated genes

Francois Pepin fpepin at cs.mcgill.ca
Tue Nov 17 02:06:51 CET 2009


Hi Neel,

I can see a few issues here.

One is that you are not using the proper annotation packages. 
org.Dr.eg.db is an organism package and would not contain the probe 
information that would be expected by the functions you are calling. You 
will have to use the annotation package for the chip instead. I think 
you had this right the first time around; why did you change the 
annotation library?
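
Since you asked what "keys" means: an annotation map is essentially a 
lookup table, and the keys are the IDs it is indexed by. A chip package is 
keyed by probe IDs, whereas an organism package such as org.Dr.eg.db is 
keyed by Entrez Gene IDs, so probe names will not match anything in it. 
Here is a toy sketch in base R. It does not use the real annotation 
packages, and all the IDs and symbols in it are made up:

```r
# Toy stand-ins for annotation maps (all IDs and symbols are made up).
# A chip-level map is keyed by probe IDs ...
chip_symbol <- c(A_15_P000001 = "geneA", A_15_P000002 = "geneB")
# ... while an organism-level map is keyed by Entrez Gene IDs.
org_symbol <- c("11111" = "geneA", "22222" = "geneB")

probe_ids <- c("A_15_P000001", "A_15_P000002")

# Looking up by probe ID only works in the map that is keyed by probe IDs.
chip_hits <- unname(chip_symbol[probe_ids])
org_hits  <- unname(org_symbol[probe_ids])

print(chip_hits)  # the two symbols
print(org_hits)   # NA for both: probe IDs are not keys of this map
```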

In addition, the Agi4x44PreProcess package has a rather narrow scope and 
many functions only work on 4x44 Agilent human and mouse whole genome 
arrays (hgug4112a and mgug4122a). Supporting other chip types would 
probably be straightforward for the package authors, but possibly 
time-consuming.

As a minor point, you are also calling internal methods, such as 
ensembl.htmlpage. This is generally not recommended as they are less 
documented, are usually not as robust and can change wildly between 
versions.

Your call to ensembl.htmlpage also does not use the proper arguments, as 
the 3rd argument should be the file name, not the 2nd.
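
One way to avoid this kind of mix-up is to name the arguments instead of 
relying on their position. A minimal sketch with a made-up function: 
make_report and its argument names are hypothetical and are not the real 
signature of ensembl.htmlpage.

```r
# Hypothetical function, for illustration only.
make_report <- function(ids, annotation.package, filename) {
  paste0(length(ids), " ids, package=", annotation.package,
         ", file=", filename)
}

ids <- c("p1", "p2")

# Positional call with the 2nd and 3rd arguments swapped: R does not
# complain here, the values are silently bound to the wrong names.
swapped <- make_report(ids, "report.html", "hgug4112a.db")

# Named call: the order of the arguments no longer matters.
named <- make_report(ids, filename = "report.html",
                     annotation.package = "hgug4112a.db")

print(swapped)
print(named)
```

Calling args() on a function prints its argument list, which is a quick 
way to check the expected order before calling it.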

As it stands, you have 2 main options. The first would be to try to 
convince the maintainer of the Agi4x44PreProcess package to handle other 
chip types. The second is to use another package to do the quality 
control. arrayQualityMetrics contains a lot of the basic tools, and limma 
also contains some useful functions.
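
As a rough idea of what the limma route could look like, here is a sketch 
of background correction, quantile normalization and averaging of 
replicated probes. It runs on simulated numbers rather than on your files, 
and the matrix layout, array names and probe names are my assumptions. 
With real data you would first read the Feature Extraction files (limma's 
read.maimages can do that) and take the probe names from there.

```r
library(limma)

set.seed(1)
# Simulated single-channel intensities: 6 probes x 4 arrays (made-up data).
fg <- matrix(runif(24, 100, 5000), nrow = 6,
             dimnames = list(NULL, c("conta", "contb", "pcba", "pcbb")))
bg <- matrix(runif(24, 20, 60), nrow = 6)
# Hypothetical probe names; "P1" and "P2" are each spotted twice.
probe <- c("P1", "P1", "P2", "P2", "P3", "P4")

# Background-correct ("half" method, offset 50), then log2-transform.
E <- log2(backgroundCorrect.matrix(fg, bg, method = "half", offset = 50))

# Quantile-normalize across arrays (a plain matrix is taken as log values).
Enorm <- normalizeBetweenArrays(E, method = "quantile")

# Average replicated probes so that each probe name appears only once.
Eavg <- avereps(Enorm, ID = probe)

print(dim(Eavg))
```

This mirrors the settings you passed to BGandNorm (half, quantile, offset 
50), but it is not meant to reproduce that function exactly.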

Hope this helps,

Francois

On 11/16/2009 07:33 PM, Neel Aluru wrote:
> Hello,
>
> I am making progress in learning R but I must admit that I am really slow and without all your help I would have given up on this. I still have some recurring troubles with Agi4x44PreProcess. This time I am having issues with Replicated genes (genes.rpt.agi). It looks like I am missing something. I have posted the session info here and highlighted the problematic ones in "red". I really appreciate your help.
>
> Thank you very much in advance,
>
> Sincerely, Neel
>
> [R.app GUI 1.29 (5464) i386-apple-darwin8.11.1]
>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite()
> Using R version 2.9.2, biocinstall version 2.4.13.
> Installing Bioconductor version 2.4 packages:
>   [1] "affy"         "affydata"     "affyPLM"      "annaffy"      "annotate"     "Biobase"      "biomaRt"
>   [8] "Biostrings"   "DynDoc"       "gcrma"        "genefilter"   "geneplotter"  "hgu95av2.db"  "limma"
> [15] "marray"       "multtest"     "vsn"          "xtable"       "affyQCReport"
> Please wait...
>> library(org.Dr.eg.db)
> Loading required package: AnnotationDbi
> Loading required package: Biobase
>> setwd("/Users/Neel/agilent")
>> getwd()
> [1] "/Users/Neel/agilent"
>> library("Agi4x44PreProcess")
> Loading required package: limma
> Loading required package: annotate
> Loading required package: genefilter
>> targets=read.targets(infile="infile.txt")
>
> Target File
>            X  FileName Treatment GErep
> conta cont1 conta.txt   control     1
> contb cont2 contb.txt   control     2
> contc cont3 contc.txt   control     3
> contd cont4 contd.txt   control     4
> pcba   pcb1  pcba.txt       pcb     1
> pcbb   pcb2  pcbb.txt       pcb     2
> pcbc   pcb3  pcbc.txt       pcb     3
> pcbd   pcb4  pcbd.txt       pcb     4
>
>> aa=read.AgilentFE(targets, makePLOT=FALSE)
> Read conta.txt
> Read contb.txt
> Read contc.txt
> Read contd.txt
> Read pcba.txt
> Read pcbb.txt
> Read pcbc.txt
> Read pcbd.txt
>
>    RGList:
> 	dd$R:	'gProcessedSignal'
> 	dd$G:	'gMeanSignal'
> 	dd$Rb:	'gBGMedianSignal'
> 	dd$Gb:	'gBGUsed'
>
>> aaNORM = BGandNorm(aa, BGmethod = "half", NORMmethod = "quantile", foreground = "MeanSignal", background = "BGMedianSignal", offset = 50, makePLOTpre = FALSE, makePLOTpost = FALSE)
> Loading required package: vsn
> BACKGROUND CORRECTION AND NORMALIZATION
>
> 	foreground: MeanSignal
> 	background: BGMedianSignal
>
> 	BGmethod:	 half
> 	NORMmethod:	 quantile
> 	OUTPUT in log-2 scale
>> CV.rep.probes(aa, "org.Dr.eg.db", foreground="MeanSignal", raw.data= TRUE, writeR=FALSE,targets)
>
> ------------------------------------------------------
> Non-CTRL Replicated probes
> 	foreground:  MeanSignal
> 		FILTERING BY ControlType FLAG
> 		RAW DATA: PROBES AFTER ControlType FILTERING:  42990
>
> ------------------------------------------------------
> 	REPLICATED NonCtrl Probes 21495
> 	UNIQUE probes 21495
> 	DISTRIBUTION OF REPLICATED NonControl Probes
> reps
>      1
> 21495
> 	# REPLICATED (redundant) probeNames 21495
> ------------------------------------------------------
> MEDIAN % CV
> conta contb contc contd  pcba  pcbb  pcbc  pcbd
> 2.378 0.963 1.233 1.997 2.439 1.282 1.438 2.104
>
>> genes.rpt.agi(aa, "org.Dr.eg", raw.data = TRUE, WRITE.html = FALSE, REPORT = FALSE)
>
> GENE SETS: same genes interrogated by different probes
> 		FILTERING BY ControlType FLAG
> 		RAW DATA: PROBES AFTER ControlType FILTERING:  42990
>
> 	INPUT DATA: RAW
> 	CHIP: org.Dr.eg
>
> 	PROBE SETS (NON-CTRL prob rep. x 10):	 21495
> Error in lookUp(PROBE_ID, annotation.package, "SYMBOL") :
>    No keys provided  (Can anyone explain to me what keys means in R?)
>
>
>> PROBE_ID = aa$ProbeUID$ProbeName
>> GENE_ID = unlist(lookUp(PROBE_ID, "org.Dr.eg.db", "org.Dr.egACCNUM") )
>
> Error in lookUp(PROBE_ID, "org.Dr.eg.db", "org.Dr.egACCNUM") :
>    No keys provided
>
>> head<- c("PROBE ID","org.Dr.egACCNUM","SYMBOL")
>> ensembl.htmlpage(PROBE_ID,filename,"org.Dr.eg", title, table.head=head,table.center = TRUE)
>
> Error in match.arg(annotation.package, c("hgug4112a.db", "mgug4122a.db",  :
>    'arg' should be one of “hgug4112a.db”, “mgug4122a.db”, “notAnnPack”
>
>> ensembl.htmlpage(PROBE_ID,filename,"org.Dr.eg.db", title, table.head=head,table.center = TRUE)
>
> Error in file(filename, "w") : cannot open the connection
> In addition: Warning message:
> In file(filename, "w") : cannot open file 'org.Dr.eg.db': Is a directory
>
> (Do you think I should create annotation package to solve this?)
>
>
>
>
>
> Neel Aluru
> Postdoctoral Scholar
> Biology Department
> Woods Hole Oceanographic Institution
> Woods Hole, MA 02543
> USA
> 508-289-3607