[BioC] Agi4x44PreProcess - Replicated genes
Francois Pepin
fpepin at cs.mcgill.ca
Tue Nov 17 02:06:51 CET 2009
Hi Neel,
There are a couple of issues here that I can see.
One is that you are not using the proper annotation packages.
org.Dr.eg.db is an organism package and would not contain the probe
information that would be expected by the functions you are calling. You
will have to use the annotation package for the chip instead. I think
you had this right the first time around. Why did you change the
annotation library?
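
To illustrate what "keys" means here: a key is the identifier that an annotation map is indexed by. The toy sketch below uses made-up probe IDs, gene IDs and symbols (none of them come from a real annotation package) to show why probe names find nothing in a map keyed by gene IDs.

```r
# Toy stand-ins for annotation maps; all IDs and symbols are hypothetical.
# A chip annotation package is keyed by probe IDs.
chip_map <- c(probe_0001 = "geneA", probe_0002 = "geneB")
# An organism package is keyed by Entrez Gene IDs.
org_map <- c("1001" = "geneA", "1002" = "geneB")

# Probe names as they would come off the array.
probe_ids <- c("probe_0001", "probe_0002")

# Count how many probe IDs each map can actually use as keys.
n_chip_hits <- sum(probe_ids %in% names(chip_map))
n_org_hits <- sum(probe_ids %in% names(org_map))

print(c(chip = n_chip_hits, org = n_org_hits))
```

The same check (how many of your probe names are keys of the map you are querying) is a quick way to confirm that an annotation package matches the chip.
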
In addition, the Agi4x44PreProcess package has a rather narrow scope and
many functions only work on 4x44 Agilent human and mouse whole genome
arrays (hgug4112a and mgug4122a). Handling other chip types would
probably be easy for the package authors, but possibly time-consuming.
As a minor point, you are also calling internal methods, such as
ensembl.htmlpage. This is generally not recommended as they are less
documented, are usually not as robust and can change wildly between
versions.
Your call to ensembl.htmlpage also does not use the proper arguments, as
the 3rd argument should be the file name, not the 2nd.
As it stands, you have 2 main options. The first would be to try to
convince the maintainer of the Agi4x44PreProcess package to handle other
chip types. The second is to use another package to do the quality control.
arrayQualityMetrics contains a lot of the basic tools, and limma
contains some useful functions also.
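
As a rough sketch of the second option, something along these lines might work. It assumes that limma, Biobase and arrayQualityMetrics are installed, that the arrays are single-channel Agilent Feature Extraction files, and that the targets file is tab-delimited with a FileName column. The function name qc_sketch and the output directory name are made up for illustration, and I have not run this on your data.

```r
# Sketch only: assumes limma, Biobase and arrayQualityMetrics are installed
# and that the working directory holds the targets file and the array files.
qc_sketch <- function(targets_file = "infile.txt", outdir = "QC_report") {
  library(limma)

  # Read the targets table; it must contain a FileName column.
  targets <- readTargets(targets_file)

  # Read the green channel only (single-channel Agilent arrays).
  raw <- read.maimages(targets, source = "agilent", green.only = TRUE)

  # Background correction and between-array normalization, using the same
  # method names as the BGandNorm call in your session.
  bg <- backgroundCorrect(raw, method = "half", offset = 50)
  norm <- normalizeBetweenArrays(bg, method = "quantile")

  # Quick visual check of the intensity distributions per array.
  plotDensities(norm)

  # Wrap the log2 expression matrix in an ExpressionSet and build the
  # arrayQualityMetrics HTML report in 'outdir'.
  eset <- Biobase::ExpressionSet(assayData = norm$E)
  arrayQualityMetrics::arrayQualityMetrics(expressionset = eset,
                                           outdir = outdir, force = TRUE)

  invisible(norm)
}
```

Calling qc_sketch() from the directory with your data files should then produce a density plot and a report directory, none of which depends on a chip-specific annotation package.
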
Hope this helps,
Francois
On 11/16/2009 07:33 PM, Neel Aluru wrote:
> Hello,
>
> I am making progress in learning R but I must admit that I am really slow and without all your help I would have given up on this. I still have some recurring troubles with Agi4x44PreProcess. This time I am having issues with Replicated genes (genes.rpt.agi). It looks like I am missing something. I have posted the session info here and highlighted the problematic ones in "red". I really appreciate your help.
>
> Thank you very much in advance,
>
> Sincerely, Neel
>
> [R.app GUI 1.29 (5464) i386-apple-darwin8.11.1]
>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite()
> Using R version 2.9.2, biocinstall version 2.4.13.
> Installing Bioconductor version 2.4 packages:
> [1] "affy" "affydata" "affyPLM" "annaffy" "annotate" "Biobase" "biomaRt"
> [8] "Biostrings" "DynDoc" "gcrma" "genefilter" "geneplotter" "hgu95av2.db" "limma"
> [15] "marray" "multtest" "vsn" "xtable" "affyQCReport"
> Please wait...
>> library(org.Dr.eg.db)
> Loading required package: AnnotationDbi
> Loading required package: Biobase
>> setwd("/Users/Neel/agilent")
>> getwd()
> [1] "/Users/Neel/agilent"
>> library("Agi4x44PreProcess")
> Loading required package: limma
> Loading required package: annotate
> Loading required package: genefilter
>> targets=read.targets(infile="infile.txt")
>
> Target File
> X FileName Treatment GErep
> conta cont1 conta.txt control 1
> contb cont2 contb.txt control 2
> contc cont3 contc.txt control 3
> contd cont4 contd.txt control 4
> pcba pcb1 pcba.txt pcb 1
> pcbb pcb2 pcbb.txt pcb 2
> pcbc pcb3 pcbc.txt pcb 3
> pcbd pcb4 pcbd.txt pcb 4
>
>> aa=read.AgilentFE(targets, makePLOT=FALSE)
> Read conta.txt
> Read contb.txt
> Read contc.txt
> Read contd.txt
> Read pcba.txt
> Read pcbb.txt
> Read pcbc.txt
> Read pcbd.txt
>
> RGList:
> dd$R: 'gProcessedSignal'
> dd$G: 'gMeanSignal'
> dd$Rb: 'gBGMedianSignal'
> dd$Gb: 'gBGUsed'
>
>> aaNORM = BGandNorm(aa, BGmethod = "half", NORMmethod = "quantile", foreground = "MeanSignal", background = "BGMedianSignal", offset = 50, makePLOTpre = FALSE, makePLOTpost = FALSE)
> Loading required package: vsn
> BACKGROUND CORRECTION AND NORMALIZATION
>
> foreground: MeanSignal
> background: BGMedianSignal
>
> BGmethod: half
> NORMmethod: quantile
> OUTPUT in log-2 scale
>> CV.rep.probes(aa, "org.Dr.eg.db", foreground="MeanSignal", raw.data= TRUE, writeR=FALSE,targets)
>
> ------------------------------------------------------
> Non-CTRL Replicated probes
> foreground: MeanSignal
> FILTERING BY ControlType FLAG
> RAW DATA: PROBES AFTER ControlType FILTERING: 42990
>
> ------------------------------------------------------
> REPLICATED NonCtrl Probes 21495
> UNIQUE probes 21495
> DISTRIBUTION OF REPLICATED NonControl Probes
> reps
> 1
> 21495
> # REPLICATED (redundant) probeNames 21495
> ------------------------------------------------------
> MEDIAN % CV
> conta contb contc contd pcba pcbb pcbc pcbd
> 2.378 0.963 1.233 1.997 2.439 1.282 1.438 2.104
>
>> genes.rpt.agi(aa, "org.Dr.eg", raw.data = TRUE, WRITE.html = FALSE, REPORT = FALSE)
>
> GENE SETS: same genes interrogated by different probes
> FILTERING BY ControlType FLAG
> RAW DATA: PROBES AFTER ControlType FILTERING: 42990
>
> INPUT DATA: RAW
> CHIP: org.Dr.eg
>
> PROBE SETS (NON-CTRL prob rep. x 10): 21495
> Error in lookUp(PROBE_ID, annotation.package, "SYMBOL") :
> No keys provided (Can anyone explain to me what keys means in R?)
>
>
>> PROBE_ID = aa$ProbeUID$ProbeName
>> GENE_ID = unlist(lookUp(PROBE_ID, "org.Dr.eg.db", "org.Dr.egACCNUM") )
>
> Error in lookUp(PROBE_ID, "org.Dr.eg.db", "org.Dr.egACCNUM") :
> No keys provided
>
>> head<- c("PROBE ID","org.Dr.egACCNUM","SYMBOL")
>> ensembl.htmlpage(PROBE_ID,filename,"org.Dr.eg", title, table.head=head,table.center = TRUE)
>
> Error in match.arg(annotation.package, c("hgug4112a.db", "mgug4122a.db", :
> 'arg' should be one of “hgug4112a.db”, “mgug4122a.db”, “notAnnPack”
>
>> ensembl.htmlpage(PROBE_ID,filename,"org.Dr.eg.db", title, table.head=head,table.center = TRUE)
>
> Error in file(filename, "w") : cannot open the connection
> In addition: Warning message:
> In file(filename, "w") : cannot open file 'org.Dr.eg.db': Is a directory
>
> (Do you think I should create annotation package to solve this?)
>
>
>
>
>
> Neel Aluru
> Postdoctoral Scholar
> Biology Department
> Woods Hole Oceanographic Institution
> Woods Hole, MA 02543
> USA
> 508-289-3607
>
>
>
>
> [[alternative HTML version deleted]]
>
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor