[BioC] Analysis of Affymetrix Human Gene 2.0 ST arrays

Fri Nov 29 12:18:50 CET 2013

Dear all,

I am analyzing a set of Affymetrix Human Gene 2.0 ST arrays. This is my first time working with this type of array, so I have a few general questions. I would very much appreciate any advice you could give.

(1) I have obtained several lists of differentially expressed genes (using eBayes() from limma). Some control transcripts show up in those lists (e.g. the normgene -> intron category, among others). I was not expecting this type of transcript at this point. In theory, no control transcripts should appear after normalization, am I right? Have you experienced this?
I have read that one possibility is to use getMainProbes before selecting genes with topTable, but I wonder whether something went wrong earlier, in my normalization (I used rma() from oligo, at the transcript level). What is your opinion?
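
To make the filtering idea in (1) concrete, here is a minimal sketch in base R. It does not use real array output: the table, the column names `probeset` and `category`, and all values are assumptions made up for illustration. It only shows the intent of restricting results to the "main" category, which is what the getMainProbes step is meant to achieve on the real data.

```r
# Toy stand-in for a topTable() result; every value here is made up.
res <- data.frame(
  probeset  = c("ps1", "ps2", "ps3", "ps4"),
  category  = c("main", "normgene->intron", "main", "normgene->exon"),
  adj.P.Val = c(0.001, 0.004, 0.030, 0.200),
  stringsAsFactors = FALSE
)

# Keep only probesets in the "main" category, dropping the controls.
main_only <- res[res$category == "main", ]

# Count what was dropped, by category, to see how much the controls contributed.
dropped <- table(res$category[res$category != "main"])

print(main_only)
print(dropped)
```

Filtering before the model fit, rather than after, also changes the number of tests that the multiple-testing adjustment accounts for, so the two orders need not give identical adjusted p-values.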

(2) This type of array also includes lincRNA transcripts, and I would like to include them in my analysis. However, I am using hugene20sttranscriptcluster.db for annotation and these lincRNAs are not included in it. Is there a way for this package to handle them?

(3) I tried to build my own annotation package with makeDBPackage, based on the .csv annotation file from Affymetrix, but I got an error:  Error in `[.data.frame`(csvFile, , GenBank IDName) : undefined columns selected
I have already read on this mailing list that makeDBPackage may expect an HGU133plus2 annotation "style". Would the AnnotationForge package be able to handle this?
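
For context on the message in (3): "undefined columns selected" is the error base R raises whenever a data frame is indexed by a column name it does not contain. The sketch below reproduces it on a made-up table. The column names ("Probe Set ID", "Gene Symbol", "GenBank ID") are assumptions for illustration only, not the real layout of the Affymetrix .csv file or the columns makeDBPackage actually requests.

```r
# Hypothetical header of a vendor annotation .csv (not the real Affymetrix layout).
csv_file <- data.frame(
  "Probe Set ID" = c("ps1", "ps2"),
  "Gene Symbol"  = c("GENE1", "GENE2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns a package builder might ask for; assumed names, for illustration only.
expected <- c("Probe Set ID", "GenBank ID")
missing_cols <- setdiff(expected, names(csv_file))

# Indexing by an absent column name triggers the base R error.
msg <- tryCatch(
  csv_file[, "GenBank ID"],
  error = function(e) conditionMessage(e)
)

print(missing_cols)
print(msg)
```

Comparing the expected column names against names() of the imported file in this way would show which column is missing from the .csv.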

Many thanks in advance for any help!

María Maqueda

Biomedical Engineering Research Centre (CREB)
Universitat Politècnica de Catalunya (UPC)

 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] human.db0_2.9.0                       AnnotationForge_1.2.2                
 [3] hugene20sttranscriptcluster.db_2.12.1 org.Hs.eg.db_2.9.0                   
 [5] AnnotationDbi_1.22.6                  BiocInstaller_1.12.0                 
 [7] limma_3.16.8                          pd.hugene.2.0.st_3.8.0               
 [9] oligo_1.24.2                          Biobase_2.20.1                       
[11] oligoClasses_1.22.0                   BiocGenerics_0.6.0                   
[13] RSQLite_0.11.4                        DBI_0.2-7                            

loaded via a namespace (and not attached):
 [1] affxparser_1.32.3     affyio_1.28.0         annotate_1.38.0      
 [4] Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8      
 [7] ff_2.2-12             foreach_1.4.1         genefilter_1.42.0    
[10] GenomicRanges_1.12.5  IRanges_1.18.4        iterators_1.0.6      
[13] preprocessCore_1.22.0 splines_3.0.1         stats4_3.0.1         
[16] survival_2.37-4       tools_3.0.1           XML_3.98-1.1         
[19] xtable_1.7-1          zlibbioc_1.6.0 

--
Sent via the guest posting facility at bioconductor.org.