[BioC] limma_3.17.23 - missing ILMN identifiers in EList objects after read.ilmn

Kemal Akat kakat at mail.rockefeller.edu
Wed Oct 9 20:39:45 CEST 2013


Dear colleagues,

I am currently analyzing an Illumina Mouse v2 bead array dataset using limma and ran across an error I don't quite understand. It came up when I tried to annotate the differentially expressed genes later in the analysis. The problem seems to stem from empty strings in the vector of probe IDs I pass to mget to retrieve the annotation, but I don't understand how they got there in the first place.

The probe and control profiles were exported from GenomeStudio without background correction and normalization.

Here is the code I ran:

R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
R> y = neqc(x)
R> expressed = rowSums(y$other$Detection < 0.05) > 4
R> y = y[expressed, ]
R> ids = rownames(y)
R> entrez = unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))

Error in unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA)) : 
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in FUN(c("ILMN_2735294", "ILMN_2417611", "ILMN_2545897", "ILMN_2762289",  : 
  attempt to use zero-length variable name
Calls: mget ... as.list -> as.list -> .formatList -> lapply -> lapply -> FUN

R> traceback()
1: unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))

R> ids[ids == ""]
  [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
 [55] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[109] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[163] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[217] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[271] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[325] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[379] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[433] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[487] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[541] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[595] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[649] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[703] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[757] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[811] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[865] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[919] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[973] "" ""

So there seem to be 974 empty strings in the row names. There is nothing like that in the original data file, and should R even allow empty row names in the first place?
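
As a stop-gap, the empty entries could be located and dropped before the annotation lookup. The following is only a minimal sketch in base R: the vector `ids` is a hand-made stand-in for rownames(y), not the real data, and the names `bad` and `valid_ids` are made up for illustration.

```r
# Toy stand-in for rownames(y); the real vector would come from the EList.
ids <- c("ILMN_2735294", "", "ILMN_2417611", NA, "ILMN_2545897", "")

# Flag entries that are missing (NA) or empty ("").
bad <- is.na(ids) | !nzchar(ids)

# How many problematic entries are there, and at which positions?
n_bad <- sum(bad)
bad_positions <- which(bad)

# Keep only usable probe IDs for the downstream lookup.
valid_ids <- ids[!bad]
```

With the real object, the same logical vector could presumably be used to subset the rows (y[!bad, ]) so that the expression matrix and the ID vector stay in sync. This only works around the symptom and does not explain where the empty names come from.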

Here is what the EListRaw object looks like after reading it into R.

R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
R> x
An object of class "EListRaw"
$source
[1] "illumina"

$E
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294        420.8        401.8        395.8        422.9        360.1        358.5        420.7        327.1        178.8        343.4        425.5
ILMN_2417611        323.8        280.2        294.1        315.5        542.5        301.0        398.0        133.7        235.9        382.0        512.7
ILMN_2545897         98.3        109.2        128.0        124.5        111.3        102.6        110.2        106.6         87.2        104.6        101.8
ILMN_2762289         91.7         88.3         94.2         95.5         88.1         81.2         88.5         88.0         79.4         85.3         84.5
ILMN_1248788         87.6         84.7         92.0         92.9         85.9         84.0         93.8         86.9         77.5         84.9         86.3
             9379087022_F
ILMN_2735294        322.0
ILMN_2417611        185.7
ILMN_2545897        107.8
ILMN_2762289         88.8
ILMN_1248788         85.1
46250 more rows ...

$genes
       TargetID  Status
1 0610005A07RIK regular
2 0610005C13RIK regular
3 0610005H09RIK regular
4    0610005I04 regular
5 0610005K03RIK regular
46250 more rows ...

$other
$Detection
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294      0.00000      0.00000       0.0000       0.0000       0.0000       0.0000      0.00000       0.0000      0.00000      0.00000      0.00000
ILMN_2417611      0.00000      0.00000       0.0000       0.0000       0.0000       0.0000      0.00000       0.0000      0.00000      0.00000      0.00000
ILMN_2545897      0.08974      0.00321       0.0000       0.0000       0.0000       0.0000      0.00107       0.0000      0.00214      0.00214      0.00107
ILMN_2762289      0.34402      0.49359       0.1998       0.1827       0.6068       0.9220      0.71047       0.4776      0.27350      0.58654      0.77991
ILMN_1248788      0.76603      0.86004       0.3472       0.3718       0.8440       0.6645      0.21902       0.6004      0.58120      0.63675      0.53419
             9379087022_F
ILMN_2735294       0.0000
ILMN_2417611       0.0000
ILMN_2545897       0.0000
ILMN_2762289       0.3440
ILMN_1248788       0.7949
46250 more rows ...

$Avg_NBEADS
             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294           51           63           58           57           36           46           49           60           62           50           58
ILMN_2417611           44           56           46           51           66           51           42           66           40           47           57
ILMN_2545897           51           69           45           67           47           39           44           56           59           43           50
ILMN_2762289           48           49           53           59           43           55           47           49           54           41           53
ILMN_1248788           43           42           29           38           39           42           36           36           29           31           45
             9379087022_F
ILMN_2735294           50
ILMN_2417611           56
ILMN_2545897           58
ILMN_2762289           42
ILMN_1248788           38
46250 more rows ...

Now looking at the end of the expression matrix, where the row names are missing:

R> tail(x$E)
 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E 9379087022_F
         92.2         92.6         92.6         93.8         92.1         86.9         91.4         85.7         78.9         86.5         89.0         91.7
         89.2         85.7         92.3         89.9         85.9         83.7         91.3         89.5         76.6         91.4         86.3         85.8
         89.8         85.5         92.7         92.1         92.7         87.3         90.1         86.2         79.1         83.7         86.4         84.9
         96.9         88.9         92.4         94.6         90.7         87.9         96.2         85.6         78.0         82.0         86.4         84.1
         87.8         83.5         85.9         90.2         81.6         81.5         92.5         83.8         73.1         80.6         86.1         86.8
         89.8         87.4         87.1         89.6         88.1         84.4         91.9         85.7         80.5         88.3         86.8         86.3
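
One way to narrow this down might be to cross-tabulate the Status column of $genes against whether the row name is empty, to see which kind of probe the unnamed rows belong to. The sketch below uses a tiny made-up matrix and data frame (`E`, `genes`, and the "negative" status values are assumptions for illustration, not taken from the real object).

```r
# Toy expression matrix: two named probes followed by two unnamed rows.
E <- matrix(c(420.8, 323.8, 92.2, 89.2), ncol = 1,
            dimnames = list(c("ILMN_2735294", "ILMN_2417611", "", ""),
                            "9379087005_A"))

# Toy annotation table in the same row order as E (values are hypothetical).
genes <- data.frame(TargetID = c("0610005A07RIK", "0610005C13RIK",
                                 "ctrl_a", "ctrl_b"),
                    Status = c("regular", "regular", "negative", "negative"),
                    stringsAsFactors = FALSE)

# TRUE for rows whose row name is the empty string.
unnamed <- !nzchar(rownames(E))

# Count rows per probe status, split by whether the row name is empty.
status_by_name <- table(Status = genes$Status, unnamed = unnamed)
print(status_by_name)
```

On the real data the equivalent call would be something like table(x$genes$Status, !nzchar(rownames(x$E))); if all unnamed rows fall under a single status, that would point at where the identifiers are being lost.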


R> sessionInfo()
R Under development (unstable) (2013-06-26 r63071)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xtable_1.7-1              vsn_3.29.1                reshape2_1.2.2            ratr_1.0                  pheatmap_0.7.4            illuminaMousev2.db_1.18.0
 [7] org.Mm.eg.db_2.9.0        GOstats_2.27.1            graph_1.39.3              ggplot2_0.9.3.1           edgeR_3.3.8               limma_3.17.23            
[13] codetools_0.2-8           Category_2.27.3           GO.db_2.9.0               RSQLite_0.11.4            DBI_0.2-7                 Matrix_1.0-12            
[19] lattice_0.20-15           Biostrings_2.29.19        XVector_0.1.4             IRanges_1.19.37           AnnotationDbi_1.23.23     Biobase_2.21.7           
[25] BiocGenerics_0.7.5        knitr_1.4.1               setwidth_1.0-3           

loaded via a namespace (and not attached):
 [1] affy_1.39.2            affyio_1.29.0          annotate_1.39.0        AnnotationForge_1.3.22 BiocInstaller_1.11.4   colorspace_1.2-2       dichromat_2.0-0       
 [8] digest_0.6.3           evaluate_0.4.7         formatR_0.9            genefilter_1.43.0      grid_3.1.0             GSEABase_1.23.0        gtable_0.1.2          
[15] highr_0.2.1            labeling_0.2           MASS_7.3-26            munsell_0.4            plyr_1.8               preprocessCore_1.23.0  proto_0.3-10          
[22] RBGL_1.37.2            RColorBrewer_1.0-5     scales_0.2.3           stats4_3.1.0           stringr_0.6.2          survival_2.37-4        tools_3.1.0           
[29] XML_3.98-1.1           zlibbioc_1.7.0        

Any help and explanations appreciated!

Cheers,
Kemal
--
Kemal Akat
Laboratory of RNA Molecular Biology
The Rockefeller University
1230 York Avenue, Box #186
New York, NY 10065


