[BioC] IlluminaMousev2.db probe quality information questions?

Mark Dunning mark.dunning at gmail.com
Thu Mar 1 17:48:20 CET 2012


Hi Lourdes,

Sorry for taking so long to get back to you. I went away for a few days
and somehow managed to miss your message.

Thanks for your interest in the packages! The probe quality scores are
derived from our mapping of probes to the genome and the transcriptome
using an in-house Perl script. The asterisks indicate issues in
consolidating the genomic and transcriptomic matches. Here is the full
explanation:


"Perfect/Good*** no CDS annotation - this can occur where all the
transcript alignment matches are to the reverse strand and/or are GenBank
entries for which we have no 5pUTR/3pUTR/CDS annotation."

i.e. the probe was found to match a transcript, but there is
insufficient information to class it as 3pUTR/5pUTR. The transcript
may be unreliable.

Perfect/Good**** mismatches for transcript alignment to the genome -
mismatches for transcript alignments to the genome are taken from the UCSC
annotation tables refSeqAli and all_mrna; **** is attached when the probe
quality is Perfect or Good, the genomic coordinates for the best match
from a BLAST search against the transcript databases differ from those
from a BLAST search against the reference genome, and there is a mismatch
in the transcript alignment to the genome.

i.e. the probe matches a transcript, but the transcript does not map to
the genomic location that we expect.
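
To compare probes by their underlying grade, the asterisk suffix can be
separated from the label. The following is a minimal R sketch, not part of
the annotation package: the variable names are illustrative, and the label
set is simply the one reported by table() in the quoted message below.

```r
# Quality labels as reported for the PROBEQUALITY mapping
quality <- c("Perfect", "Perfect***", "Perfect****",
             "Good", "Good***", "Good****", "Bad", "No match")

# Base grade: drop any trailing run of asterisks
base_grade <- sub("\\*+$", "", quality)

# Number of trailing asterisks:
#   0 = no caveat, 3 = no CDS annotation,
#   4 = mismatch in the transcript alignment to the genome
n_stars <- nchar(quality) - nchar(base_grade)

data.frame(quality, base_grade, n_stars)
```

Keeping the asterisk count as a separate column leaves the caveat
available for filtering while letting Perfect, Perfect*** and Perfect****
be tabulated together.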


The missing Probe Quality values for those two probes are accidental. The
relevant entries in the source file I use to compile the annotation
packages are as follows:

grep ILMN_1229593 Annotation_Illumina_Mouse_WG-6_V2_mm9_Sept2011.txt

ILMN_1229593	AACTGGCCCACCTTCAACACTCCCTCTAGGCACCCAGACCTCTAGTGGCA	50	chr15:63942585:63942634:-	15qD1		0		1-50
||||||||||||||||||||||||||||||||||||||||||||||||||	50	100	100						NM_010026	1
of 1 (Asap1)	uc007vzk.1 uc007vzj.1 uc007vzi.1 uc007vzh.1	4 of 6
(Asap1)	BC094581 BC048818 BC002201 AK122477 AF075461 AK147689	6 of 381
(Asap1)6 X 6 6 6 6 6 6 6 6 7	ENSMUST00000110115 ENSMUST00000023008	2
of 3 (ENSMUSG00000022377)	65301463 63101607 28981428 12805456 28972685
4063613 74188670		NP_034156.2      	Q9QWY8 Q9QWY8	No	1-50
||||||||||||||||||||||||||||||||||||||||||||||||||	50	100	100	U92478	1-50
|||||||||||||||||||||||||
||||||||||||||||||||||||	50	98	98	Asap1		ENSMUSG00000022377	Mm.27723613196	ArfGAP
with SH# domain, ankyrin repeat and PH
domain1			Yes	Transcriptomic	Yes	58	0	Perfect		006280286

grep ILMN_2694153 Annotation_Illumina_Mouse_WG-6_V2_mm9_Sept2011.txt
ILMN_2694153	GTTTAGATGAGTGGGTTTGTACATCTTATGGCGAGTGGCCACCCCTGAGA	50	chr15:63920345:63920394:-	15qD1		0		1-50
||||||||||||||||||||||||||||||||||||||||||||||||||	50	100	100						NM_010026	1
of 1 (Asap1)	uc007vzm.1 uc007vzl.1 uc007vzk.1 uc007vzj.1 uc007vzi.1
uc007vzh.1	6 of 6 (Asap1)	U92478 BC094581 BC048818 BC002201 AK122477
AF075462 AF075461 AK166056 AK159048 AK146545 BB821218 AK147689	11 of
381 (Asap1)	1 1 1 X 1 1 1 1 1 1 1 1 1 1 X 1 X X 1	ENSMUST00000110114
ENSMUST00000110115 ENSMUST00000023008	3 of 3
(ENSMUSG00000022377)	65301463 1928965 63101607 28981428 12805456
28972685 4063615 4063613 74141548 74186632 74138896 16993847
74188670		NP_034156.2            	Q9QWY8 Q9QWY8 Q9QWY8 Q9QWY8	No	1-50
||||||||||||||||||||||||||||||||||||||||||||||||||	50	100	100						Asap1		ENSMUSG00000022377	Mm.27723613196	ArfGAP
with SH# domain, ankyrin repeat and PH
domain1			Yes	Transcriptomic	Yes	50	0	Perfect		001010528

There is a # character in the description and, by default, R treats
everything after it on the line as a comment (in read.table, for example,
the default is comment.char = "#"), so the remaining fields, including the
probe quality, are never read in. I shall correct this in future versions
of the annotation. Thanks for spotting this. Both probes are Perfect, by
the way.
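
The effect can be reproduced on a small made-up file. This is a sketch
under assumptions: the three-column layout and the probe IDs ILMN_A and
ILMN_B are hypothetical, and the real build script may read the file
differently. It only illustrates how the default comment.char = "#" in
read.table truncates a row, and how comment.char = "" avoids that.

```r
# Hypothetical tab-separated excerpt; the real annotation file has many more columns
lines <- c(
  "ProbeId\tDescription\tQuality",
  "ILMN_A\tArfGAP with SH# domain\tPerfect",
  "ILMN_B\tSome other gene\tGood"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Default comment.char = "#": the rest of the ILMN_A line is discarded.
# fill = TRUE is used here only so that the shortened row is tolerated.
default_read <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE,
                           stringsAsFactors = FALSE)

# comment.char = "" switches comment handling off; quote = "" likewise stops
# quote characters inside free-text descriptions from being interpreted.
fixed_read <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                         comment.char = "", stringsAsFactors = FALSE)

default_read
fixed_read
```

In the first data frame the Quality value for ILMN_A is lost along with
the tail of its description; in the second both fields are read intact.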

Regards,

Mark


On Tue, Feb 14, 2012 at 7:01 PM, Lourdes Peña Castillo
<lourdes.pena at gmail.com> wrote:
> Hello,
>
> I am using the re-annotation of Illumina probe sequences available in the
> illuminaMousev2.db package (great package!), and I have two questions
> (please see the code below as well):
>
> 1) Is there any difference between Good and Good*** or Perfect and
> Perfect**** probe quality?
>
> 2) I noticed that two probes are re-annotated to an EntrezID but have no
> probe quality. Why would this be?
>
> Thanks!
>
> Lourdes
>
>> library("illuminaMousev2.db")
>
>> x <- illuminaMousev2ENTREZREANNOTATED
>
>> mapped_probes <- mappedkeys(x)
>
>> xx <- as.list(x[mapped_probes])
>
>> probe_EntrezID_re <- unlist(xx)
>
>>
>
>> x <- illuminaMousev2PROBEQUALITY
>
>> mapped_probes <- mappedkeys(x)
>
>> # Convert to a list
>
>> xx <- as.list(x[mapped_probes])
>
>> probe_quality_re <- unlist(xx)
>
>>
>
>> table(probe_quality_re[intersect(names(probe_EntrezID_re),
>                                   names(probe_quality_re))])
>
>         Bad        Good     Good***    Good****    No match     Perfect  Perfect*** Perfect****
>        3657         996          38         302          79       31819        1719        1047
>
>>
>
>> setdiff(names(probe_EntrezID_re), names(probe_quality_re))
>
> [1] "ILMN_1229593" "ILMN_2694153"
>
>> probe_quality_re[c("ILMN_1229593", "ILMN_2694153")]
>
> <NA> <NA>
>
>  NA   NA
>
>>
>
>> sessionInfo()
>
> R version 2.14.1 (2011-12-22)
>
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>
> locale:
>
> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>
>
> attached base packages:
>
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
>
> other attached packages:
>
>  [1] gplots_2.10.1             KernSmooth_2.23-7         caTools_1.12              bitops_1.0-4.1
>  [5] gdata_2.8.2               gtools_2.6.2              limma_3.10.2              illuminaMousev2.db_1.12.1
>  [9] org.Mm.eg.db_2.6.4        RSQLite_0.11.1            DBI_0.2-5                 AnnotationDbi_1.16.15
> [13] Biobase_2.14.0            BiocInstaller_1.2.1
>
>
> loaded via a namespace (and not attached):
>
> [1] IRanges_1.12.6 tools_2.14.1
>
