[BioC] removing SNP probes from Drosophila 2 microarray

Hervé Pagès hpages at fhcrc.org
Tue Jul 13 09:36:23 CEST 2010


Hi Eli,

On 07/06/2010 08:09 AM, Eli Moss wrote:
> Hello all,
>
>   I'm interested in finding probes with lower expression level due to a SNP mismatch and removing them, hopefully giving me a better measure of expression level even in those genes with sequences modified by a hybridization experiment.  I'm thinking of doing this by finding the changes between probe intensities between control and experimental arrays, which should be normally distributed, and removing all those probes outside some number of standard deviations from the mean.  My question is this: is there a package that does something like this, or do I need to write it myself?
>

This sounds like a complicated question to me and I'm not sure
I understand the method you are suggesting (I'm not a statistician).
But FWIW here is a simple method for determining the probes or probe
sets that hit at least a SNP in dbSNP. It probably doesn't really
help for your particular problem because I guess you want to know
the probes that hit a SNP in your sample, not in dbSNP, but it's a
good illustration of the kind of things that can easily be achieved
by combining tools from the GenomicFeatures, Biostrings, BSgenome
data packages and SNPlocs packages. Unfortunately there is no SNPlocs
package for Drosophila at the moment so I'm using Human.

The idea is to match the probes against the Human transcriptome,
then to alter the transcriptome by injecting the SNPs from dbSNP
in it and to match the probes against the altered transcriptome.
The probes that have less matches the 2nd time are those that hit
at least 1 SNP.

### Load your probes and store them in a PDict object for fast
### matching.
library(hgu95av2probe)
library(Biostrings)
probes <- DNAStringSet(hgu95av2probe)
pdict <- PDict(probes)

### Retrieve the exon/transcript locations for hg19 from UCSC.
library(GenomicFeatures)
txdb <- makeTranscriptDbFromUCSC("hg19")

### Extract the transcriptome.
library(BSgenome.Hsapiens.UCSC.hg19)
tx_seqs <- extractTranscriptsFromGenome(Hsapiens, txdb)

### Use vwhichPDict() from the Biostrings package to find the
### probes that hit each transcript.
tx2probe <- vwhichPDict(pdict, tx_seqs)
names(tx2probe) <- names(tx_seqs)

### Reverse the tx2probe map.
reverseTx2ProbeMap <- function(tx2probe, nprobe)
{
     vals <- rep(names(tx2probe), elementLengths(tx2probe))
     keys <- unlist(tx2probe)
     tmp <- split(vals, keys)
     probe2tx <- vector("list", length=nprobe)
     probe2tx[as.integer(names(tmp))] <- tmp
     probe2tx
}
probe2tx <- reverseTx2ProbeMap(tx2probe, length(probes))

### Repeat the above but now with the SNPs from dbSNP injected in the
### reference genome as IUPAC ambiguity letters.
library(SNPlocs.Hsapiens.dbSNP.20100427)
SNP_Hsapiens <- injectSNPs(Hsapiens, "SNPlocs.Hsapiens.dbSNP.20100427")
SNP_tx_seqs <- extractTranscriptsFromGenome(SNP_Hsapiens, txdb)
SNP_tx2probe <- vwhichPDict(pdict, SNP_tx_seqs)
names(SNP_tx2probe) <- names(SNP_tx_seqs)
SNP_probe2tx <- reverseTx2ProbeMap(SNP_tx2probe, length(probes))

### A sanity check.
all(elementLengths(SNP_probe2tx) <= elementLengths(probe2tx))

### Index of rows in hgu95av2probe that hit at least a SNP in dbSNP.
idx <- which(elementLengths(SNP_probe2tx) < elementLengths(probe2tx))

### Corresponding probes.
probes[idx]

### Corresponding probeset ids.
unique(hgu95av2probe$Probe.Set.Name[idx])

Cheers,
H.

> Thank you,
> Eli
>
>
> --
> Eli Moss
> Brown University, Computational Biology
> +1 (413) 658-4227
> Eli_Moss at Brown.edu
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list