[BioC] Aligning short reads to a single gene

Fri Feb 8 09:30:02 CET 2013

Dear all,
 a sanity check of my RNA-Seq experiment is driving me crazy.

I want to verify that the GFP reporter gene is expressed in my control
samples. This reporter gene, of course, is not present in the mouse
reference genome but only in my transgenic mice.
Therefore, I thought to align the short reads of my RNA-Seq fastq files to
the GFP gene sequence to verify its mRNA presence in my samples.

I came out with this naïve solution:

library(ShortRead)

## read fastq file
reads<- readFastq('filename.fastq')

## get the sequences
seq<- sread(reads)

## keep only the sequences without 'N'
alf <- alphabetFrequency( sread(reads))
clean<- seq[alf[,"N"] == 0]

## make a dictionary of those sequences
pdict0<- PDict(clean)

## match the dictionary of my sequences to the GFP gene (~800 bp)
myseq<-DNAString("ATG...TAA")   ## here the GFP gene sequence
mysearch <- matchPDict(pdict0, myseq, max.mismatch=0)   ## exact match
count_index <- sum(countIndex(mysearch))
count_index   ## the total number of alignments

Any suggestion for a nicer and more efficient solution?

Thank you in advance for your help

Ugo