[BioC] QuasR special case alignment

Michael Stadler michael.stadler at fmi.ch
Thu May 2 08:38:09 CEST 2013


Hi Ugo,

I can see one problem in the auxiliaries.txt file: You are referring the
same sequence file (seq.fa) twice, so all auxiliary alignments are done
twice.

The speed of QuasR is primarly defined by the speed of bowtie/SpliceMap.
The fact that the alignments against chr1 are fast may be due to only
few reads producing hits on chr1, or at last much fewer hits than
against the auxiliaries.

A way to speed up the analysis would be to do unspliced alignments
(spliceAlignment=FALSE), which may not be necessary if your reporter
genes in seq.fa are spliced cDNAs. Also, it would help to use more than
just 4 CPU cores, if you have additional hardware available to you.

Michael

On 26.04.2013 14:58, Ugo Borello wrote:
> Thank you Michael,
> The reporter genes are not part of the genome, so I am aligning my short
> reads to the genome (actually chr1only in the following test) and to a fasta
> file (seq.fa) containing the sequences of the two reporter genes.
> But, while alignment to chr1 was relatively fast, the alignment to the fasta
> file it is taking forever.
> Here the code:
> 
> library(QuasR)
> library(parallel)
> 
> cl <- makeCluster(4)
> sampleFile <- "test.txt"
> genomeName <- "chr1.fa"
> auxFile <- "auxiliaries.txt"
> 
> proj1 <- qAlign(sampleFile, genome= genomeName, auxiliaryFile= auxFile,
> splicedAlignment=TRUE, clObj=cl)
> 
> 
> text.txt is formatted this way:
> FileName        SampleName
> 31omo.fastq    Sample1
> 
> 
> auxiliaries.txt is formatted this way:
> FileName    AuxName
> seq.fa          egfp
> seq.fa          cre
> 
> 
> Am I missing something here? Is there a faster way to complete my analysis?
> 
> Thank you in advance for your help
> 
> Ugo
> 
> 
>> From: Michael Stadler <michael.stadler at fmi.ch>
>> Date: Fri, 26 Apr 2013 08:40:50 +0200
>> To: <bioconductor at r-project.org>
>> Cc: <ugo.borello at inserm.fr>
>> Subject: Re: [BioC] QuasR special case alignment
>>
>> Dear Ugo,
>>
>> One way to accomplish would be to put your two reporter genes into a
>> fasta file, and use this as your "genome" (see ?qAlign or the vignette
>> for examples of genomes in fasta format).
>>
>> However, I think it would be better to align against the full genome, as
>> this will avoid certain problems, such as reads being suboptimally
>> aligned to your reporter genes which would align perfectly to some other
>> genomic locus.
>>
>> Assuming that the reporter genes are part of the genome, you could then
>> count alignments using qCount() with a GRanges query containing the
>> ranges of your reporter genes.
>>
>> If the reporter genes are not part of the genome, you can provide your
>> reporter gene sequences (as a fasta file) to qAlign, and all reads not
>> aligning to the genome will be further aligned to them. The format of
>> the required auxiliary file is very simple and described in section 5.2
>> of the vignette.
>>
>> You can open the QuasR vignette using:
>> library(QuasR)
>> vignette("QuasR-Overview")
>>
>> Best wishes,
>> Michael
>>
>> On 25.04.2013 16:52, Ugo Borello wrote:
>>> Good morning,
>>>
>>> I am doing RNA-Seq analysis and I would like to perform an exploratory
>>> analysis to first verify that 2 reporter genes are expressed in my samples.
>>>
>>> Could the qAlign function perform an alignment of my short reads with two
>>> reporter gene sequences, excluding the genome.
>>> It seems, from my first attempt, that the 'genome' parameter is essential;
>>> is this true?
>>> Anyway how do I have to feed those sequences to qAlign and how do I have to
>>> format my auxiliary file/files to see in the alignment statistics the number
>>> of mapped reads for each reporter gene sequence separately?
>>>
>>> Thank you in advance for your help.
>>>
>>> Ugo
>>>
>>> P.S. Let me say that QuasR is a fantastic package!
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
> 
> 

-- 
--------------------------------------------
Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax   : +41 61 697 3976
Mail  : michael.stadler at fmi.ch



More information about the Bioconductor mailing list