[BioC] QuasR special case alignment

Ugo Borello ugo.borello at inserm.fr
Wed May 8 10:33:31 CEST 2013


Dear Michael,
Regarding the auxiliary alignments, if I understand correctly,
having two sequences and because I want to count the alignments separately
for each of them, I have to format the  auxiliaries.txt file this way:
FileName    AuxName
seq1.fa          egfp
seq2.fa          cre
So in this way I will have to copy the two sequences in two different files
( seq1.fa  and seq2.fa), correct?

Then, I have two more brief questions:
1) is there a special method to save the qProject object?
2) and regarding the matrix obtained using qCount, I wanted to add gene ids.
So I ran:
> query<- exons(TxDb.Mmusculus.UCSC.mm10.knownGene, columns='gene_id')
> geneLevels <- qCount(proj1, query, reportLevel = "gene", clObj=cl)
No gene ids as expected.
If I run this line before calling qCount
names(query)<-mcols(query)$gene_id
I get this error message:
Error in as.vector(x, mode = "character") : coercing an Atomic List object
to an atomic vector is supported only for objects with top-level elements of
length <= 1 
And when I change it to
names(query)<-as.vector(mcols(query)$gene_id)
> geneLevels <- qCount(proj2, query, reportLevel = "gene", clObj=cl)
Error in reduce(split(query, names(query))[unique(names(query))]) :
  error in evaluating the argument 'x' in selecting a method for function
'reduce': Error in .nextMethod(x = x, i = i) : subscript contains NAs
 
qCount complains.

What is then the correct method to add gene ids to the qCount matrix?
And what about adding gene names also?



Thank you in advance for your help.

Ugo




> From: Michael Stadler <michael.stadler at fmi.ch>
> Date: Thu, 2 May 2013 08:38:09 +0200
> To: Ugo Borello <ugo.borello at inserm.fr>
> Cc: <bioconductor at r-project.org>
> Subject: Re: [BioC] QuasR special case alignment
> 
> Hi Ugo,
> 
> I can see one problem in the auxiliaries.txt file: You are referring the
> same sequence file (seq.fa) twice, so all auxiliary alignments are done
> twice.
> 
> The speed of QuasR is primarly defined by the speed of bowtie/SpliceMap.
> The fact that the alignments against chr1 are fast may be due to only
> few reads producing hits on chr1, or at last much fewer hits than
> against the auxiliaries.
> 
> A way to speed up the analysis would be to do unspliced alignments
> (spliceAlignment=FALSE), which may not be necessary if your reporter
> genes in seq.fa are spliced cDNAs. Also, it would help to use more than
> just 4 CPU cores, if you have additional hardware available to you.
> 
> Michael
> 
> On 26.04.2013 14:58, Ugo Borello wrote:
>> Thank you Michael,
>> The reporter genes are not part of the genome, so I am aligning my short
>> reads to the genome (actually chr1only in the following test) and to a fasta
>> file (seq.fa) containing the sequences of the two reporter genes.
>> But, while alignment to chr1 was relatively fast, the alignment to the fasta
>> file it is taking forever.
>> Here the code:
>> 
>> library(QuasR)
>> library(parallel)
>> 
>> cl <- makeCluster(4)
>> sampleFile <- "test.txt"
>> genomeName <- "chr1.fa"
>> auxFile <- "auxiliaries.txt"
>> 
>> proj1 <- qAlign(sampleFile, genome= genomeName, auxiliaryFile= auxFile,
>> splicedAlignment=TRUE, clObj=cl)
>> 
>> 
>> text.txt is formatted this way:
>> FileName        SampleName
>> 31omo.fastq    Sample1
>> 
>> 
>> auxiliaries.txt is formatted this way:
>> FileName    AuxName
>> seq.fa          egfp
>> seq.fa          cre
>> 
>> 
>> Am I missing something here? Is there a faster way to complete my analysis?
>> 
>> Thank you in advance for your help
>> 
>> Ugo
>> 
>> 
>>> From: Michael Stadler <michael.stadler at fmi.ch>
>>> Date: Fri, 26 Apr 2013 08:40:50 +0200
>>> To: <bioconductor at r-project.org>
>>> Cc: <ugo.borello at inserm.fr>
>>> Subject: Re: [BioC] QuasR special case alignment
>>> 
>>> Dear Ugo,
>>> 
>>> One way to accomplish would be to put your two reporter genes into a
>>> fasta file, and use this as your "genome" (see ?qAlign or the vignette
>>> for examples of genomes in fasta format).
>>> 
>>> However, I think it would be better to align against the full genome, as
>>> this will avoid certain problems, such as reads being suboptimally
>>> aligned to your reporter genes which would align perfectly to some other
>>> genomic locus.
>>> 
>>> Assuming that the reporter genes are part of the genome, you could then
>>> count alignments using qCount() with a GRanges query containing the
>>> ranges of your reporter genes.
>>> 
>>> If the reporter genes are not part of the genome, you can provide your
>>> reporter gene sequences (as a fasta file) to qAlign, and all reads not
>>> aligning to the genome will be further aligned to them. The format of
>>> the required auxiliary file is very simple and described in section 5.2
>>> of the vignette.
>>> 
>>> You can open the QuasR vignette using:
>>> library(QuasR)
>>> vignette("QuasR-Overview")
>>> 
>>> Best wishes,
>>> Michael
>>> 
>>> On 25.04.2013 16:52, Ugo Borello wrote:
>>>> Good morning,
>>>> 
>>>> I am doing RNA-Seq analysis and I would like to perform an exploratory
>>>> analysis to first verify that 2 reporter genes are expressed in my samples.
>>>> 
>>>> Could the qAlign function perform an alignment of my short reads with two
>>>> reporter gene sequences, excluding the genome.
>>>> It seems, from my first attempt, that the 'genome' parameter is essential;
>>>> is this true?
>>>> Anyway how do I have to feed those sequences to qAlign and how do I have to
>>>> format my auxiliary file/files to see in the alignment statistics the
>>>> number
>>>> of mapped reads for each reporter gene sequence separately?
>>>> 
>>>> Thank you in advance for your help.
>>>> 
>>>> Ugo
>>>> 
>>>> P.S. Let me say that QuasR is a fantastic package!
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>> 
>> 
> 
> -- 
> --------------------------------------------
> Michael Stadler, PhD
> Head of Computational Biology
> Friedrich Miescher Institute
> Basel (Switzerland)
> Phone : +41 61 697 6492
> Fax   : +41 61 697 3976
> Mail  : michael.stadler at fmi.ch
> --------------------------------------------
>



More information about the Bioconductor mailing list