[BioC] Rsubread : featureCounts PEReadsReordering=TRUE deletes unpaired reads?

Fri Nov 15 06:29:21 CET 2013

Hi Ryan,

We have made changes to featureCounts to let it deal with those read pairs that having missing mates. featureCounts now also checks if reads from the same pair are adjacent to each other and if not it will automatically re-order the reads to make them ready for read counting. So you do not need to specify the PEReadsReordering option any more (it is now removed from the function) and you do not need to use the separate bam file which includes unmapped reads either.

Changes have been committed to bioc devel and they should be available in a couple of days (Rsubread v1.13.6)

Wei

On Nov 11, 2013, at 12:23 PM, Ryan wrote:

> Ah, I see now. I am using tophat, which outputs the unmapped reads to a separate bam file. I guess I should merge the two bam files and then featureCounts should work as I expect?
> 
> On Sun Nov 10 16:59:32 2013, Wei Shi wrote:
>> Dear Ryan,
>> 
>> featureCounts requires that both reads from the same pair are present in the BAM/SAM file. But this doesn't seem to be the case for your data. It looks like unmapped reads were not included in your BAM files and therefore you got quite a few unpaired reads in your data. When counting in paired-end mode (isPairedEnd=TRUE) and read reordering being required (PEReadsReordering=TREU), featureCounts tries to reorder reads from the same pair to make them be adjacent to each other before performing read counting. If unpaired reads are encountered, featureCounts simply discards them.
>> 
>> Simply sorting your bam files by name is not going to solve this problem because many reads are unpaired in your data. You may use an aligner that outputs all the reads in it mapping output (such as the align function in Rsubread) and then runs featureCounts on it.
>> 
>> In the meantime, we are now trying to improve featureCounts to deal with this simulation. Will let you know when this is done.
>> 
>> Also note that this problem only occurs for paired-end data, but not for single-end read data.
>> 
>> Best regards,
>> Wei
>> 
>> On Nov 9, 2013, at 10:54 AM, Ryan C. Thompson wrote:
>> 
>>> Hello,
>>> 
>>> I am using the featureCounts function in Rsubread to count RNA-seq reads. Since my BAM files are coordinate-sorted paired-end, I am using "isPairedEnd=TRUE" and "PEReadsReordering=TRUE". When I run my script, I see the following in the output:
>>> 
>>> Process /gpfs/home/rcthomps/Projects/cyno/tophat_results/R77_CN7314_P4 ...
>>>   Resort the input file ...
>>>   2718714 unpaired reads were ignored in reordering.
>>>   Total number of fragments is : 1653582
>>>   Number of successfully assigned fragments is : 1529491 (92.5%)
>>>   Running time : 3.39 minutes
>>> 
>>> In particular, I am worried about the line "2718714 unpaired reads were ignored in reordering." The documentation for PEReadsReordering doesn't say anything about discarding unpaired reads, so I'm wondering whether this is intended behavior, and whether there is a way around this? If I sort my bam files by name so that pairs are adjacent in the file, will subread still discard the unpaired ones, or will it count all of them?
>>> 
>>> -Ryan Thompson
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the addressee.
>> You must not disclose, forward, print or use it without the permission of the sender.
>> ______________________________________________________________________

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}