[BioC] EasyRNASeq issue (missing bai files)

Nicolas Delhomme delhomme at embl.de
Fri Jul 20 17:15:22 CEST 2012


On Jul 20, 2012, at 3:57 PM, Jorge Beira wrote:

> 
> Hi Nico,
> It looks like I'm making some progress slowly... so it's good.
> 
> On Fri Jul 20 14:43:49 2012, Nicolas Delhomme wrote:
> 
>> 
>> I'm really surprised you could install version 1.3.8 actually. It requires edgeR >= 2.7.23 (yours is 2.6.9), and IRanges >= 1.15.18 (yours is 1.14.4). How did you do the installation?
>> 
> 
> I installed it with the command " source("http://bioconductor.org/biocLite.R") biocLite("easyRNASeq")" and then it asked me if I wanted to update the dependencies to which I said yes... as long as it works I'm happy ;)

This is extremely strange. Since your version of BiocInstaller is 1.4.7 it should never have installed easyRNASeq 1.3.8. Can you open a fresh R session, load easyRNASeq and report the sessionInfo?

> 
> 
>> Anyway, if you want to use the development version of the easyRNASeq package, you should use the development version of every package (knowing that they are then more unstable and might be broken for a couple of days from time to time). If you want to use the development version, you should do the following:
>> 
>> library(BiocInstaller)
>> useDevel(TRUE)
>> chooseBioCmirror() ## pick the closest one
>> update.packages(ask=FALSE)
>> 
>> If not, you're better off reverting to easyRNASeq version 1.2.3.
>> 
>>> 
>>> -------
>>> 
>>> getting back to the issues:
>>> 
>>> indeed when I query file.exists(system.file("dmel-all-r5.45.gff")) is turns FALSE -- but now this is very strange, then, since I know the file is in the bams directory together with the bam files so why is it not reading it?.... In any case looks like the major problem is here before I can proceed. Any idea on this one?...
>>> 
>> 
>> system.file is not appropriate here, look at the help about it for more details (it basically look up files in your R installation directories), i.e. in R type:
>> 
>> ?system.file
>> 
>> Just using annotationFile="dmel-all-r5.45.gff" should do it.
>> 
>>> I aligned my reads to the "dmel-all-chromosome-r5.45.fasta" version, however now I'm using the gff file "dmel-all-r5.45.gff" -- does this suggest any issue to you?
>> 
>> No, that should be fine, but we can double check. What does the following gives you (in a terminal, not in R)
>> 
>> grep ">" dmel-all-chromosome-r5.45.fasta
>> 
> 
> 
> This command gives me what I paste below, so from the looks of it there's all the info...:

Good, that looks as expected.

> 
> (What if I align it to the transcriptome, could I still use easyRNASeq or would there be problems to decide what sequences correspond to each feature...? It's just something I've been wondering about)

I would not do that for several reasons. 

1) Yes you could use easyRNASeq, but you would have to define your own gff/gtf file, because the one you use uses genomic coordinates and if you align to the transcriptome, you would get transcriptomic coordinates.

2) Aligning against the transcriptome is fine if that is the only reference you have. Drosophila has a well known genome so it's much better to align against that. Indeed, it might be that a read aligns perfectly to the genome (i.e. in an enhancer (see the eRNAs paper from Kim et al, Nature, 2010)) and only incorrectly to the transcriptome. By aligning to the transcriptome only, such reads would be incorrectly assigned and would bias your results. If you're worried about reads spanning EEJ and/or multiple mapping reads, you should rather use an aligner that can handle that (tophat, RSEM,..) rather than using the transcriptome as a reference for your alignment. 

Cheers,

Nico

> 
> thanks!
> J
> 
> grep ">" dmel-all-chromosome-r5.45.fasta
> 
>> YHet type=chromosome_arm; loc=YHet:1..347038; ID=YHet; dbxref=REFSEQ:NW_001848860,GB:CM000461; MD5=478fbc07ea1b1c03b3d8d04abacf51a5; length=347038; release=r5.45; species=Dmel;
>> dmel_mitochondrion_genome type=chromosome; loc=dmel_mitochondrion_genome:1..19517; ID=dmel_mitochondrion_genome; dbxref=GB:NC_001709; MD5=61af8db53361cd5744f41f773d21c3d4; length=19517; release=r5.45; species=Dmel;
>> 2L type=chromosome_arm; loc=2L:1..23011544; ID=2L; dbxref=REFSEQ:NT_033779,GB:AE014134; MD5=bfdfb99d39fa5174dae1e2ecd8a231cd; length=23011544; release=r5.45; species=Dmel;
>> X type=chromosome_arm; loc=X:1..22422827; ID=X; dbxref=REFSEQ:NC_004354,GB:AE014298; MD5=a83251de102926ba52e3e71c72f1d9b0; length=22422827; release=r5.45; species=Dmel;
>> 3L type=chromosome_arm; loc=3L:1..24543557; ID=3L; dbxref=REFSEQ:NT_037436,GB:AE014296; MD5=ec7148cae3daabbd2a226eaa6e85d7c2; length=24543557; release=r5.45; species=Dmel;
>> 4 type=chromosome_arm; loc=4:1..1351857; ID=4; dbxref=REFSEQ:NC_004353,GB:AE014135; MD5=515edfcc517bc4e39ae5c696cfd44021; length=1351857; release=r5.45; species=Dmel;
>> 2R type=chromosome_arm; loc=2R:1..21146708; ID=2R; dbxref=REFSEQ:NT_033778,GB:AE013599; MD5=1589a9447d4dc94c048aa48ea5b8099d; length=21146708; release=r5.45; species=Dmel;
>> 3R type=chromosome_arm; loc=3R:1..27905053; ID=3R; dbxref=REFSEQ:NT_033777,GB:AE014297; MD5=6b279651b3b268f11e0dd1d87ded0ccc; length=27905053; release=r5.45; species=Dmel;
>> Uextra type=chromosome_arm; loc=Uextra:1..29004656; ID=Uextra;  MD5=3d47647f1cde286a947d03cc17f0bad3; length=29004656; release=r5.45; species=Dmel;
>> 2RHet type=chromosome_arm; loc=2RHet:1..3288761; ID=2RHet; dbxref=REFSEQ:NW_001848856,GB:CM000457; MD5=88c0ac39ebe4d9ef5a8f58cd746c9810; length=3288761; release=r5.45; species=Dmel;
>> 2LHet type=chromosome_arm; loc=2LHet:1..368872; ID=2LHet; dbxref=REFSEQ:NW_001848855,GB:CM000456; MD5=5217e6a4295a824c31e43d5a9da9038b; length=368872; release=r5.45; species=Dmel;
>> 3LHet type=chromosome_arm; loc=3LHet:1..2555491; ID=3LHet; dbxref=REFSEQ:NW_001848857,GB:CM000458; MD5=4902d32327726bf385aa55a75399bdfc; length=2555491; release=r5.45; species=Dmel;
>> 3RHet type=chromosome_arm; loc=3RHet:1..2517507; ID=3RHet; dbxref=REFSEQ:NW_001848858,GB:CM000459; MD5=752e488dbea78e592aca03745eb732ea; length=2517507; release=r5.45; species=Dmel;
>> U type=chromosome_arm; loc=U:1..10049037; ID=U;  MD5=4b72bf19979c8466d5b66acca66f1804; length=10049037; release=r5.45; species=Dmel;
>> XHet type=chromosome_arm; loc=XHet:1..204112; ID=XHet; dbxref=REFSEQ:NW_001848859,GB:CM000460; MD5=dcd15c551e34903f6171f53f553b55ee; length=204112; release=r5.45; species=Dmel;
> 
> 
> 



More information about the Bioconductor mailing list