[BioC] easyRNASeq error

Nicolas Delhomme delhomme at embl.de
Fri Mar 2 12:20:36 CET 2012


Hi Wade,

I'll send you the package off-list. And discuss how we can do for the data as it's still interesting for me to enhance my chromosome name validity checks.

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------



On 1 Mar 2012, at 23:14, Davis, Wade wrote:

> Hi Nico,
> If you could send me a windows zip that I could try out the changes now instead of waiting for them to get pushed through in a few days. But I don't mind waiting, whatever is easiest for you.
> 
> I've been in a workshop all days, so I haven't gotten a chance to try out the suggestions for the second problem, but I will do so and get back to do.
> 
> Also, you asked for a sample file to help with the second problem. I'm not sure if you still want or need that, but below is the first 60 lines or so. I can send you an entire file (compressed which is about 455MB) if you think that would be helpful.
> 
> Thanks!
> 
> Wade
> 
> HWI-ST538	160	7	1101	1635	1954	ACAGTG	1	GATATCTNNNNNNGAAGAAAACGAGGGAGATTCCAAGGCCGAGCAGCTACACCNGGTACAGAAGCAGTCTGCCACTTTGCCCTCGCCCATCACCCACTCC	
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	1661	1972	ACAGTG	1	GGATAGCATNNNAAATGTAAATGAGGAAAATACCTAATAAAAAATATTAAAAAAAAAAAAGACACCGCCAGGAACTTGTGAGAAATCCAAATTCTACATT	
> [[[______BBBQQX_^^^^_^___^]^_^^^^[^^^^^^]^^^^^^^^^^^^^^^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	1949	1944	ACAGTG	1	AGGGGTTNNNNNNTTCAAGACCTCTCCCCTGNNGAGAAAGAAACCCAAGCAGNNGCCTGTGGGAGGGAACACGGACCTGACACCTCTGCCTCAGACTCCT	
> [[XZY_^BBBBBBQQ[\Z\]_[[Z^^]^^[^BBOL[\X^^PY]ZX\^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	1820	1948	ACAGTG	1	GTCAGCTNNNNNNAGGGTCCGATCGTATTGGANGAGGATATCTTTGATCTCCNNCTTGGAGGCTTCATCCACATATTTTTGGTAATAAGCTACCAGGGCC	
> [[[^___BBBBBBRS]_]^__^__^^^^^^^^BO[\^^^^^^^^^\^^^^^]BBLW\\^]_^X^^]\]]]]^^]]]]]^\[VZZ\\\\Y[\\\[\[[Y[[	QC											N
> HWI-ST538	160	7	1101	1773	1957	ACAGTG	1	CTTTGGTTNNNNNGACCTCGCTTGTAGCCAGCAAAAATGGCCTTGCACCACAGCCTTCCAGACATACTTGCTGTTAGAAGTCCTGAGATCGGAAGAGCAC	
> WZZ_[\]_BBBBBRQY_^^\Z^__^^^^^^]^]^]^^^^^^]^^^]]X]]]^^^^^^^___[^^]]^^^]^^W\\]]]\]V\\\][[[\\[[Z[Y[[WZW	QC											N
> HWI-ST538	160	7	1101	1971	1962	ACAGTG	1	CTTTGTTTNNNNTCTGGTCACAATAGTTGGGGGCAGAAAAGACAGTGACACAGCGGCCACCATGGGCCACCTCGTAGCCCTCGGCTTTGACTTCATGGCT	
> [[[_____BBBBRR_____^__^________^^^^^^^^^^^^^^^^^^^^^^[^^\\[[^[[[\[[[[[[[[[W[[[\\[[WY[[U[YT[[\[YZSYW[	QC											N
> HWI-ST538	160	7	1101	1878	1976	ACAGTG	1	CCGCTGTCTTNCTCAGGAAGCACGCATTTCCACAGGCCATAGGGCTTTCCTACCACAGTTTGCCACAATCTCAATGTCCCATCTTCAGAACCGCTGGCAT	
> ^W^\cc\`^`BQQ``JY[YRb`^_dUccaSccaIYHOWOXRacQaaQabd\bbcbbcc\^bccc]Z^c_]R_````UZZ]`Y\ZT]]`BBBBBBBBBBBB	RM											N
> HWI-ST538	160	7	1101	1934	1999	ACAGTG	1	CTCCACTGAAGTACTTCAGGTAGTAGTCCATGACGCCCTTGTCGTCGGTTCGCGACTCCTCATCACTGTCCCCGGCCCGGCCCACCGATGACATCATCTG	
> bbbeeeeegggeghiiiiiigghgfiiiiihihiihiihiihhfghdgZeefhadeecdbdb`bccbb`bb`aQX[acX[[acO[aOEHOGYRY`]Y`R]	chr10.fa		80607676	R	100	500						Y
> HWI-ST538	160	7	1101	2013	1948	ACAGTG	1	CACATCTNNNNNNCCATGGAGCAGATTGAAGANGAAATCAAAGGTTGTTTGGNNTTTCTTCGCACAGTATATAGTGTCTTTGGATTTTCATTTAAATTGA	
> [[[___^BBBBBBRSX_____^_^_^_^__^^BP[^^^^^^^^^\^^^^^^[BBLW\^^__]^^^^^Z\]]]]^\Z\]]^^][[\\\\\\\\\\\\[\Y[	QC											N
> HWI-ST538	160	7	1101	2064	1957	ACAGTG	1	AGAAAGTTNNNNNATAGAAGATGTATAAGTTGATCGTAACGGAAGCGTGGATAAGATGCTCGGATCCATAGGAATGTTGATGATAGTAGTAGAGCTTCTA	
> [[[[^[\_BBBBBQQ\[\^]__^[_]_^^^_^^__^[^^[[W^^[^^\^^\^^^]^^^___[^[XW]Z]^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	2129	1991	ACAGTG	1	GTAATATTCATTTAGCCTTCTGAGCTTTCTGGGCAGACTTGGTGACTTTGCCAGCTCCAGCAGCCTTCTTGTCCACAGCTTTGATGACACCCACAGCAAC	
> ___ce`cce`ggfdhfhfgbfbefdghggfd`ff]eZegg]eIYYYceg^cedfce^ebgfce_ccfhdfdZbdbV`db]Za^ZaZ]__bbaa_PTGT^`	RM											Y
> HWI-ST538	160	7	1101	2267	1959	ACAGTG	1	CAGCACAGNNNNGATGGCTACGTACATGGCTGGGGTGTTGAAGGTCTCAAACATGATCTGGGTCATCTTTTCACGGTTGGCCTTAGGGTTCAGGGGGGCC	
> [[[__^^^BBBBQQ_\^[^]^^^^^[^_^[^^^^^\Z\^RU\]]UY]WXZ^]Z^]Z]^_^^]ZZV^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	2466	1999	ACAGTG	1	CGTACTTCTCCCACTCGCTTGTGCTCTGTCACTAGAAGGTTATTCTTGAAAGGTTCTGTGCCTTTTTCTTCAGGCTTCATCTGTCATACCTCTACTTGAT	
> bbaeeeeegggggiiifiiiihiiiiihhhiiihighiiafgfhiiifaehghXcghhhhbghiiiifhhfdghhgfedgeddcbeeccbddd`c_b`cc	chr7.fa		54159980	R	100	504						Y
> HWI-ST538	160	7	1101	2566	1981	ACAGTG	1	ATTGAACTCANCTCCTTTTAGGTTGCCCTACTAACTGGACATTGTGGACACTAAATCATATTGGGGTGATGAATAATCAAATTTTCAATATCAAACAATG	
> abaceeeeggBQ`eghhiifhhiiigihifhihfhhihhiifhffdgffhifhihfhhhiihdgfgZd^bd^ade^abddc]`bc__bcdc_c_bb`abb	chr10.fa		66474748	R	10G89	497						Y
> HWI-ST538	160	7	1101	2943	1957	ACAGTG	1	TCCGAATANNNNNCGGTCAAAATAAAAGCTCAAGATGACATCAGTCCCATTTGTCCTAAGTCCTGGTGTTGTATGGATGGTAAGCAGCAGCCAATTATGG	
> [[[^^^^_BBBBBQQ_\_^___^_^_^^^^^^^^^^^^^^^^^^\^^^^^^^^^^^^^__\]^]^_]^^^^^_^_^^^^^W\]Z]]][[^[[[[W[\\\[	QC											N
> HWI-ST538	160	7	1101	2883	1967	ACAGTG	1	ATCAGACTCNNNCTGCACGTTCACACCATAAGTCTCATCAATGTTGTCATCCATATTCTGTATCTCCTTGTCGCCACCGTAGTCTGTGATCTTCTTGCCC	
> [[[__^^^^BBBR[T]]^^][]]^^[X][]]W^]^^^^X[Y\[YRY^[^^^^[]^^^^]^\Y]\Z^^[[ZU\\^BBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	3133	1960	ACAGTG	1	CTCTCAAANNNNCTACAGAGCCCAGGAAGATTTCAGGATGAAGAGCTTCCTCCTCTTCCTCACTATCATTCTTCTGGTTGTGATTCAGATACAAACAGGA	
> [[[^Z]_]BBBBSQ\_Z^_^_^^_^^^]_^^^^\^^^[Z]^^^^^^]^^^^^^]^]^^_]^ZZY^^_\]^\]^[]Z\^^Z\\]\\\\\\\]]]]\[[VQV	QC											N
> HWI-ST538	160	7	1101	3023	1964	ACAGTG	1	GGCCACTCTNNNTTGTGCCTTCCACACCTTTTGGTGCAGATTGTTCAGTGGGGAGCTGCCTCATATTCTCTGACTGGCCAGAAAGGCTCTCGCTGGGGTT	
> [[[__^^^^BBBRQ_\_^^___^^^^_^^__^]^^^^^^^^^^^]\^^]^^^^^^^^^T]^^ZZ^\^]^^^]\\]]^BBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	3219	1966	ACAGTG	1	CAGATGTGTNNNAATGCTAGGTGTGGTTGGTTTATTCCTAGCGTCACTATTATCAGGCCTAGTTGGCTTGATGTAGAGAAGGCAATGATTTTTTTGATGT	
> [[[^_____BBBS[]___^__[_]_^\__^Z]^^^^^^^^^^^^^^^^^^^^^^]^^^^_[^^^^___^[^^]]]\\]\]]^^[[\\\\\\\\YXTW[\\	QC											N
> HWI-ST538	160	7	1101	3047	1993	ACAGTG	1	CTCTTGTGTCCGTTTTCTTGCTGTAGCGTTTTCTCCGTACCACTCACTGATACCTGTCTATCGACTACGTGGGATGGCAGAGGCACTGCCAGGCTCCACA	
> bbbeeedegggfghiiiihhiihifhiieggihffh[affgcgfebb]eefdfh[bgbcdfe_fW^\]aaZa`abZ^``a_GWWW^```^bS^`W^^BBB	chr1.fa		154218045	F	100	36						Y
> HWI-ST538	160	7	1101	3462	1958	ACAGTG	1	CCACTCGTNNNNCTTGCGATCATTGTGCTTAGGGTCTTCTGATACATCTGGGGGAACATCTGTTTTATCTGAACAAGGTCAATCTCACTTCGAGTGACCA	
> [[[^^^_\BBBBQQ[ZZ^_[Y_[_]_\W^^^^[^M[]]]]\]]^^^^]\^^^FZF\\]Z^^^X\^^^]]^]\Z\]ZOZXZZ[\\\\TZU\[ZTTW[\[\\	QC											N
> HWI-ST538	160	7	1101	3261	1972	ACAGTG	1	CCTGGATGTNNNCGTGGATCGGGTCTTGAACGTGGACTTGCCGCACCTTGTATCTCAAGCCCAGCGCCACACCCTTGCTGACTTTCTGAACACTACTGTC	
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	3390	1979	ACAGTG	1	TGCAGTTCTTNCCCGTGGCTCCTGAGCCAGCAATGTCGTCTGGGAAGCCATCATCAGCTCATTCAGCTGAGACGTGCTAGAGGTCTCCTCAGCTTGGAGC	
> bbbeeeeeggBRbfgfhihiiihhhhiiiiiehhichhgggfghbgedgdffhffdefff`bd]bdfedceecdZZ`_`bb_aL[bcc`b]`b]`b_`[`	chr1.fa		187111962	F	10T89	497						Y
> HWI-ST538	160	7	1101	3666	1959	ACAGTG	1	CCGAACACNNNNCCCGGATCTCATCTCACAGAATCGTGGGTACGTCTGAACCAGCTGGTTGCCCCCAAAAATTACCTGTTTCACGCCCCTGGTGCAGCCC	
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> HWI-ST538	160	7	1101	3667	1975	ACAGTG	1	AGAACACCGANGTCCACCTTCAAGTTAGGGGAAATGGCATATTCTTCGATCGGCTTGCTGATTTGGGAAATTTTATCCTTGACTGCTTCTGGAGCAGGAC	
> PZZTTP_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	chr4.fa		99634649	R	10G3T10C4C19T1T13A10AG3T17	77						N
> HWI-ST538	160	7	1101	3986	1969	ACAGTG	1	ATTAAATCTNNNCATGCTCTGAAAAAGGCTTTAGGTCACTCCAAGCTTGGCAGTTAACATTTGGCATGGACACTGGTAAAACCACAATAGAGAAAGAAGT	
> [[[^_____BBBRY__^^^__^^^^^_^^__^^^^^^^^^]^^^^^^^]^^^^\^^^^____]^____[^^[^^^^Z^Z\\\\^[^^^]]\[\\\\[[[R	QC											N
> HWI-ST538	160	7	1101	4230	1977	ACAGTG	1	CGCCTCCGGGNCGTGGCACTCTGGGGCTCTGCCGTCGACATGGGCGCCGCCGCGTGGGCACCGCCACACCTGCTGCTGCGGGCGGCTTTTCTTCTTCTGC	
> bbbeeeeegfBQ`bfha_fdeedfhfd`ffdcfc^_aZ__``fg]`[a[[aQQTHTXaQX[^TOWW]aaaXX^BBBBBBBBBBBBBBBBBBBBBBBBBBB	chr2.fa		153067284	F	10C73T4C2G7	372						Y
> HWI-ST538	160	7	1101	4363	1955	ACAGTG	1	CGAATGGANNNNNGTGTATACATCTGTGACTGAGACTGAGGCTGCACATGCAAAAAATCTTCTGAAAACAGACAGGCGGGTGCAGCTTGATGGGTCAACT	
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB	QC											N
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Nicolas Delhomme [mailto:delhomme at embl.de] 
> Sent: Thursday, March 01, 2012 2:24 PM
> To: Nicolas Delhomme
> Cc: Davis, Wade; Bioconductor mailing list
> Subject: Re: [BioC] easyRNASeq error
> 
> Dear Wade,
> 
> I've committed two changes to easyRNASeq that should solve your first question. The new package 1.1.8 should be available after the next build in bioconductor, in a day or two. You can get the changes from svn directly and install from source if you prefer, see http://bioconductor.org/developers/source-control/.  If you're not familiar with that, I could send you a windows zip archive of the package, just let me know.
> 
> Summarizing by geneModels should work on windows now. I'd be glad if you could run these two tests for me and let me know if they succeed. 
> 
> library("easyRNASeq")
> library("RnaSeqTutorial")
> library(BSgenome.Dmelanogaster.UCSC.dm3)
> 
> rnaSeq <- easyRNASeq(system.file(
>                                  "extdata",
>                                  package="RnaSeqTutorial"),
>                      organism="Dmelanogaster",
>                      chr.sizes=as.list(seqlengths(Dmelanogaster)),
>                      readLength=36L,
>                      annotationMethod="rda",
>                      annotationFile=system.file(
>                        "data",
>                        "gAnnot.rda",
>                        package="RnaSeqTutorial"),
>                      format="bam",
>                      count="genes",
>                      summarization="geneModels",
>                      pattern="[A,C,T,G]{6}\\.bam$",
>                      outputFormat="RNAseq")
> 
> rnaSeqPar <- easyRNASeq(system.file(
>                                  "extdata",
>                                  package="RnaSeqTutorial"),
>                      organism="Dmelanogaster",
>                      chr.sizes=as.list(seqlengths(Dmelanogaster)),
>                      readLength=36L,
>                      annotationMethod="rda",
>                      annotationFile=system.file(
>                        "data",
>                        "gAnnot.rda",
>                        package="RnaSeqTutorial"),
>                      format="bam",
>                      count="genes",
>                      summarization="geneModels",
>                      pattern="[A,C,T,G]{6}\\.bam$",
>                      outputFormat="RNAseq",
> 			nbCore=2)
> 
> Cheers,
> 
> Nico
> 
> ---------------------------------------------------------------
> Nicolas Delhomme
> 
> Genome Biology Computational Support
> 
> European Molecular Biology Laboratory
> 
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
> 
> 
> 
> 
> 
> On 1 Mar 2012, at 17:28, Nicolas Delhomme wrote:
> 
>> Dear Wade,
>> 
>> Thanks for the nice words, that's good to hear! Although I have to say 
>> it got me a little worried, because the more people praise, the more 
>> difficult the questions to come ;-)
>> 
>> I added the bioconductor mailing list in Cc, so that it benefits others.
>> 
>> The first error, I was surprised to see. From the parallel package documentation, I did not expect it, but it got well extended since I started using it and is now obvious that the MakeForkCluster would not work on windows. As I don't have access to any Windows machine, I would like to give you the code that I'll write to replace that per email (off list) so that you can try it and tell me if it fixes the issue. Would that be OK?
>> 
>> 
>> For the second error, you are correct. Note that it is NOT necessary to use the BSgenome for easyRNASeq, that's just that as they are available in Bioconductor it's easy to use them for exemplifying, but they are not mandatory. I need to state that more explicitly in the vignette. 
>> Back to your issue, I'm expecting in the alignment file and in the 
>> chr.sizes object chromosome names that follow UCSC conventions, e.g. 
>> chr1 to chr19 and chrX, chrY and chrM for the M.musculus for example. 
>> If it's not what I get, I try to "guess" them but obviously, since 
>> every sequencing facility has its own convention, it is not a 
>> straightforward business. For that reason, I've implemented a pass-by. 
>> To use it, we need to do the following
>> 
>> 1) create a data.frame that contains the mapping from your different input. As you're using biomaRt as an annotationMethod, we will have three sets of chromosome names:
>> 
>> a) biomaRT: 1:19,"X","Y"
>> b) your alignment file: chr1.fa to chr19.fa, chrX.fa, chrY.fa, etc...
>> c) the chromosome file: chr1 to chr19, chrX, chrY, etc...
>> 
>> that we need to combine together. Since c) follows the UCSC 
>> convention, we'll use that as our standard. I'm not sure what you want 
>> to do about the NM, QC, RM and splice_sites-auto.fa, so I ignore them. 
>> From the name, you probably don't want to ignore that last one, but 
>> I'm not sure how you can use it. Here we go
>> 
>> chr.map <- 
>> data.frame(from=c(1:19,"X","Y",paste("chr",c(1:19,"X","Y"),".fa",sep="
>> ")),to=rep(paste("chr",c(1:19,"X","Y"),sep=""),2))
>> 
>> 2) rerun you command with the chr.map argument and a "custom" organism 
>> argument
>> rnaSeq2 <- easyRNASeq(filesDirectory=extdataDir,
>>                     organism="custom",
>>                     chr.sizes=chr.sizes,
>>                     filter=myfilt,
>>                     readLength=50L,
>>                     annotationMethod="biomaRt",
>>                      format="aln",
>>                     count="exons",
>>                     pattern="^1-Feb_ATCACG_L003_R1_001_export.txt.gz$",
>>                     outputFormat="RNAseq",
>> 			chr.map=chr.map
>>                     )
>> 
>> Note that I haven't tried that myself, I always had a least two set of chromosomes in agreement, so it might fail. If so, it would be great if I could get access to an excerpt of your data to try it out.
>> 
>> Finally, note that there's another way to by-pass this issue, if you are not  using biomaRt but a local annotation source (gff, gtf, rda, etc...). If this file would contain the same chromosome names as your alignment file, you could do the following: 
>> 
>> 1) change the chromosome name in the chr.sizes to add  a ".fa" at the 
>> end chr.sizes <- as.list(seqlengths(Mmusculus))
>> names(chr.sizes) <- paste(names(chr.sizes),"fa",sep=".")
>> 
>> 2) rerun your command with that new chr.sizes and precise 
>> validity.check=FALSE in the easyRNASeq command
>> 
>> rnaSeq2 <- easyRNASeq(filesDirectory=extdataDir,
>>                     organism="Mmusculus",
>>                     chr.sizes=chr.sizes,
>>                     filter=myfilt,
>>                     readLength=50L,
>>                     annotationMethod="biomaRt",
>>                      format="aln",
>>                     count="exons",
>>                     pattern="^1-Feb_ATCACG_L003_R1_001_export.txt.gz$",
>>                     outputFormat="RNAseq",
>> 			validity.check=FALSE
>>                     )
>> 
>> You need to be careful about what you're doing there, since you shut off all the consistency check on the chromosome naming.
>> 
>> 
>> Cheers,
>> 
>> Nico
>> 
>> 
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>> 
>> Genome Biology Computational Support
>> 
>> European Molecular Biology Laboratory
>> 
>> Tel: +49 6221 387 8310
>> Email: nicolas.delhomme at embl.de
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>> ---------------------------------------------------------------
>> 
>> 
>> 
>> 
>> 
>> On 29 Feb 2012, at 22:07, Davis, Wade wrote:
>> 
>>> Dear Nicolas,
>>> First , I want to thank you for the hard work you have put in making the easyRNASeq package. I am very familiar with R and BioConductor, and NGS statistical models. But I have been learning over the past month how to deal with raw NGS for the first time, and the number of different ways and packages is a bit overwhelming. Fortunately, I discovered your package, and it has simplified things greatly and helped me understand the workflow better to getting to a count table. So once again, thank you! I am sure you will have many users in the coming months.
>>> 
>>> I have 2 questions. The first relates to the error below which seems to come from the summarization method I am using, which must involve trying to do some work in parallel. However, since the default nbCore is 1, I am surprised it still tries to use multiple processers (I am on a 8 core machine). Is there anything I can do about this? I really want to use the geneModels..
>>> My second question follows the output below.
>>> 
>>> Thanks for your help and time!
>>> 
>>> J. Wade Davis, PhD
>>> Associate Professor, Department of Health Management and Informatics 
>>> Co-Director, Biostatistics and Research Design Unit
>>> 187 Galena DC 018.0
>>> University of Missouri
>>> Columbia, MO 65212
>>> Phone: (573) 882-0770
>>> Fax: (573) 884-4196
>>> www.biostat.missouri.edu
>>> 
>>> 
>>> 
>>>> # run this code only the first time this way to get the annotations; 
>>>> # run the subsequent code for future use....
>>>> rnaSeq2 <- easyRNASeq(filesDirectory=extdataDir,
>>> +                      organism="Mmusculus",
>>> +                      chr.sizes=chr.sizes,
>>> +                      filter=myfilt,
>>> +                      readLength=50L,
>>> +                      annotationMethod="biomaRt",
>>> +                     # format="SolexaExport",
>>> +                       format="aln",
>>> +                      #withId=TRUE,
>>> +                      count="genes",
>>> +                      summarization= "geneModels",
>>> +                      #count="exons", #generates error message
>>> +                      #filenames=fastqfilenames[1],
>>> +                      pattern="^1-Feb_ATCACG_L003_R1_001_export.txt.gz$",
>>> +                      outputFormat="RNAseq"
>>> +                      )
>>> Checking arguments...
>>> Fetching annotations...
>>> Computing gene models...
>>> Error in makeForkCluster(nnodes = nbCore) :
>>> fork clusters are not supported on Windows
>>>> sessionInfo()
>>> R Under development (unstable) (2012-02-28 r58513)
>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>> 
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         
>>> [5] LC_TIME=English_United States.1252   
>>> 
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base    
>>> 
>>> other attached packages:
>>> [1] RnaSeqTutorial_0.0.7                   BSgenome.Dmelanogaster.UCSC.dm3_1.3.17 BSgenome.Mmusculus.UCSC.mm9_1.3.17     easyRNASeq_1.1.7                     
>>> [5] ShortRead_1.13.13                      latticeExtra_0.6-20                    RColorBrewer_1.0-5                     Rsamtools_1.7.37                     
>>> [9] DESeq_1.7.6                            locfit_1.5-6                           lattice_0.20-0                         akima_0.5-7                           
>>> [13] BSgenome_1.23.2                        GenomicRanges_1.7.28                   Biostrings_2.23.6                      IRanges_1.13.26                      
>>> [17] edgeR_2.5.9                            limma_3.11.15                          biomaRt_2.11.1                         Biobase_2.15.3                       
>>> [21] BiocGenerics_0.1.7                     genomeIntervals_1.11.0                 intervals_0.13.3                     
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] annotate_1.33.2       AnnotationDbi_1.17.22 bitops_1.0-4.1        DBI_0.2-5             genefilter_1.37.0     geneplotter_1.33.1    grid_2.15.0         
>>> [8] hwriter_1.3           RCurl_1.91-1.1        RSQLite_0.11.1        splines_2.15.0        survival_2.36-12      tools_2.15.0          XML_3.9-4.1         
>>> [15] xtable_1.7-0          zlibbioc_1.1.1      
>>>> library(parallel)
>>>> ?makeForkClsuter
>>> No documentation for 'makeForkClsuter' in specified packages and libraries:
>>> you could try '??makeForkClsuter'
>>> 
>>> 
>>> My second question is about a different error message, which seems very close to one you dealt with on the mailing list a few weeks ago, but the person never said if it was resolved:
>>> 
>>> http://thread.gmane.org/gmane.science.biology.informatics.conductor/3
>>> 8841/focus=38858
>>> 
>>> Here is my code and the error:
>>> 
>>> 
>>> rnaSeq2 <- easyRNASeq(filesDirectory=extdataDir,
>>> +                      organism="Mmusculus",
>>> +                      chr.sizes=chr.sizes,
>>> +                      filter=myfilt,
>>> +                      readLength=50L,
>>> +                      annotationMethod="biomaRt",
>>> +                       format="aln",
>>> +                      count="exons",
>>> +                      pattern="^1-Feb_ATCACG_L003_R1_001_export.txt.gz$",
>>> +                      outputFormat="RNAseq"
>>> +                      )
>>> Checking arguments...
>>> Fetching annotations...
>>> Summarizing counts...
>>> Processing 1-Feb_ATCACG_L003_R1_001_export.txt.gz
>>> Error in FUN(c("chr1.fa", "chr10.fa", "chr11.fa", "chr12.fa", "chr13.fa",  :
>>> (list) object cannot be coerced to type 'integer'
>>> In addition: Warning message:
>>> In easyRNASeq(filesDirectory = extdataDir, organism = "Mmusculus",  :
>>> There are 429316 features/exons defined in your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
>>> 
>>> What easyRNASeq is using as my chromosome names is different from my Illumina file (see below). I'm sure this is the source of my error, I just don't know how to change the names and fix this. The file I am dealing with from the sequencing core is aligned I am told as such "The alignment was run in a splice-aware fashion using as input the mm9 build of the mouse genome from UCSC as provided by Illumina."
>>> 
>>> 
>>> From my Illumina export/fastQ file read in using ReadAlign:
>>> 
>>> 
>>> table(chromosome(aln))
>>> 
>>>            chr1.fa             chr10.fa             chr11.fa             chr12.fa             chr13.fa             chr14.fa
>>>             285307               221943               395265               115402               108108               127673
>>>           chr15.fa             chr16.fa             chr17.fa             chr18.fa             chr19.fa              chr2.fa
>>>             159310               143801               201871                83037               156612               304649
>>>            chr3.fa              chr4.fa              chr5.fa              chr6.fa              chr7.fa              chr8.fa
>>>             168498               222071               234236               229869               297021               194308
>>>            chr9.fa              chrX.fa              chrY.fa                   NM                   QC                   RM
>>>             264232               124942                    3              1009813                 5877              1074323
>>> splice_sites-auto.fa
>>>             804227
>>> 
>>> 
>>> 
>>> 
>>> as.list(seqlengths(Mmusculus))
>>> 
>>> 
>>> $chr1
>>> [1] 197195432
>>> 
>>> $chr2
>>> [1] 181748087
>>> 
>>> $chr3
>>> [1] 159599783
>>> 
>>> $chr4
>>> [1] 155630120
>>> 
>>> $chr5
>>> [1] 152537259
>>> 
>>> $chr6
>>> [1] 149517037
>>> 
>>> $chr7
>>> [1] 152524553
>>> 
>>> $chr8
>>> [1] 131738871
>>> 
>>> $chr9
>>> [1] 124076172
>>> 
>>> $chr10
>>> [1] 129993255
>>> 
>>> $chr11
>>> [1] 121843856
>>> 
>>> $chr12
>>> [1] 121257530
>>> 
>>> $chr13
>>> [1] 120284312
>>> 
>>> $chr14
>>> [1] 125194864
>>> 
>>> $chr15
>>> [1] 103494974
>>> 
>>> $chr16
>>> [1] 98319150
>>> 
>>> $chr17
>>> [1] 95272651
>>> 
>>> $chr18
>>> [1] 90772031
>>> 
>>> $chr19
>>> [1] 61342430
>>> 
>>> $chrX
>>> [1] 166650296
>>> 
>>> $chrY
>>> [1] 15902555
>>> 
>>> $chrM
>>> [1] 16299
>>> 
>>> $chr1_random
>>> [1] 1231697
>>> 
>>> $chr3_random
>>> [1] 41899
>>> 
>>> $chr4_random
>>> [1] 160594
>>> 
>>> $chr5_random
>>> [1] 357350
>>> 
>>> $chr7_random
>>> [1] 362490
>>> 
>>> $chr8_random
>>> [1] 849593
>>> 
>>> $chr9_random
>>> [1] 449403
>>> 
>>> $chr13_random
>>> [1] 400311
>>> 
>>> $chr16_random
>>> [1] 3994
>>> 
>>> $chr17_random
>>> [1] 628739
>>> 
>>> $chrX_random
>>> [1] 1785075
>>> 
>>> $chrY_random
>>> [1] 58682461
>>> 
>>> $chrUn_random
>>> [1] 5900358
>>> 
>>> 
>>> 
>>> 
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 



More information about the Bioconductor mailing list