[BioC] FastqStreamer

Martin Morgan mtmorgan at fhcrc.org
Tue May 29 02:06:04 CEST 2012


On 05/25/2012 02:41 PM, Marcus Davy wrote:
> Hi Martin,
> thanks for looking into this, I think it would enhance FastqStreamer's
> flexibility to be able to fetch any specified ranges of a Fastq file.
>
> The IRanges approach is similar to my thoughts, with width by default
> (either constant 'n' or variable length using vector recycling), or
> start, and end indexes selected.
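
A rough base-R sketch of the two specifications described above: widths (constant or variable, recycled as needed) converted into explicit start and end record indexes. The helper name widths_to_ranges is hypothetical and is not part of ShortRead or IRanges; the total of 256 records is the size of the example file quoted further down the thread.

```r
## Hypothetical helper (not part of ShortRead or IRanges): convert chunk
## widths into consecutive 1-based record ranges covering 'total' records.
widths_to_ranges <- function(widths, total) {
    stopifnot(length(widths) > 0, all(widths >= 1), total >= 1)
    ## recycle the widths; 'total' chunks always suffice since each width >= 1
    w <- rep_len(widths, total)
    ends <- cumsum(w)
    ## keep chunks up to the first one that reaches the end of the file
    keep <- seq_len(which(ends >= total)[1])
    ends <- pmin(ends[keep], total)      # truncate the last chunk if needed
    starts <- c(1, head(ends, -1) + 1)   # each chunk starts after the previous
    data.frame(start = starts, end = ends)
}

widths_to_ranges(c(100, 50, 100, 6), total = 256)  # variable widths
widths_to_ranges(50, total = 256)                  # constant 'n', recycled
```

The resulting start/end pairs are the kind of input an IRanges-style argument would take; selecting an arbitrary range m:n is then just a single row with start = m and end = n.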

I updated ShortRead 1.15.7 in devel to allow FastqStreamer to accept an 
IRanges object and yield() corresponding records in the fastq file; see 
?FastqStreamer.
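
For the archive, a minimal sketch of how this might be used with the example file from earlier in the thread. It assumes ShortRead 1.15.7 or later with IRanges installed, and that the ranges are in ascending order and do not overlap; the variable names are illustrative only, and ?FastqStreamer remains the authoritative reference.

```r
## Record ranges to fetch: the chunk sizes from the original question,
## laid end to end over the 256-record example file
rng_width <- c(100, 50, 100, 6)
rng_start <- cumsum(rng_width) - rng_width + 1   # 1, 101, 151, 251

## Guarded so the sketch is skipped where ShortRead is not installed
if (requireNamespace("ShortRead", quietly = TRUE)) {
    library(ShortRead)
    sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
    fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

    rng <- IRanges::IRanges(start = rng_start, width = rng_width)
    f <- FastqStreamer(fl, rng)
    ## each yield() returns the records in the next range;
    ## a zero-length result signals that the ranges are exhausted
    while (length(fq <- yield(f))) {
        print(length(fq))
    }
    close(f)
}
```

To pull out a single stretch m:n that does not start at record 1, a one-element range such as IRanges::IRanges(start = m, end = n) should serve the same purpose.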

Martin

>
> cheers,
>
> Marcus
>
>
> On Sat, May 26, 2012 at 1:11 AM, Martin Morgan <mtmorgan at fhcrc.org>
> wrote:
>
>     On 05/24/2012 05:19 PM, Marcus Davy wrote:
>
>         Hi,
>
>         I have had a look at FastqStreamer to stream in successive subsets
>         of a Fastq file.
>
>
>         My question is whether you can change the number of records to stream
>         on the fly rather than having to stream 'n' records each time.
>
>
>         For example, I might want to pull in records corresponding to each
>         Illumina tile from the indices fetched within the Fastq header
>         information,
>
>
>     Hi Marcus -- this isn't possible at the moment, but I'm giving this
>     (and the ability to pull out specific id's) some thought. Along the
>     lines of an IRanges() argument with start and end being the parts of
>     the fastq file to retrieve, and with 'yield' returning the next
>     range's worth of data.
>
>     Martin
>
>
>         or just fetch a certain tile with a record index range m:n which does
>         not necessarily start at m=1 within the Fastq file.
>
>
>         sp<- SolexaPath(system.file('extdata', package='ShortRead'))
>
>         fl<- file.path(analysisPath(sp), "s_1_sequence.txt")
>
>         length(readFastq(fl))
>
>         [1] 256
>
>
>         ## This fails as n is expected to be a constant number of streamed
>         ## records
>
>         f<- FastqStreamer(fl, c(100, 50, 100, 6))
>
>         Error in FastqStreamer(fl, c(100, 50, 100, 6)) :
>
>         'n' must be finite and>= 0
>
>
>
>         To fetch a certain tile, can you alter the 'added' field position,
>         similar to 'seek' in Perl, so you can grab only that index range of the
>         Fastq file without having to go through a while loop?
>
>
>         f<- FastqStreamer(fl, 50)
>
>         print(f)
>
>         class: FastqStreamer
>
>         file: s_1_sequence.txt
>
>         status: n=50 current=0 added=0 total=0  ##<- I want to change the 'current/added fields'
>
>
>
>         cheers,
>
>
>         Marcus
>
>
>
>
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


