[BioC] how to open a SNP data file as large as 500M in Windows OR just extract part of data

Martin Morgan mtmorgan at fhcrc.org
Fri Sep 10 17:46:25 CEST 2010


On 09/10/2010 06:36 AM, James W. MacDonald wrote:
> Another solution is to install the Rtools toolset and use grep or sed.
>
> http://www.murdoch-sutherland.com/Rtools/
>
> something like
>
> grep <your snp name here> <snp file name here>
>
> will get the SNP data without having to open the entire file at one
> time. An alternative is
>
> sed -n '/<snp name here>/p' <snp file name here>
>
> which will do the same, and is usually faster than opening the entire
> file just to find one line.
>
> You can of course re-direct the output into a new file by adding a
>
> > mynewfile.txt
>
> at the end of either of the above.
> Best,
>
> Jim
>
>
>
> On 9/10/2010 12:49 AM, Michael Imbeault wrote:
>>
>> You could try http://www.editpadpro.com/ - it opens arbitrarily large
>> files; I have opened 1 GB text files with it before.
>>
>> Michael
>>
>> On 09/09/2010 11:26 PM, xiangxue Guo wrote:
>>> Hi there,
>>>
>>> Does anybody know how to open a SNP data file as large as 500M on a
>>> Windows computer? These data are SNPs for many chromosomes, and we
>>> just need one of them. Thus if someone knows how to extract the data
>>> for just one chromosome, that would also be OK for us.
>>>

Or, in R, open a connection to the file (possibly compressed or remote)
and process it in chunks until you find what you need:

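    con <- file("c:\\some\\file", "r")
    value <- character(0)
    repeat {
        lines <- readLines(con, 1000000)    # next chunk of up to 1e6 lines
        if (0 == length(lines))
            break                           # end of file, snpId not found
        value <- grep(snpId, lines, value=TRUE)
        if (0 != length(value))
            break                           # found it; stop reading
    }
    close(con)
    value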
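
The loop above stops at the first chunk that contains a match. If you
want every matching line (for example, all SNPs on one chromosome), here
is a minimal sketch that scans the whole file and collects the hits. The
helper name grepChunks, its chunkSize argument, and the tiny test file
are made up for illustration; gzfile() stands in for the "possibly
compressed" case.

```r
## Hypothetical helper (not part of base R): scan a connection in chunks
## and return every line that contains 'pattern' as a fixed string.
grepChunks <- function(con, pattern, chunkSize = 100000L) {
    if (!isOpen(con)) {
        open(con, "r")
    }
    on.exit(close(con))                # always release the connection
    hits <- character(0)
    repeat {
        lines <- readLines(con, chunkSize)
        if (length(lines) == 0L)
            break                      # end of file
        hits <- c(hits, grep(pattern, lines, value = TRUE, fixed = TRUE))
    }
    hits
}

## Small made-up, gzip-compressed file to try it on
tmp <- tempfile(fileext = ".txt.gz")
gz <- gzfile(tmp, "w")
writeLines(c("rs1 chr1 A", "rs2 chr2 C", "rs3 chr1 G", "rs4 chr3 T"), gz)
close(gz)

## chunkSize = 2 only to force several passes through the loop
chr1 <- grepChunks(gzfile(tmp), "chr1", chunkSize = 2L)
chr1
```

Note that a plain substring such as "chr1" would also match "chr10" in a
real file, so you would probably want a stricter pattern there.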

Or, if your file is structured like a table, read.table with colClasses
can read in just the necessary columns (a colClasses entry of "NULL"
skips that column), which may let you read the relevant parts of the
entire file at once. Alternatively, use the approach above with 'con'
passed to read.table (with nrows set) to process a chunk at a time and
accumulate matching records.
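
A rough sketch of that last idea, combining colClasses with chunked
reading. The file layout (SNP id, chromosome, position, genotype), the
column names, and the chunk size of 2 are all assumptions made for the
example; adjust them to your own file.

```r
## Made-up example file: columns are SNP id, chromosome, position, genotype
tmp <- tempfile(fileext = ".txt")
writeLines(c("snp chr pos geno",
             "rs1 1 100 AA",
             "rs2 2 200 AG",
             "rs3 1 300 GG",
             "rs4 3 400 AA",
             "rs5 1 500 AG"), tmp)

con <- file(tmp, "r")
## Consume the header line once to get the column names
header <- strsplit(readLines(con, 1), "[[:space:]]+")[[1]]
## "NULL" in colClasses skips a column entirely (here: the genotype column)
classes <- c("character", "character", "integer", "NULL")

chunks <- list()
repeat {
    chunk <- tryCatch(
        read.table(con, nrows = 2, colClasses = classes,
                   col.names = header),
        error = function(e) NULL)      # read.table errors once no lines remain
    if (is.null(chunk))
        break
    ## keep only the records for the chromosome of interest
    chunks[[length(chunks) + 1]] <- chunk[chunk$chr == "1", ]
}
close(con)

chr1 <- do.call(rbind, chunks)
chr1
```

With a real 500M file you would use a much larger nrows (say 100000) so
that each chunk is big enough to be efficient but still fits in memory.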

Martin

>
>>> Thanks in advance,
>>>
>>> Guo
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>


