[BioC] how to put together 600 files from the 5.0 Affy SNPchip

Laura Rodriguez Murillo laura.lmurillo at gmail.com
Fri Nov 7 21:32:40 CET 2008


Hi,

Thank you for your email. As soon as I get some free time I'll try to
learn some Perl and see what I can do with these files.

Laura

2008/11/7 laurent <lgautier at gmail.com>:
> Such files do exceed the capabilities of most machines.
>
> One way I helped someone with a similar problem was to hack a script
> (Python or Perl come to my mind for such jobs, but it is possible to
> implement it in R if you like) that read the files as byte streams
> rather than line by line.
>
> This way it ran with a minimal memory footprint (and it will on Microsoft
> Windows too, as long as the resulting file size does not exceed the
> capabilities of the OS).
>
>
> L.
>
> On Fri, 2008-11-07 at 13:38 -0500, Laura Rodriguez Murillo wrote:
>> Benilton,
>> Thank you. Unfortunately, the paste command doesn't work with these
>> big files. Does this only work with R under Unix, or is it possible in
>> Windows? Otherwise, I'll try with R when I get back to the Unix
>> machine.
>>
>> Laura
>>
>> 2008/11/7 Benilton Carvalho <bcarvalh at jhsph.edu>:
>> > Laura,
>> >
>> > if you're running *NIX, can't you just use the bash command "paste"?
>> >
>> > if you really want to use R, assume you have the names of your files in
>> > the variable "files", then something like:
>> >
>> > ## This goes in R and assumes you're running *NIX
>> > cmd <- paste("paste", paste(files, collapse=" "), "> output.txt")
>> > system(cmd)
>> >
>> > later, you can just get rid of the first 4 lines of output.txt.
>> >
>> > b
>> >
>> > On Nov 7, 2008, at 3:04 PM, Laura Rodriguez Murillo wrote:
>> >
>> >> David,
>> >>
>> >> Thank you for your reply. I had tried this software, but it
>> >> doesn't recognize my files; it looks as if it doesn't like .txt files.
>> >> Any idea?
>> >>
>> >> Laura
>> >>
>> >> 2008/11/6 David Carter <dcarter at robarts.ca>:
>> >>>
>> >>> Hi Laura,
>> >>> Affymetrix has a free tool (free and easy to download) called Genotyping
>> >>> Console that will export 1 file for all your samples with SNPs on rows
>> >>> and samples on columns.  I haven't tried it with 622 samples though...
>> >>> Sincerely,
>> >>> David Carter
>> >>>
>> >>>
>> >>> Laura Rodriguez Murillo wrote:
>> >>>>
>> >>>> Hi,
>> >>>> I'm new to this mailing list and also to using Bioconductor. I'd
>> >>>> appreciate your feedback on this: I have 622 files that correspond to
>> >>>> 622 samples genotyped for the SNPs in the 5.0 SNPChip from Affymetrix.
>> >>>> Each file consists of two columns of almost 500 K rows (plus 4 lines
>> >>>> at the beginning that I won't need). The number of rows is the same in
>> >>>> every file. I would need to put all these files together, where the
>> >>>> first column is common to all of them (SNP names) (so I just need it
>> >>>> once in the big file). Once I have all the columns one after the other
>> >>>> I would also need to paste a column with the chromosome number for
>> >>>> each SNP (which is in another file, just this info alone). Do you know
>> >>>> if there's any way to do this with Bioconductor?
>> >>>>
>> >>>> Thank you!
>> >>>>
>> >>>> Laura
>> >>>>
>> >>>> _______________________________________________
>> >>>> Bioconductor mailing list
>> >>>> Bioconductor at stat.math.ethz.ch
>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>> Search the archives:
>> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> David Carter
>> >>> Facility Manager
>> >>> London Regional Genomics Centre
>> >>> Robarts Research Institute, Room 4.01
>> >>> PO Box 5015, 100 Perth Drive
>> >>> London, Ontario, Canada, N6A 5K8
>> >>>
>> >>> phone:  519-663-3253
>> >>> fax:    519-663-3037
>> >>>
>> >>> dcarter at robarts.ca
>> >>> http://www.lrgc.ca
>> >>>
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>
>
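
The low-memory merge discussed in the thread above could be sketched roughly
as follows. This is not the script Laurent mentions: it is a minimal Python
sketch that reads all inputs line by line in lockstep (rather than as raw
byte streams), so only one row per file is in memory at a time. The function
name merge_genotype_files, its parameters, the tab separator, and the layout
of the chromosome file (one value per line, no header, same SNP order) are
all assumptions made for illustration; only the 4 header lines, the two
columns, and the equal row counts come from the original question.

```python
import contextlib


def merge_genotype_files(paths, out_path, header_lines=4, chrom_path=None, sep="\t"):
    """Merge two-column per-sample files into one wide table, row by row.

    Hypothetical helper: assumes every file lists the same SNPs in the
    same order, with `header_lines` unwanted lines at the top.
    Returns the number of data rows written.
    """
    with contextlib.ExitStack() as stack:
        # Open every input at once; only one line per file is held in memory.
        handles = [stack.enter_context(open(p)) for p in paths]
        chrom = stack.enter_context(open(chrom_path)) if chrom_path else None
        out = stack.enter_context(open(out_path, "w"))

        # Skip the leading lines that are not needed.
        for handle in handles:
            for _ in range(header_lines):
                handle.readline()

        rows = 0
        # zip() stops at the shortest file; the inputs are assumed equal length.
        for lines in zip(*handles):
            if not any(line.strip() for line in lines):
                continue  # ignore blank rows, e.g. a trailing empty line
            fields = [line.rstrip("\r\n").split(sep) for line in lines]
            snp = fields[0][0]
            # Guard against malformed rows or files in a different SNP order.
            if any(len(f) < 2 or f[0] != snp for f in fields):
                raise ValueError(f"unexpected content at data row {rows + 1}")
            row = [snp]                        # SNP name is written only once
            row.extend(f[1] for f in fields)   # one genotype column per sample
            if chrom is not None:
                # Assumed layout: one chromosome value per line, no header.
                row.append(chrom.readline().rstrip("\r\n"))
            out.write(sep.join(row) + "\n")
            rows += 1
    return rows
```

With a list of input paths in a variable such as files, a call like
merge_genotype_files(files, "output.txt", chrom_path="chromosomes.txt") would
write one row per SNP (the file names here are placeholders). Opening several
hundred files at once can hit the per-process open-file limit on some
systems; merging in smaller batches and then merging the batch outputs is one
possible workaround.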


