[BioC] how to put together 600 files from the 5.0 Affy SNPchip

laurent lgautier at gmail.com
Fri Nov 7 19:51:28 CET 2008


Taken together, such files exceed the capabilities of most machines.

One way I helped someone with a similar problem was to hack together a
script (Python or Perl come to mind for such jobs, but it is possible to
implement it in R if you like) that read the files as byte streams
rather than line by line.

This way it runs with a minimal memory footprint (and will do so on
Microsoft Windows... if the resulting file size does not exceed the
capabilities of the OS).
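
A minimal sketch of that kind of script in Python, under assumptions that
are not stated in the thread: the two columns are tab-separated, every
file lists the SNPs in the same order, and the first 4 lines of each file
are to be skipped. The name merge_columns and its parameters are made up
for illustration. Note that it streams through buffered file handles one
row at a time rather than working on raw bytes, so it is a variation on
the idea, not the original script.

```python
import contextlib


def merge_columns(paths, out_path, skip=4, sep="\t"):
    """Write the SNP column once, then the second column of every file.

    Only one row per input file is held in memory at any time.
    Returns the number of data rows written.
    """
    paths = list(paths)
    with contextlib.ExitStack() as stack:
        # Open all inputs at once; each handle only keeps a small buffer.
        handles = [stack.enter_context(open(p, "r", newline="")) for p in paths]
        out = stack.enter_context(open(out_path, "w", newline=""))

        # Discard the header lines at the top of each file.
        for fh in handles:
            for _ in range(skip):
                fh.readline()

        rows = 0
        # zip() advances every file in step and stops at the shortest one.
        for lines in zip(*handles):
            fields = [line.rstrip("\r\n").split(sep) for line in lines]
            key = fields[0][0]
            # Refuse to merge rows whose SNP names do not line up.
            for path, f in zip(paths, fields):
                if len(f) < 2 or f[0] != key:
                    raise ValueError(
                        "row %d of %s does not match %r" % (rows + 1, path, key)
                    )
            out.write(sep.join([key] + [f[1] for f in fields]) + "\n")
            rows += 1
    return rows
```

Calling merge_columns(list_of_paths, "merged.txt") writes one row per SNP
with one column per input file. With several hundred inputs it keeps that
many files open at once, which may run into the per-process limit on open
files on some systems; merging in batches and then merging the batch
outputs would be one way around that.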


L.

On Fri, 2008-11-07 at 13:38 -0500, Laura Rodriguez Murillo wrote:
> Benilton,
> Thank you. Unfortunately, the paste command doesn't work with these
> big files. Does this only work with R under Unix, or is it possible in
> Windows? Otherwise, I'll try with R when I get back to the Unix
> machine.
> 
> Laura
> 
> 2008/11/7 Benilton Carvalho <bcarvalh at jhsph.edu>:
> > Laura,
> >
> > if you're running *NIX, can't you just use the shell command "paste"?
> >
> > if you really want to use R, assume you have names of your files in the
> > variable "files", then something like:
> >
> > ## This goes in R and assumes you're running *NIX
> > ## shQuote() protects file names that contain spaces
> > cmd <- paste("paste", paste(shQuote(files), collapse=" "), "> output.txt")
> > system(cmd)
> >
> > later, you can just get rid of the first 4 lines of output.txt.
> >
> > b
> >
> > On Nov 7, 2008, at 3:04 PM, Laura Rodriguez Murillo wrote:
> >
> >> David,
> >>
> >> Thank you for your reply. I had tried this software but it
> >> doesn't recognize my files; it looks as if it doesn't like .txt files.
> >> Any idea?
> >>
> >> Laura
> >>
> >> 2008/11/6 David Carter <dcarter at robarts.ca>:
> >>>
> >>> Hi Laura,
> >>> Affymetrix has a free tool (free and easy to download) called Genotyping
> >>> Console that will export 1 file for all your samples with SNPs on rows
> >>> and samples on columns.  I haven't tried it with 622 samples though...
> >>> Sincerely,
> >>> David Carter
> >>>
> >>>
> >>> Laura Rodriguez Murillo wrote:
> >>>>
> >>>> Hi,
> >>>> I'm new to this mailing list and also to using Bioconductor. I'd
> >>>> appreciate your feedback on this: I have 622 files that correspond to
> >>>> 622 samples genotyped for the SNPs in the 5.0 SNPChip from Affymetrix.
> >>>> Each file consists of two columns of almost 500 K rows (plus 4 lines
> >>>> at the beginning that I won't need). The number of rows is the same in
> >>>> every file. I would need to put all these files together, where the
> >>>> first column is common to all of them (SNP names) (so I just need it
> >>>> once in the big file). Once I have all the columns one after the other
> >>>> I would also need to paste a column with the chromosome number for
> >>>> each SNP (which is in another file, just this info alone). Do you know
> >>>> if there's any way to do this with Bioconductor?
> >>>>
> >>>> Thank you!
> >>>>
> >>>> Laura
> >>>>
> >>>> _______________________________________________
> >>>> Bioconductor mailing list
> >>>> Bioconductor at stat.math.ethz.ch
> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>> Search the archives:
> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>>>
> >>>>
> >>>
> >>> --
> >>> David Carter
> >>> Facility Manager
> >>> London Regional Genomics Centre
> >>> Robarts Research Institute, Room 4.01
> >>> PO Box 5015, 100 Perth Drive
> >>> London, Ontario, Canada, N6A 5K8
> >>>
> >>> phone:  519-663-3253
> >>> fax:    519-663-3037
> >>>
> >>> dcarter at robarts.ca
> >>> http://www.lrgc.ca
> >>>
> >>>
> >>>
> >>
> >
> >
> 


