[R] reordering huge data file

jim holtman jholtman at gmail.com
Mon Jan 21 23:46:19 CET 2008


It is still not clear exactly how you want to do the
transformation.  What does "Phenotypeinfo.txt" contain, and how big is
it?  It sounds like it holds data that you want to
merge into the other file.  If the job is just to read in
Phenotypeinfo.txt and then make a single pass through the
larger file, matching and substituting as you go, I would probably do it
in Perl, which has much better text-handling capabilities.  It would be
useful if you could "provide commented, minimal, self-contained,
reproducible code" showing how the data is to be
manipulated/transformed.
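Jim suggests Perl, but the single pass he describes is language-neutral: load the small file, then stream the large one. Below is a minimal sketch in Python. It assumes whitespace-separated files with one header line each, with `Phenotypeinfo.txt` holding ID and phenotype and `Before.txt` holding SNP, ID, allele.A and allele.B. The function names are hypothetical, and the layouts are guesses because the thread does not give them.

```python
# Sketch of a single pass over a large genotype file, attaching phenotypes
# looked up from a small file. Layouts below are ASSUMPTIONS (hypothetical):
#   Phenotypeinfo.txt: ID phenotype               (whitespace-separated, 1 header line)
#   Before.txt:        SNP ID allele.A allele.B   (whitespace-separated, 1 header line)
import io
from typing import Dict, Iterable, TextIO


def load_phenotypes(lines: Iterable[str]) -> Dict[str, str]:
    """Read the (small) phenotype file into a dict keyed by sample ID."""
    it = iter(lines)
    next(it, None)  # skip the assumed header line
    pheno: Dict[str, str] = {}
    for line in it:
        fields = line.split()
        if len(fields) >= 2:
            pheno[fields[0]] = fields[1]
    return pheno


def attach_phenotypes(genotypes: Iterable[str], pheno: Dict[str, str], out: TextIO) -> int:
    """Copy genotype lines to `out` (tab-separated) with the phenotype appended.

    Lines whose ID has no phenotype are dropped. Only the current genotype
    line (plus the phenotype dict) is held in memory, so the size of the
    genotype file does not matter. Returns the number of data lines written.
    """
    it = iter(genotypes)
    header = next(it, None)
    if header is None:
        return 0
    out.write("\t".join(header.split() + ["phenotype"]) + "\n")
    written = 0
    for line in it:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        phenotype = pheno.get(fields[1])
        if phenotype is None:
            continue  # no phenotype for this ID, so drop the line
        out.write("\t".join(fields + [phenotype]) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demo; for real files pass open file objects instead.
    demo_pheno = load_phenotypes(io.StringIO("ID phenotype\n1 case\n2 control\n"))
    demo_out = io.StringIO()
    attach_phenotypes(io.StringIO("SNP ID allele.A allele.B\nrs1 1 A G\nrs1 3 A A\n"), demo_pheno, demo_out)
    print(demo_out.getvalue(), end="")
```

For the real files, `with open("Phenotypeinfo.txt") as p, open("Before.txt") as g, open("Merged.txt", "w") as out: attach_phenotypes(g, load_phenotypes(p), out)` would run the pass (`Merged.txt` is a hypothetical output name). The same pattern works in R: open a connection with `file("Before.txt", "r")`, then call `readLines(con, n = 10000)` repeatedly. Each call returns the next chunk of lines, and a zero-length vector signals the end of the file.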

On Jan 21, 2008 4:45 PM, Boks, M.P.M. <M.P.M.Boks at umcutrecht.nl> wrote:
>
> Dear R-experts,
>
> My problem is how to handle a 10GB data file containing genotype data. The file is in a particular format (Illumina final report) and needs to be altered and merged with phenotype data for further analysis.
>
> Perl seems to be a frequently used solution for this type of work; however, I am inclined to think it should be doable with R.
>
> How do I read a text file line by line, evaluate each line, and write it to another text file in a different position?
>
> Phenotypeinfo.txt (contains phenotype information)
>
> Before.txt (contains genotype information; see below)
>
> SNP:1-305,000   ID:1-900        allele.A  allele.B
>
>
> After.txt (the required format)
>
> ID:1-250 phenotype SNP1.allele.A  SNP1.allele.B    SNP2.allele.A SNP2.allele.B etc.
>
>
> I have been looking at ?read.table, ?scan, ?readline and SQLite but have not managed to solve it. Should I turn to Perl, or can this be tackled in R?
>
> I am using a Windows machine with R 2.6.0.
>
> Any help would be highly appreciated,
>
> Many Thanks,
>
> Marco
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
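Marco's reshaping produces one output row per ID, with two allele columns per SNP. This can also be done in a single streaming pass if each sample's rows are contiguous. The Python sketch below makes several assumptions:

- rows are grouped by sample, and every sample lists the SNPs in the same order;
- the input is whitespace-separated with one header line;
- a phenotype dict is available, for example one built from `Phenotypeinfo.txt`.

`long_to_wide` is a hypothetical name. Writing `NA` for samples without a phenotype is a choice made here, not something the thread specifies.

```python
# Sketch: reshape long genotype rows (SNP ID allele.A allele.B) into one wide
# row per sample: ID phenotype SNP1.allele.A SNP1.allele.B SNP2.allele.A ...
# ASSUMPTIONS (not confirmed in the thread): whitespace-separated input with one
# header line; all rows for a sample are contiguous; every sample lists the
# SNPs in the same order.
import io
from itertools import groupby
from typing import Dict, Iterable, List, Optional, TextIO


def long_to_wide(rows: Iterable[str], pheno: Dict[str, str], out: TextIO) -> int:
    """Write one tab-separated wide row per sample ID; return the row count.

    Only one sample's rows are held in memory at a time. Samples without a
    phenotype get "NA" (R's missing-value marker).
    """
    it = iter(rows)
    next(it, None)  # skip the assumed header line
    records = (f for f in (line.split() for line in it) if len(f) >= 4)
    expected_snps: Optional[List[str]] = None
    n = 0
    # groupby only merges ADJACENT rows with the same ID (column 2).
    for sample_id, group_iter in groupby(records, key=lambda f: f[1]):
        group = list(group_iter)
        snps = [f[0] for f in group]
        if expected_snps is None:
            # The first sample defines the SNP order and the header line.
            expected_snps = snps
            cols = ["ID", "phenotype"]
            for snp in snps:
                cols += [f"{snp}.allele.A", f"{snp}.allele.B"]
            out.write("\t".join(cols) + "\n")
        elif snps != expected_snps:
            raise ValueError(f"SNP list for ID {sample_id} differs from the first sample")
        values = [sample_id, pheno.get(sample_id, "NA")]
        for f in group:
            values += [f[2], f[3]]
        out.write("\t".join(values) + "\n")
        n += 1
    return n


if __name__ == "__main__":
    data = "SNP ID allele.A allele.B\nrs1 1 A G\nrs2 1 C C\nrs1 2 A A\nrs2 2 C T\n"
    demo_out = io.StringIO()
    long_to_wide(io.StringIO(data), {"1": "case"}, demo_out)
    print(demo_out.getvalue(), end="")
```

The contiguity assumption matters because `groupby` never looks back. If the real file is ordered by SNP instead, it would need to be sorted by ID first, or loaded into a database such as the SQLite the poster mentions. The SNP-order check raises an error rather than silently misaligning columns. Note also that with 305,000 SNPs, each output row has 2 × 305,000 + 2 = 610,002 columns.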



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


