[R] How to transpose it in a fast way?

Khan, Sohail SKhan30 at nshs.edu
Fri Mar 8 12:21:46 CET 2013


Perhaps you could process the file with the Unix/Linux utility awk before reading it into R.
-Sohail

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of peter dalgaard [pdalgd at gmail.com]
Sent: Friday, March 08, 2013 5:08 AM
To: Yao He
Cc: R help
Subject: Re: [R] How to transpose it in a fast way?

On Mar 7, 2013, at 01:18 , Yao He wrote:

> Dear all:
>
> I have a big data file of 60000 columns and 60000 rows like that:
>
> AA AC AA AA .......AT
> CC CC CT CT.......TC
> ..........................
> .........................
>
> I want to transpose it so that the output is a new file like this:
> AA CC ............
> AC CC............
> AA CT.............
> AA CT.........
> ....................
> ....................
> AT TC.............
>
> The key point is that I can't read it into R with read.table() because the
> data is too large, so I tried this:
> c<-file("silygenotype.txt","r")
> geno_t<-list()
> repeat{
>  line<-readLines(c,n=1)
>  if (length(line)==0)break  #end of file
>  line<-unlist(strsplit(line,"\t"))
> geno_t<-cbind(geno_t,line)
> }
> write.table(geno_t,"xxx.txt")
>
> It works, but it is too slow. How can I optimize it?


As others have pointed out, that's a lot of data!

You seem to have the right idea: If you read the columns line by line there is nothing to transpose. A couple of points, though:

- The cbind() is a potential performance hit since it copies the list every time around. Preallocate with geno_t <- vector("list", 60000) and then assign inside the loop:
geno_t[[i]] <- <etc>

- You might use scan() instead of readLines() and strsplit().

- Perhaps consider the data type as you seem to be reading strings with 16 possible values (I suspect that R already optimizes string storage to make this point moot, though.)
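
The first two points above could be combined roughly as follows. This is only a sketch, not tested on data of the size in question: transpose_file() and its arguments are hypothetical names, the file is assumed to be tab-separated as in the original code, and the number of rows is assumed to be known in advance. It is demonstrated on a tiny temporary file; whether the full 60000 x 60000 result fits in memory is a separate question.

```r
# Hypothetical helper: read a tab-separated file one row at a time and
# write out its transpose. Assumes the number of rows is known up front.
transpose_file <- function(infile, outfile, n_rows) {
  con <- file(infile, "r")
  on.exit(close(con))

  # Preallocate the list instead of growing it with cbind() in the loop
  rows <- vector("list", n_rows)
  for (i in seq_len(n_rows)) {
    # scan() reads and splits one line in a single call
    rows[[i]] <- scan(con, what = "", sep = "\t", nlines = 1, quiet = TRUE)
  }

  # Each input row becomes one column, so a single cbind() at the end
  # already gives the transposed layout
  geno_t <- do.call(cbind, rows)
  write.table(geno_t, outfile, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(geno_t)
}

# Small demonstration with a 2 x 3 input file
demo_in  <- tempfile(fileext = ".txt")
demo_out <- tempfile(fileext = ".txt")
writeLines(c("AA\tAC\tAA", "CC\tCC\tCT"), demo_in)
geno_t <- transpose_file(demo_in, demo_out, n_rows = 2)
print(geno_t)
```

The single cbind() after the loop replaces the 60000 incremental copies made in the original loop.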

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

