[R] data reshape

Fri Dec 20 18:00:00 CET 2019

Hi Bert,

Sorry, I was in a hurry to get home yesterday afternoon, so I just posted my question and hoped to get some advice.

Here is what I got yesterday before going home.
---------------------------------------------------------------
setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")

# list.files() takes a regular expression, not a glob, so match the ".out" suffix explicitly
file_list <- list.files(pattern = "\\.out$")

# read all 652 files into RStudio; it turned out that NOT all files have the same number of rows
for (i in seq_along(file_list)) {
  assign(substr(file_list[i], 1, nchar(file_list[i]) - 4),
         read.delim(file_list[i], header = FALSE))
}

#the first file, GTEX_1117F, in the following format,  one column and 19482 rows
#4 is marker id, 25/48 is its marker value;
#  V1
#  4
# 25/48
# 201
# 2/2
# ...
# 648589
# None

#to make this one-column file into a two-column file as below
# so first column is marker id, second is corresponding marker values for the sample GTEX_1117F
#  VNTRid      GTEX_1117F
#   4               25/48
#   201            2/2
#    ...          ...
# 648589          None

for (i in seq_along(file_list)) {
  temp <- read.delim(file_list[i], header = FALSE)
  v <- as.character(temp$V1)
  odd  <- seq(1, length(v) - 1, by = 2)   # positions of the marker ids
  even <- seq(2, length(v), by = 2)       # positions of the marker values
  sample_id <- substr(file_list[i], 1, nchar(file_list[i]) - 4)
  # build both columns in one step; the original 1:length(temp$V1)/2 loop
  # iterated over 0.5, 1, 1.5, ... because ":" binds tighter than "/"
  output <- data.frame(v[odd], v[even], stringsAsFactors = FALSE)
  colnames(output) <- c("VNTRid", sample_id)
  assign(gsub("-", "_", sample_id), output)
}

Yesterday, I intended to reshape the output above from long to wide, using VNTRid as the key.
Since not all files have the same number of rows, the reshaped files would not bind correctly with the rbind function.
On my way to work this morning, I changed my mind: I will not reshape to wide format, and I actually like the long format I generated. I will read in a VNTR marker annotation file with VNTRid in the first column and the marker locations on the human chromosomes in the second column; this annotation file should include all the VNTR markers. I know the VNTRids in the annotation file are the same as the VNTRids in the 652 files I read in.
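
A minimal sketch of how one long-format sample table could be aligned to such an annotation table. The column name "location" and all values are made up for illustration, since the real annotation file is not shown in this thread. match() looks up each annotation id in the sample, so a marker missing from the sample comes out as NA:

```r
# Toy annotation table; the column name "location" and all values are made up.
annot <- data.frame(VNTRid = c("4", "201", "238", "648589"),
                    location = c("chr1", "chr1", "chr2", "chr22"),
                    stringsAsFactors = FALSE)

# Toy long-format sample table in the two-column layout described above.
sample_tbl <- data.frame(VNTRid = c("4", "201", "648589"),
                         GTEX_1117F = c("25/48", "2/2", "None"),
                         stringsAsFactors = FALSE)

# Look up each annotation marker in the sample; unmatched markers become NA.
annot$GTEX_1117F <- sample_tbl$GTEX_1117F[match(annot$VNTRid, sample_tbl$VNTRid)]
print(annot)
```

Repeating the lookup once per sample would add one genotype column per file while keeping the row order of the annotation table.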

Do you know a good way to merge all those 652 two-column files?

Thank you,

Ding

# merge all 652 files into one file with VNTRid as the first column; columns 2 to 653 are the genotypes,
# with the sample IDs as headers
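
One possible way to do this, sketched on toy data: keep the per-sample data frames in a list rather than creating them with assign(), then fold the list with merge() on VNTRid. The helper name merge_by_id and all toy values are assumptions for illustration. Setting all = TRUE keeps markers that are absent from some samples and fills the gaps with NA:

```r
# Toy stand-ins for the per-sample two-column data frames (values are made up).
samples <- list(
  data.frame(VNTRid = c("4", "201", "648589"),
             GTEX_1117F = c("25/48", "2/2", "None"), stringsAsFactors = FALSE),
  data.frame(VNTRid = c("201", "238"),
             GTEX_1A3MV = c("2/2", "3/4"), stringsAsFactors = FALSE),
  data.frame(VNTRid = c("4", "238", "648589"),
             GTEX_1A3MX = c("25/25", "3/3", "1/2"), stringsAsFactors = FALSE)
)

# Hypothetical helper: full outer join of two tables on VNTRid.
merge_by_id <- function(x, y) merge(x, y, by = "VNTRid", all = TRUE)

# Fold the list pairwise into one wide table, one genotype column per sample.
merged <- Reduce(merge_by_id, samples)
print(merged)
```

In the reading loop, the list could be filled with samples[[i]] <- output in place of the assign() call. With 652 files the repeated merge() may be slow, but it makes no assumption that the files share the same markers.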

From: Bert Gunter [mailto:bgunter.4567 using gmail.com]
Sent: Thursday, December 19, 2019 6:52 PM
To: Yuan Chun Ding
Cc: r-help using r-project.org
Subject: Re: [R] data reshape

Did you even make an attempt to do this? -- or would you like us to do all your work for you?

If you made an attempt, show us your code and errors.
If not, we usually expect you to try on your own first.
If you have no idea where to start, perhaps you need to spend some more time with tutorials to learn basic R functionality before proceeding.

Bert

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <ycding using coh.org> wrote:
Hi R users,

I have a folder (called genotype) with 652 files; the file names are GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc. Each file contains only one column of data, without a header, as below:
201
2/2
238
3/4
245
1/2
.....
983255
3/3
983766
None

A total of 20528 rows;

I need to read all those 652 files in the genotype folder and then reshape the one column in each file as:
SampleID             201        238        245        ....   983255         983766
GTEX-1A3MV     2/2         3/4        1/2                         3/3         None

There are 10264 data columns plus the sample ID column, so 10265 columns in total after data reshaping.

After reading those 652 files and reshaping the one column in each file, I will stack them with the rbind function, so that I have a file with a dimension of 653 rows by 10265 columns.

Thank you,

Ding
