[R] data reshape

Fri Dec 20 18:37:39 CET 2019

?merge ## note the all.x option
Example:
> a <- data.frame(x = 1:3, y1 = 11:13)
> b <- data.frame(x = c(1,3), y2 = 21:22)

> merge(a,b, all.x = TRUE)
  x y1 y2
1 1 11 21
2 2 12 NA
3 3 13 22
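
With more than two data frames, the same left join can be repeated with Reduce(). This is only a minimal sketch: the names `annot` and `samples` and the toy values are made up for illustration, and it assumes every data frame shares the key column x.

```r
# Hypothetical annotation table that lists every key
annot <- data.frame(x = 1:4, loc = c("chr1", "chr2", "chr3", "chr4"))

# Hypothetical per-sample tables; each may lack some keys
samples <- list(
  data.frame(x = c(1, 3), s1 = c("2/2", "3/4")),
  data.frame(x = c(2, 3, 4), s2 = c("1/2", "None", "3/3"))
)

# Left-join each sample onto the running result, keeping all annotation rows
merged <- Reduce(function(acc, df) merge(acc, df, by = "x", all.x = TRUE),
                 samples, annot)
print(merged)
```

Keys missing from a sample show up as NA in that sample's column, as in the two-table example.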

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Dec 20, 2019 at 9:00 AM Yuan Chun Ding <ycding using coh.org> wrote:

> Hi Bert,
>
> Sorry, I was in a hurry to get home yesterday afternoon, so I just posted
> my question and hoped to get some advice.
>
> Here is what I had yesterday before going home.
>
> ---------------------------------------------------------------
>
> setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")
>
> # pattern is a regular expression, not a glob
> file_list <- list.files(pattern = "\\.out$")
>
> # Read all 652 files into RStudio; it turned out that NOT all files have
> # the same number of rows.
> for (i in seq_along(file_list)) {
>   # object name = file name without the ".out" extension
>   assign(substr(file_list[i], 1, nchar(file_list[i]) - 4),
>          read.delim(file_list[i], header = FALSE))
> }
>
>
>
> # The first file, GTEX_1117F, has the following format: one column and
> # 19482 rows. Here 4 is a marker id and 25/48 is its marker value.
> #   V1
> #   4
> #   25/48
> #   201
> #   2/2
> #   ...
> #   648589
> #   None
>
> # Turn this one-column file into a two-column file as below, so that the
> # first column is the marker id and the second is the corresponding marker
> # value for the sample GTEX_1117F.
> #   VNTRid    GTEX_1117F
> #   4         25/48
> #   201       2/2
> #   ...       ...
> #   648589    None
>
>
>
> for (i in seq_along(file_list)) {
>   temp <- read.delim(file_list[i], header = FALSE)
>   n    <- length(temp$V1)
>   even <- seq(2, n, 2)      # positions of the marker values
>   odd  <- seq(1, n - 1, 2)  # positions of the marker ids
>   # character matrix: one row per marker, columns = id and value
>   output <- matrix("", ncol = 2, nrow = n / 2)
>   colnames(output) <- c("VNTRid",
>                         substr(file_list[i], 1, nchar(file_list[i]) - 4))
>   # seq_len(n / 2), not 1:n/2, which would give 0.5, 1, 1.5, ...
>   for (j in seq_len(n / 2)) {
>     output[j, 1] <- as.character(temp$V1)[odd[j]]
>     output[j, 2] <- as.character(temp$V1)[even[j]]
>   }
>   assign(gsub("-", "_", substr(file_list[i], 1, nchar(file_list[i]) - 4)),
>          as.data.frame(output))
> }
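
The pairing of ids and values above can also be done without an inner loop, by filling a two-column matrix row by row. A minimal sketch, assuming the lines strictly alternate id, value; the helper name `pairs_to_df` and the toy vector are hypothetical, not taken from the files:

```r
# Turn alternating id/value lines into a two-column data frame.
# `pairs_to_df` and its arguments are hypothetical names for illustration.
pairs_to_df <- function(v, sample_id) {
  v <- as.character(v)
  stopifnot(length(v) %% 2 == 0)            # expect complete id/value pairs
  m <- matrix(v, ncol = 2, byrow = TRUE)    # row k holds (id k, value k)
  out <- data.frame(VNTRid = m[, 1], value = m[, 2], stringsAsFactors = FALSE)
  names(out)[2] <- sample_id                # second column named after the sample
  out
}

# Toy input standing in for the single column of one .out file
v <- c("4", "25/48", "201", "2/2", "648589", "None")
res <- pairs_to_df(v, "GTEX_1117F")
print(res)
```

Collecting the results in a list (for example with lapply over the file names) rather than calling assign() keeps them easy to pass to later steps.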
>
>
>
> Yesterday, I intended to reshape the output above from long to wide,
> using VNTRid as the key.
>
> Since not all files have the same number of rows, the reshaped files
> would not bind correctly with the rbind function.
>
> On my way to work this morning, I changed my intention: I will not
> reshape to wide format, and I actually like the long format I generated. I
> will read in a VNTR marker annotation file with VNTRid in the first column
> and marker locations on the human chromosomes in the second column; this
> annotation file should include all the VNTR markers. I know the VNTRids in
> the annotation file are the same as the VNTRids in the 652 files I read in.
>
> Do you know a good way to merge all those 652 files (with two columns)?
>
>
>
> Thank you,
>
>
>
> Ding
>
>
>
>
>
> # merge all 652 files into one file with VNTRid as the first column; the 2nd
> # to 653rd columns are genotypes, with the sample ID as header, so
>
>
>
> *From:* Bert Gunter [mailto:bgunter.4567 using gmail.com]
> *Sent:* Thursday, December 19, 2019 6:52 PM
> *To:* Yuan Chun Ding
> *Cc:* r-help using r-project.org
> *Subject:* Re: [R] data reshape
>
>
>
> Did you even make an attempt to do this? -- or would you like us to do all
> your work for you?
>
>
>
> If you made an attempt, show us your code and errors.
>
> If not, we usually expect you to try on your own first.
>
> If you have no idea where to start, perhaps you need to spend some more
> time with tutorials to learn basic R functionality before proceeding.
>
>
>
> Bert
>
>
>
>
>
>
>
>
> On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <ycding using coh.org> wrote:
>
> Hi R users,
>
> I have a folder (called genotype) with 652 files; the file names are
> GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc. Each file contains
> only one column of data, without a header, as below:
> 201
> 2/2
> 238
> 3/4
> 245
> 1/2
> .....
> 983255
> 3/3
> 983766
> None
>
>
> A total of 20528 rows;
>
> I need to read all those 652 files in the genotype folder and then reshape
> the one column in each file as:
> SampleID     201   238   245   ....   983255   983766
> GTEX-1A3MV   2/2   3/4   1/2          3/3      None
>
> There are 10264 data columns plus the sample ID column, so 10265 columns
> in total after data reshaping.
>
> After reading those 652 files and reshaping the one column in each file, I
> will stack them with the rbind function, so that I have a file with a
> dimension of 653 rows by 10265 columns.
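
If the wide layout is wanted and the files do not all carry the same markers, one option is to index each sample by marker name instead of relying on rbind lining the columns up. A rough sketch with made-up genotypes; the list `geno` is a hypothetical stand-in for the 652 files after reading:

```r
# Hypothetical per-sample genotype vectors, named by marker id
geno <- list(
  "GTEX-1A3MV" = c("201" = "2/2", "238" = "3/4", "245" = "1/2"),
  "GTEX-1A3MX" = c("201" = "2/3", "245" = "None")
)

# Union of all marker ids seen in any sample
markers <- unique(unlist(lapply(geno, names)))

# One row per sample; a marker absent from a sample becomes NA
wide <- t(sapply(geno, function(g) g[markers]))
colnames(wide) <- markers
print(wide)
```

The result is a character matrix with sample IDs as row names, so markers missing from a file stay visible as NA rather than shifting columns.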
>
>
> Thank you,
>
> Ding
>
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
