[R] is there a way to extract lines in between 3 files that are in common based on one column?

Ana Marija sokovic.anamarija using gmail.com
Tue Jun 2 05:37:39 CEST 2020


Hi Jim,

not in this case, but thanks for asking!

Ana

On Mon, Jun 1, 2020 at 10:04 PM Jim Lemon <drjimlemon using gmail.com> wrote:
>
> So recombination sticks out its foot before us. Do you want to account
> for gene linkage?
>
> Jim
>
> On Tue, Jun 2, 2020 at 11:55 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
> >
> > Hi Jim
> >
> > > neu3<-neu1[!(neu1$Marker %in% Marker3),]
> > > dim(neu3)
> > [1] 1857    9
> > > nep3<-nep1[!(nep1$Marker %in% Marker3),]
> > > dim(nep3)
> > [1] 5562    9
> > > ret3<-ret1[!(ret1$Marker %in% Marker3),]
> > > dim(ret3)
> > [1] 3493    9
> >
> >
> > If I do:
> >
> >  nn1<-merge(neu1,nep1,by=c("Marker","Chr"))
> > nn2<-merge(nn1,ret1,by=c("Marker","Chr"))
> > > Marker3<-nn2$Marker
> > > length(Marker3)
> > [1] 3742962
> > > Marker4<-nn1$Marker
> > > length(Marker4)
> > [1] 3744443
> >
> > On Mon, Jun 1, 2020 at 8:50 PM Ana Marija <sokovic.anamarija using gmail.com> wrote:
> > >
> > > Hi David,
> > >
> > > that is a great point!
> > > Yes indeed some are non unique:
> > >
> > > > dim(neu1)
> > > [1] 3742845       9
> > > > length(unique(neu1$Marker))
> > > [1] 3741858
> > > > length(unique(nep1$Marker))
> > > [1] 3745560
> > > > dim(nep1)
> > > [1] 3746550       9
> > > > length(unique(ret1$Marker))
> > > [1] 3743494
> > > > dim(ret1)
> > > [1] 3743494       9
> > >
> > > > How would I rewrite this code so that it merges by both the Chr and
> > > > Marker columns? It seems that a Marker can appear under more than one Chr.
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jun 1, 2020 at 8:41 PM David Winsemius <dwinsemius using comcast.net> wrote:
> > > >
> > > >
> > > > On 6/1/20 5:40 PM, Ana Marija wrote:
> > > > > Hi Jim,
> > > > >
> > > > > thank you so much for getting back to me. I tried your code and this is
> > > > > what I get:
> > > > >> dim(neu2)
> > > > > [1] 3740988       9
> > > > >> dim(nep2)
> > > > > [1] 3740988       9
> > > > >> dim(ret2)
> > > > > [1] 3740001       9
> > > > >
> > > > > I think I would need to have the same number of lines in all 3 data frames.
> > > > >
> > > > > Can you please advise.
> > > >
> > > >
> > > > You should check for duplicated Marker values.
> > > >
> > > >
> > > > --
> > > >
> > > > David
> > > >
> > > > >
> > > > > Cheers
> > > > > Ana
> > > > >
> > > > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon <drjimlemon using gmail.com> wrote:
> > > > >
> > > > >> Hi Ana,
> > > > >> Not too hard, but your example has all the "marker" fields in common.
> > > > >> So using a sample that will show the expected result:
> > > > >>
> > > > >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2 Direction  pValue N
> > > > >>   1 100000012 1:100000012:G:T 0.229925  T  G  + 0.650403 1594
> > > > >>   1 100000827 1:100000827:C:T 0.287014  T  C  + 0.955449 1594
> > > > >>   1 100002713 1:100002713:C:T 0.097867  T  C  - 0.290455 1594
> > > > >>   1 100002882 1:100002882:T:G 0.287014  G  T  + 0.955449 1594
> > > > >>   1 100002991 1:100002991:G:A 0.097867  A  G  - 0.290455 1594
> > > > >>   1 100004726 1:100004726:G:A 0.132058  A  G  + 0.115005 1594",
> > > > >>   header=TRUE,stringsAsFactors=FALSE)
> > > > >>
> > > > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 Direction    pValue N
> > > > >>   1 100000012 1:100000012:G:T 0.2300430 T  G - 0.1420030 1641
> > > > >>   1 100000827 1:100000827:C:T 0.2867150 T  C - 0.2045580 1641
> > > > >>   1 100002713 1:100002713:C:T 0.0975015 T  C - 0.0555507 1641
> > > > >>   1 100002882 1:100002882:T:G 0.2867150 G  T - 0.2045580 1641
> > > > >>   1 100002991 1:100002991:G:A 0.0975015 A  G - 0.0555507 1641
> > > > >>   1 100004726 1:100004727:G:A 0.1325410 A  G - 0.8725660 1641",
> > > > >>   header=TRUE,stringsAsFactors=FALSE)
> > > > >>
> > > > >> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction   pValue N
> > > > >>   1 100000012 1:100000012:G:T 0.2322760 T  G - 0.230383 1608
> > > > >>   1 100000827 1:100000827:C:T 0.2882460 T  C - 0.120356 1608
> > > > >>   1 100002713 1:100002713:C:T 0.0982587 T  C - 0.272936 1608
> > > > >>   1 100002882 1:100002882:T:G 0.2882460 G  T - 0.120356 1608
> > > > >>   1 100002991 1:100002992:G:A 0.0982587 A  G - 0.272936 1608
> > > > >>   1 100004726 1:100004727:G:A 0.1340170 A  G - 0.594538 1608",
> > > > >> header=TRUE,stringsAsFactors=FALSE)
> > > > >>
> > > > >> # merge the three data frames on "Marker"
> > > > >> nn1<-merge(neu1,nep1,by="Marker")
> > > > >> nn2<-merge(nn1,ret1,by="Marker")
> > > > >> # get the common "Marker" strings
> > > > >> Marker3<-nn2$Marker
> > > > >> # subset all three data frames on Marker3
> > > > >> neu2<-neu1[neu1$Marker %in% Marker3,]
> > > > >> nep2<-nep1[nep1$Marker %in% Marker3,]
> > > > >> ret2<-ret1[ret1$Marker %in% Marker3,]
> > > > >>
> > > > >> Jim
> > > > >>
> > > > >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija <sokovic.anamarija using gmail.com>
> > > > >> wrote:
> > > > >>> Hello,
> > > > >>>
> > > > > >>> I have 3 data frames which have about 3.4 million lines (but they don't
> > > > > >>> have exactly the same number of lines). They look like this:
> > > > >>> ...
> > > > > >>> Is there a way to create another 3 data frames, say neu2, nep2, ret2,
> > > > > >>> which would only contain lines that have the same entries in the Marker
> > > > > >>> column for all 3 data frames?
> > > > >>>
> > > > >>> Thanks
> > > > >>> Ana
> > > > >>>
> > > > >>>          [[alternative HTML version deleted]]
> > > > >>>
> > > > >>> ______________________________________________
> > > > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > >>> and provide commented, minimal, self-contained, reproducible code.
> > > > >       [[alternative HTML version deleted]]
> > > > >
> > > > > ______________________________________________
> > > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
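
The points raised in this thread (check for duplicated Marker values, match
on both Chr and Marker, and end up with the same number of rows in all three
data frames) can be sketched in base R as follows. This is a minimal sketch,
not code posted by anyone in the thread. The toy data, the helper names
make_key and keep_common, the "_" separator, and the choice to keep only the
first row for a repeated key are all assumptions made for illustration. Note
that merge() returns one row for every matching pair of rows, so repeated
keys inflate the merged result; the sketch therefore intersects keys instead
of merging.

```r
# Toy stand-ins for neu1, nep1 and ret1 (hypothetical values, not thread data)
neu1 <- data.frame(Chr = c(1, 1, 1, 2),
                   Marker = c("m1", "m2", "m2", "m3"),
                   pValue = c(0.10, 0.20, 0.25, 0.30),
                   stringsAsFactors = FALSE)
nep1 <- data.frame(Chr = c(1, 1, 2, 2),
                   Marker = c("m1", "m2", "m3", "m4"),
                   pValue = c(0.40, 0.50, 0.60, 0.70),
                   stringsAsFactors = FALSE)
ret1 <- data.frame(Chr = c(1, 2, 2),
                   Marker = c("m2", "m3", "m4"),
                   pValue = c(0.80, 0.90, 0.95),
                   stringsAsFactors = FALSE)

# Composite key from Chr and Marker (hypothetical helper; "_" is an assumed
# separator that does not occur in either column)
make_key <- function(df) paste(df$Chr, df$Marker, sep = "_")

# How many rows in each data frame repeat an earlier Chr/Marker key
dup_counts <- sapply(list(neu1 = neu1, nep1 = nep1, ret1 = ret1),
                     function(df) sum(duplicated(make_key(df))))
print(dup_counts)

# Keys present in all three data frames (intersect() drops repeats)
common <- Reduce(intersect,
                 list(make_key(neu1), make_key(nep1), make_key(ret1)))

# One row per common key, in the same key order for every data frame.
# match() returns the first matching position, so for a repeated key only
# the first row is kept (an assumption; other rules may suit the data better).
keep_common <- function(df, keys) {
  df[match(keys, make_key(df)), , drop = FALSE]
}

neu2 <- keep_common(neu1, common)
nep2 <- keep_common(nep1, common)
ret2 <- keep_common(ret1, common)

# All three now have the same number of rows, aligned by Chr and Marker
print(c(neu2 = nrow(neu2), nep2 = nrow(nep2), ret2 = nrow(ret2)))
```

With the real data, the same calls would apply to neu1, nep1 and ret1 as
read from file. Whether keeping the first of several rows with the same Chr
and Marker is appropriate depends on why the repeats exist, which the thread
does not settle.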


