[R] R help for creating expression data of Differentially expressed genes

Wed May 8 00:07:26 CEST 2013

Hi Arun,

My data sets are in the format of the attached files; I am providing sample files.
I think this will give a better idea of what I want to do with the two files
and the kind of script I am trying to write. I hope you can give me some
suggestions. I am new to R, so I am having trouble using the different
functions for this work.

Any help with this would be greatly appreciated.

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek.das at ieo.eu
            vchris_05 at yahoo.co.in
            vd4mmind at gmail.com

On Tue, May 7, 2013 at 10:36 PM, arun <smartpink111 at yahoo.com> wrote:

> Hi Vivek,
>
> Maybe this helps:
> set.seed(35)
>  dat1<- cbind(ID=1:8,
> as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))
>
> set.seed(38)
> dat2<- cbind(ID= sample(1:20,8,replace=FALSE),
> as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
> colnames(dat2)[-1]<-gsub("V","X",colnames(dat2)[-1])
>  merge(dat1[,1:2],dat2[,1:31],by="ID")
> #  ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
> #1  1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
> #2  3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
> #3  6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
> #4  7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
> #5  8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
> #  X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
> #1  14  27   3  21   6  44  33  42  10  29
> #2  48  13   8  47  18   9  23   9  44   3
> #3  25  14  31  19  14   6  26  13   6  49
> #4  43  28  15   6   9  19  43  21  41  21
> #5   1  27  18   3  42   5  16  39  46  47
> A.K.
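Note that merge() performs an inner join by default, so IDs of dat1 that have no match in dat2 are dropped (in the output above, only 5 of the 8 IDs survive). Since the goal is one output row for every gene in the first file, adding all.x = TRUE should give a left join that keeps every row of the first table and fills the missing columns with NA. A minimal sketch with small hypothetical tables (genes, expr), not the attached files:

```r
# Hypothetical stand-ins for the two files
genes <- data.frame(ID = c("g1", "g2", "g3"),
                    name = c("geneA", "geneB", "geneC"),
                    stringsAsFactors = FALSE)
expr <- data.frame(ID = c("g3", "g1", "g9"),
                   X1 = c(0.5, 1.2, 7.0),
                   X2 = c(0.1, 3.4, 2.2),
                   stringsAsFactors = FALSE)

# Inner join (the default): g2 has no match in expr and is dropped
inner <- merge(genes, expr, by = "ID")

# Left join: every row of 'genes' is kept; unmatched values become NA
left <- merge(genes, expr, by = "ID", all.x = TRUE)
print(left)
```

Comparing nrow(left) with nrow(genes) is a quick check that no gene of the first file was lost.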
>
>
>
> ----- Original Message -----
> From: Vivek Das <vd4mmind at gmail.com>
> To: arun <smartpink111 at yahoo.com>
> Cc:
> Sent: Tuesday, May 7, 2013 3:45 PM
> Subject: R help for creating expression data of Differentially expressed genes
>
> Hi Arun,
>
> I need some help with R scripting. I have two data files, one
> containing seven columns and the other containing 33. Both files have a
> unique identifier column, ID. I want to create another file that has
> the first two columns of the first file and the 31 columns of the
> second file, matched on ID. The first file has gene IDs and gene names
> for around 500 genes, and I want the output file to contain all of those
> plus the other attributes. In other words, I want every attribute of the
> second file matched to the IDs of the first file, so that I get an output
> of 500 rows with all the attributes of the second file. I am new to R
> and am having trouble with the merge function. If you can help, it will be
> great.
>
> Regards,
> Vivek
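merge() also sorts the result by the key column, and with sort = FALSE the row order is documented as unspecified. If the output should keep the row order of the first file, one alternative (an assumption, not the approach used in the reply above) is to look up each ID with match() and bind the columns. The tables and values below are hypothetical:

```r
# First file: ID and name, in the order that should be preserved (hypothetical)
genes <- data.frame(ID = c("g3", "g1", "g2"),
                    name = c("geneC", "geneA", "geneB"),
                    stringsAsFactors = FALSE)
# Second file: ID plus expression columns (hypothetical)
expr <- data.frame(ID = c("g1", "g2", "g4"),
                   S1 = c(1.5, 0.0, 9.9),
                   S2 = c(2.5, 0.3, 8.8),
                   stringsAsFactors = FALSE)

# Row position in 'expr' of every ID in 'genes' (NA when the ID is absent)
idx <- match(genes$ID, expr$ID)

# First two columns of 'genes' plus all non-ID columns of 'expr';
# an NA index yields a row of NAs, so every gene of the first file is kept
out <- cbind(genes[, 1:2], expr[idx, -1, drop = FALSE])
rownames(out) <- NULL
print(out)
```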
>
> On 7 May 2013, at 21:13, arun <smartpink111 at yahoo.com> wrote:
>
> > Hi Ye,
> >
> > For the NA in the ID column:
> >
> > dat1<- read.table(text="
> > ObsNumber  ID    Weight
> >    1       0001  12
> >    2       0001  13
> >    3       0001  14
> >    4       0002  16
> >    5       0002  17
> >    6       N/A   18
> > ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"),na.strings="N/A")
> >  unlist(lapply(split(dat1,dat1$ID),function(x)
> >   with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> > #[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
> > A.K.
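The split()-based code above returns only five labels because split() drops rows whose grouping value is NA, which is also why assigning the result back to the six-row data frame fails. A possible sketch that labels missing IDs as "N/A_1", "N/A_2", ... is to replace NA by the literal string "N/A" and number the rows within each group with ave(); paste() is used here in place of interaction():

```r
dat1 <- read.table(text = "
ObsNumber ID Weight
1 0001 12
2 0001 13
3 0001 14
4 0002 16
5 0002 17
6 N/A 18
", header = TRUE, colClasses = c("numeric", "character", "numeric"),
na.strings = "N/A")

# Treat missing IDs as their own group labelled "N/A"
id <- ifelse(is.na(dat1$ID), "N/A", dat1$ID)

# Running count within each ID group, written back in the original row order
counter <- ave(seq_along(id), id, FUN = seq_along)

dat1$UniqueID <- paste(id, counter, sep = "_")
print(dat1$UniqueID)
```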
> > ________________________________
> > From: Ye Lin <yelin at lbl.gov>
> > To: arun <smartpink111 at yahoo.com>
> > Cc: R help <r-help at r-project.org>
> > Sent: Tuesday, May 7, 2013 2:54 PM
> > Subject: Re: [R] create unique ID for each group
> >
> >
> >
> > Thanks A.K. But I have "NA" in the ID column, so when I apply the code, it
> > gives me an error saying the replacement has fewer rows than the data has.
> > Is there any way, for ID=N/A, to return something like "N/A_1" in order as well?
> >
> >
> >
> >
> >
> >
> > On Tue, May 7, 2013 at 11:17 AM, arun <smartpink111 at yahoo.com> wrote:
> >
> >> Hi,
> >> Sorry, there was a mistake:
> >> dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x)
> >>   with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >> dat1
> >>  # ObsNumber   ID Weight UniqueID
> >> #1         1 0001     12   0001_1
> >> #2         2 0001     13   0001_2
> >> #3         3 0001     14   0001_3
> >> #4         4 0002     16   0002_1
> >> #5         5 0002     17   0002_2
> >>
> >> dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x)
> >>   with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>
> >> A.K.
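The split()/lapply()/unlist() construction returns the labels grouped by ID, so assigning them back with dat1$UniqueID <- ... is only correct when the rows are already sorted by ID, as they are in this example. A small hypothetical counter-example, with ave() as an order-preserving alternative (paste() stands in for interaction()):

```r
# Hypothetical data with the IDs interleaved rather than sorted
dat <- data.frame(ID = c("0002", "0001", "0002", "0001"),
                  stringsAsFactors = FALSE)

# split()/unlist(): labels come back grouped by ID, not in row order
by_split <- unlist(lapply(split(dat, dat$ID), function(x)
  paste(x$ID, seq_len(nrow(x)), sep = "_")), use.names = FALSE)

# ave(): each group's counter is written back to the original row positions
by_ave <- paste(dat$ID, ave(seq_along(dat$ID), dat$ID, FUN = seq_along),
                sep = "_")

print(by_split)  # row 1 ("0002") would wrongly receive "0001_1"
print(by_ave)
```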
> >>
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >>
> >> From: arun <smartpink111 at yahoo.com>
> >> To: Ye Lin <yelin at lbl.gov>
> >> Cc: R help <r-help at r-project.org>
> >> Sent: Tuesday, May 7, 2013 2:10 PM
> >> Subject: Re: [R] create unique ID for each group
> >>
> >>
> >>
> >> Hi,
> >>
> >> Try this:
> >> dat1<- read.table(text="
> >> ObsNumber     ID          Weight
> >>      1                 0001         12
> >>      2                 0001          13
> >>      3                 0001           14
> >>      4                  0002         16
> >>       5                 0002         17
> >> ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"))
> >> dat2<- read.table(text="
> >> ID               Height
> >> 0001            3.2
> >> 0001             2.6
> >> 0001             3.2
> >> 0002             2.2
> >> 0002              2.6
> >> ",sep="",header=TRUE,colClasses=c("character","numeric"))
> >>
> >> dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
> >> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
> >>  dat2
> >> #    ID Height UniqueID
> >> #1 0001    3.2   0001_1
> >> #2 0001    2.6   0001_2
> >> #3 0001    3.2   0001_3
> >> #4 0002    2.2   0002_4
> >> #5 0002    2.6   0002_5
> >> A.K.
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: Ye Lin <yelin at lbl.gov>
> >> To: R help <r-help at r-project.org>
> >> Cc:
> >> Sent: Tuesday, May 7, 2013 1:54 PM
> >> Subject: [R] create unique ID for each group
> >>
> >> Hey All,
> >>
> >> I have a dataset (dat1) like this:
> >>
> >> ObsNumber  ID    Weight
> >>    1       0001  12
> >>    2       0001  13
> >>    3       0001  14
> >>    4       0002  16
> >>    5       0002  17
> >>
> >> And another dataset (dat2) like this:
> >>
> >> ID    Height
> >> 0001  3.2
> >> 0001  2.6
> >> 0001  3.2
> >> 0002  2.2
> >> 0002  2.6
> >>
> >> I want to merge dat1 and dat2 based on "ID" in order. I know "match" only
> >> returns the first match it finds, so I am thinking of creating a unique ID
> >> column in both dat1 and dat2 and then merging. But I don't know how to do
> >> that so that it looks like this:
> >>
> >> dat1:
> >>
> >> ObsNumber  ID    Weight  UniqueID
> >>    1       0001  12      0001_1
> >>    2       0001  13      0001_2
> >>    3       0001  14      0001_3
> >>    4       0002  16      0002_1
> >>    5       0002  17      0002_2
> >>
> >> dat2:
> >>
> >> ID    Height  UniqueID
> >> 0001  3.2     0001_1
> >> 0001  2.6     0001_2
> >> 0001  3.2     0001_3
> >> 0002  2.2     0002_1
> >> 0002  2.6     0002_2
> >>
> >> Or, if it is possible to merge dat1 and dat2 by matching "ID" but return
> >> the matches in order, that would be great!
> >>
> >> Thanks for your help!
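For the second option (merging on ID "in order" without building a pasted key), a sketch under the assumption that the k-th row of an ID in dat1 should pair with the k-th row of the same ID in dat2: number the occurrences of each ID with ave() and pass both columns to merge() through by = c("ID", "k"). The helper column k is hypothetical:

```r
dat1 <- data.frame(ObsNumber = 1:5,
                   ID = c("0001", "0001", "0001", "0002", "0002"),
                   Weight = c(12, 13, 14, 16, 17),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("0001", "0001", "0001", "0002", "0002"),
                   Height = c(3.2, 2.6, 3.2, 2.2, 2.6),
                   stringsAsFactors = FALSE)

# Occurrence number of each ID within its own table (1st "0001", 2nd "0001", ...)
dat1$k <- ave(seq_along(dat1$ID), dat1$ID, FUN = seq_along)
dat2$k <- ave(seq_along(dat2$ID), dat2$ID, FUN = seq_along)

# Pair the k-th occurrence in dat1 with the k-th occurrence in dat2
res <- merge(dat1, dat2, by = c("ID", "k"))
res <- res[order(res$ObsNumber), ]
print(res)
```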
> >>
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
>
-------------- next part --------------
ID	test_ID	gene	locus	Sample_118p_0	Sample_118rp3_0	Sample_118rz_0	Sample_118z_0	Sample_132p1_0	Sample_132p2_0	Sample_132p3_0	Sample_132rp1_0	Sample_132rp3_0	Sample_132rp4_0	Sample_132rz1_0	Sample_132rz2_0	Sample_132z_0	Sample_141p1_0	Sample_141p2_0	Sample_141p3_0	Sample_141p4_0	Sample_141z_0	Sample_183p1_0	Sample_183p2_0	Sample_183p3_0	Sample_183z_0	Sample_91p_0	Sample_91rp1_0	Sample_91rp3_0	Sample_91rp4_0	Sample_91rz_0
XLOC_000009	XLOC_025681	NEFL	chr8:24808468-24814131	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
XLOC_000010	XLOC_025681	NEFL	chr8:24808468-24814131	0	0	0.29217	0.270976	0.126338	0	0	0.464747	0.596984	0.199851	0.892021	0.863341	2.91729	0	0.226087	0	0	2.1632	0.356073	0.655415	0	1.1598	0.385098	0.718336	0.187613	0.34955	0.498937
XLOC_000011	XLOC_022130	"HLA-DRB1,HLA-DRB5"	chr6:32441213-32557613	3.59279	9.09855	2.57678	1.59323	16.9363	4.47379	6.8702	6.92243	21.7622	7.46156	4.42057	3.34178	15.4373	5.21231	3.85498	2.53136	6.18972	4.83315	6.90879	12.5242	5.96035	3.40959	8.60407	15.9087	8.16287	9.35126	6.01379
XLOC_000012	XLOC_003321	CCDC3	chr10:12938624-13043704	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.581209	0.455395	0	0
XLOC_000013	XLOC_005027	CD248	chr11:66081957-66084515	0.248183	0.234721	0.145036	0.0538057	0.288489	0.120182	0.138705	0.138422	0.474156	0.297623	0.177122	0.149999	0.537889	0.0951497	0.112231	0.0610627	0.134862	0.257719	0.212109	0.325353	0.0387095	0.191911	0.229399	0.332815	0.0745058	0.225575	0.198141
XLOC_000014	XLOC_021040	STC2	chr5:172741725-172756506	0	0	0	0	0	0	0	0.0364255	0.0701849	0	0	0	0.0979922	0	0	0	0	0.101727	0	0	0	0	0	0	0	0.0410951	0.0586578
-------------- next part --------------
ID	test_ID	gene	locus	sample_1	sample_2	status	value_1	value_2	log2(fold_change)	test_stat	p_value	q_value	significant
XLOC_000009	XLOC_025681	NEFL	chr8:24808468-24814131	Sample_118p	Sample_118rp3	OK	0.14678	84.3686	9.1669	-4.83529	1.33E-06	0.0261296	yes
XLOC_000010	XLOC_025681	NEFL	chr8:24808468-24814131	Sample_118p	Sample_118z	OK	0.14678	64.1788	8.77229	-4.63808	3.52E-06	0.0401193	yes
XLOC_000011	XLOC_022130	"HLA-DRB1,HLA-DRB5"	chr6:32441213-32557613	Sample_118rz	Sample_118z	OK	3.18746	9.29E+06	21.4749	-5.75217	8.81E-09	0.00280103	yes
XLOC_000012	XLOC_003321	CCDC3	chr10:12938624-13043704	Sample_118p	Sample_132p1	OK	0.0184144	83.7839	12.1516	-4.77738	1.78E-06	0.0288706	yes
XLOC_000013	XLOC_005027	CD248	chr11:66081957-66084515	Sample_118p	Sample_132p1	OK	0.280334	216.614	9.59377	-5.10742	3.27E-07	0.0159446	yes
XLOC_000014	XLOC_021040	STC2	chr5:172741725-172756506	Sample_118p	Sample_132p1	OK	0.187273	69.3633	8.53289	-4.73246	2.22E-06	0.0320926	yes
-------------- next part --------------
ID	Sample_118p_0	Sample_118rp3_0	Sample_118rz_0	Sample_118z_0	Sample_132p1_0	Sample_132p2_0	Sample_132p3_0	Sample_132rp1_0	Sample_132rp3_0	Sample_132rp4_0	Sample_132rz1_0	Sample_132rz2_0	Sample_132z_0	Sample_141p1_0	Sample_141p2_0	Sample_141p3_0	Sample_141p4_0	Sample_141z_0	Sample_183p1_0	Sample_183p2_0	Sample_183p3_0	Sample_183z_0	Sample_91p_0	Sample_91rp1_0	Sample_91rp3_0	Sample_91rp4_0	Sample_91rz_0
XLOC_000001	112.474	166.179	81.5227	44.7787	301.154	118.827	144.47	170.407	406.899	189.131	97.1834	72.739	386.81	86.966	85.7031	53.01	158.314	145.843	219.667	240.231	127.42	78.5814	179.324	297.395	203.55	251.538	110.898
XLOC_000002	13.7609	17.7673	11.911	6.2906	39.1648	14.8832	30.0239	42.7172	88.8146	23.3105	15.4408	7.47508	40.3511	12.6166	12.7373	10.9697	28.2655	22.6594	27.2177	27.8328	18.213	7.8803	22.6769	28.9456	18.7493	22.7607	15.679
XLOC_000003	62.1301	102.162	748.313	273.52	242.685	94.2888	161.228	225.243	497.011	160.376	896.121	465.496	2330.57	72.3527	73.9626	71.3686	203.201	1048.81	172.241	183.26	98.1168	473.464	117.368	174.073	119.605	122.661	754.735
XLOC_000004	4.16261	5.71899	4.55739	2.48634	9.11917	3.49082	3.49611	4.97502	12.5986	6.38753	4.94983	4.81898	18.2275	3.22435	2.07446	1.97518	4.05074	8.86568	5.11854	6.4147	4.65076	4.37495	6.36026	9.22755	6.65625	8.8201	7.17221
XLOC_000005	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
XLOC_000006	0	0.103125	0	0	0	0.0829754	0	0	0	0	0	0	0	0	0	0	0	0	0	0.15724	0	0	0	0.11489	0.0900197	0	0
XLOC_000007	0.0282754	0.0218796	0	0	0.0385837	0	0.0129295	0.0315409	0.0303866	0	0	0	0	0	0	0	0	0	0	0.0333607	0.0396915	0	0.0392031	0	0	0	0
XLOC_000008	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
XLOC_000009	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
XLOC_000010	0	0	0.29217	0.270976	0.126338	0	0	0.464747	0.596984	0.199851	0.892021	0.863341	2.91729	0	0.226087	0	0	2.1632	0.356073	0.655415	0	1.1598	0.385098	0.718336	0.187613	0.34955	0.498937
XLOC_000011	3.59279	9.09855	2.57678	1.59323	16.9363	4.47379	6.8702	6.92243	21.7622	7.46156	4.42057	3.34178	15.4373	5.21231	3.85498	2.53136	6.18972	4.83315	6.90879	12.5242	5.96035	3.40959	8.60407	15.9087	8.16287	9.35126	6.01379
XLOC_000012	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.581209	0.455395	0	0
XLOC_000013	0.248183	0.234721	0.145036	0.0538057	0.288489	0.120182	0.138705	0.138422	0.474156	0.297623	0.177122	0.149999	0.537889	0.0951497	0.112231	0.0610627	0.134862	0.257719	0.212109	0.325353	0.0387095	0.191911	0.229399	0.332815	0.0745058	0.225575	0.198141
XLOC_000014	0	0	0	0	0	0	0	0.0364255	0.0701849	0	0	0	0.0979922	0	0	0	0	0.101727	0	0	0	0	0	0	0	0.0410951	0.0586578
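The first attachment appears to be the combination of the other two: the ID, test_ID, gene and locus columns of the differential-expression table followed by the sample columns of the expression table (its XLOC_000010 row matches the expression table exactly). A sketch of that merge on a few rows and columns copied from the attachments; for the full tab-delimited files, read.delim(..., check.names = FALSE) would replace the inline text (check.names = FALSE keeps names such as log2(fold_change) unchanged), with file names of your choosing:

```r
# Columns 1-4 of the differential-expression table (second attachment)
de <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
ID test_ID gene locus
XLOC_000009 XLOC_025681 NEFL chr8:24808468-24814131
XLOC_000010 XLOC_025681 NEFL chr8:24808468-24814131
XLOC_000011 XLOC_022130 "HLA-DRB1,HLA-DRB5" chr6:32441213-32557613
')

# A few rows and sample columns of the expression table (third attachment)
expr <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
ID Sample_118p_0 Sample_118rp3_0 Sample_118rz_0
XLOC_000001 112.474 166.179 81.5227
XLOC_000009 0 0 0
XLOC_000010 0 0 0.29217
XLOC_000011 3.59279 9.09855 2.57678
')

# Keep every gene of the DE table and append its expression columns by ID
out <- merge(de[, c("ID", "test_ID", "gene", "locus")], expr,
             by = "ID", all.x = TRUE)
print(out)
```

XLOC_000001 is not in the differential-expression table, so it does not appear in out, while all.x = TRUE keeps any DE gene that might be missing from the expression table (with NA values).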