[R] How to do the same thing for all levels of a column?

John Kane jrkrideau at inbox.com
Tue Jul 24 15:51:57 CEST 2012


The first thing is to supply the data in a usable format.  As it is, it is essentially unreadable.  All R beginners do this. :)

Have a look at the dput function  (?dput) for a good way to supply sample data in an email.

If you have a large dataset, a few dozen lines of data would probably be enough.

Something like dput(head(mydata)) should be fine.  Just copy and paste the output into your email.
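
A minimal sketch of that workflow, assuming a small data frame called mydata that mimics the layout of the table quoted below (the column names X1 and X2 are hypothetical, not taken from the original file):

```r
# Hypothetical stand-in for the poster's data: the first two residue
# positions of the quoted example table.
mydata <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.757e-05, 0.0002077),
  X1        = c("L", "T", "L", "R"),
  X2        = c("E", "E", "M", "E"),
  stringsAsFactors = FALSE
)

# Print a plain-text representation of the first rows; this output is
# what gets pasted into the email.
dput(head(mydata))

# A reader rebuilds the object by evaluating the pasted text.
# Here the paste step is simulated by capturing the dput() output.
txt     <- paste(capture.output(dput(mydata)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Check that the rebuilt copy matches the original.
all.equal(rebuilt, mydata)
```

Because the dput() output is ordinary R code, column types and names should survive the trip through a plain-text email, which a pasted table does not guarantee.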

Welcome to R.  I think you will like it.

John Kane
Kingston ON Canada


> -----Original Message-----
> From: zj29 at cornell.edu
> Sent: Mon, 23 Jul 2012 18:01:11 -0400
> To: r-help at r-project.org
> Subject: [R] How to do the same thing for all levels of a column?
> 
> Dear all,
> 
> 
> 
> I am an R beginner, and I am looking for a way to do the same thing for
> all levels of a column in a table.
> 
> 
> 
> Basically, I have a bunch of protein sequences composed of different
> amino acid residues, and each residue is represented by an uppercase
> letter. I want to calculate the ratio of different amino acid residues
> at each position of the proteins. Here is an example table:
> 
> Proteins  Time_zero  1  2  3  4  5  6  7  8
> p1        0.0050723  L  E  Y  I  I  P  D  A
> p2        0.0002731  T  E  N  L  V  P  G  A
> p3        9.757E-05  L  M  Y  Q  I  P  E  C
> p4        0.0002077  R  E  Y  L  I  S  E  A
> 
> 
> 
> If I name this table myfile.txt, I use the following script to
> calculate the ratio of each amino acid residue at position 1:
> 
> # showing levels of the 3rd column, which means the types of residues
> myfile[,3]
> 
> # calculating the ratio of L
> list=c(which(myfile[,3]=="L"))
> time0total=sum(myfile[,2])
> AA_L=0
> for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> ratio_L=AA_L/time0total
> 
> 
> 
> So how can I write a script to do the same thing for the other two levels
> (T and R) in column 3, and also do this for every column that contains
> amino acid residues?
> 
> 
> 
> Many thanks for any help you could give me on this topic! :)
> 
> 
> 
> Regards,
> 
> Zhao
> --
> Zhao JIN
> Ph.D. Candidate
> Ruth Ley Lab
> 467 Biotech
> Field of Microbiology, Cornell University
> Lab: 607.255.4954
> Cell: 412.889.3675
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
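
One possible way to generalise the quoted script to every residue and every position, sketched with tapply() and lapply(). The data frame is retyped from the example table in the quoted message, and the column names X1 to X8 are an assumption (they are what read.table() with header = TRUE would typically produce for headers named 1 to 8). This is a sketch under those assumptions, not necessarily the approach the list would recommend.

```r
# Example table from the quoted message, retyped by hand.
# Column names X1..X8 are assumed, not taken from the original file.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.757e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# Same denominator as in the quoted script.
time0total <- sum(myfile$Time_zero)

# One position: sum Time_zero within each residue of column 3,
# then divide by the total. This replaces the which()/for loop
# and handles L, T and R in one call.
ratio_pos1 <- tapply(myfile$Time_zero, myfile[[3]], sum) / time0total

# Every position: apply the same calculation to each residue column
# (all columns except the first two).
ratios <- lapply(myfile[-(1:2)], function(residue) {
  tapply(myfile$Time_zero, residue, sum) / time0total
})

ratios$X1  # ratios for the residues found at position 1
```

The result is a named list with one element per position, each holding one ratio per residue observed there. Within a position the ratios sum to 1, which is a quick sanity check.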
