[BioC] Protein/peptide mass

Thomas Girke thomas.girke at ucr.edu
Thu May 25 15:48:50 CEST 2006


John,

Here is how I usually obtain MW info for many input files using pepstats
in a shell for loop:

for i in *.fasta; do pepstats -sequence "$i" -stdout -auto >> pepstats; done

The argument '-auto' turns off EMBOSS's interactive prompts, and '-stdout'
writes the report to standard output so that all results end up in one file.
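John's quoted message below notes that EMBOSS programs usually want a sequence file rather than a plain string. Here is a minimal sketch that works around this by wrapping the string as a one-record FASTA file first. Both function names, seq_to_fasta and pepstats_string, are hypothetical helpers, and the wrapper assumes pepstats is on the PATH. EMBOSS also has an 'asis::' input format for sequences given directly on the command line, which may avoid the temporary file; check the documentation of your EMBOSS version.

```shell
# seq_to_fasta NAME SEQUENCE: print a one-record FASTA entry,
# wrapped at 60 residues per line.
seq_to_fasta() {
  printf '>%s\n' "$1"
  printf '%s\n' "$2" | fold -w 60
}

# pepstats_string NAME SEQUENCE: hypothetical wrapper that writes the
# string to a temporary FASTA file and runs EMBOSS pepstats on it.
# Assumes pepstats is installed and on the PATH.
pepstats_string() {
  tmp=$(mktemp) || return 1
  seq_to_fasta "$1" "$2" > "$tmp"
  pepstats -sequence "$tmp" -stdout -auto
  status=$?
  rm -f "$tmp"          # clean up the temporary FASTA file
  return "$status"      # pass on pepstats' exit status
}
```

For example, pepstats_string mypep MKWVTFISLL >> pepstats appends the report for a single peptide string to the same file the loop above writes to.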

If your peptides are all in one multi-sequence FASTA file, you can split them
into one file per sequence with 'seqret' using the argument '-ossingle'.
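Where seqret is not at hand, a short awk script can do a similar split. This is a swapped-in technique, not the EMBOSS method, and split_fasta is a hypothetical helper. It assumes every header line starts with '>' and uses the first word of the header, with unsafe characters replaced by '_', as the file name.

```shell
# split_fasta INPUT OUTDIR: write each record of a multi-FASTA file
# to OUTDIR/<id>.fasta (hypothetical helper, not part of EMBOSS).
split_fasta() {
  input=$1 outdir=$2
  mkdir -p "$outdir" || return 1
  awk -v dir="$outdir" '
    /^>/ {
      if (out) close(out)                # finish the previous record
      id = substr($1, 2)                 # first word after ">" is the ID
      gsub(/[^A-Za-z0-9._-]/, "_", id)   # make the ID safe as a file name
      out = dir "/" id ".fasta"          # a repeated ID overwrites its file
    }
    out { print > out }                  # lines before the first header are skipped
  ' "$input"
}
```

The per-sequence files can then go through the loop above, e.g. split_fasta peptides.fasta split_dir followed by for i in split_dir/*.fasta; do ...; done.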

I am not sure how accurately pepstats calculates MWs.
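To cross-check the pepstats numbers, or to get the mass of a single peptide string without calling EMBOSS, the mass can be summed directly from residue masses. Here is a minimal sketch with several assumptions. peptide_mass is a hypothetical helper. It treats the input as an unmodified linear peptide of the 20 standard residues and adds one water for the free termini. It uses commonly tabulated monoisotopic and average residue masses; published tables differ slightly in the last decimals. Modifications, ambiguity codes and charge states are ignored, so it does not replace PeptideMass. As far as I know, pepstats reports an average molecular weight, so the 'avg' mode is the closer comparison.

```shell
# peptide_mass SEQUENCE [mono|avg]
# Hypothetical helper: mass of an unmodified linear peptide =
# sum of residue masses + one water (H2O) for the free termini.
peptide_mass() {
  mode=${2:-mono}
  case $mode in
    mono|avg) ;;
    *) echo "peptide_mass: mode must be mono or avg" >&2; return 2 ;;
  esac
  printf '%s\n' "$1" | LC_ALL=C awk -v mode="$mode" '
    BEGIN {
      # Residue masses in Da: monoisotopic and average (I and L are isobaric)
      mono["G"] = 57.02146;  avg["G"] = 57.0519
      mono["A"] = 71.03711;  avg["A"] = 71.0788
      mono["S"] = 87.03203;  avg["S"] = 87.0782
      mono["P"] = 97.05276;  avg["P"] = 97.1167
      mono["V"] = 99.06841;  avg["V"] = 99.1326
      mono["T"] = 101.04768; avg["T"] = 101.1051
      mono["C"] = 103.00919; avg["C"] = 103.1388
      mono["L"] = 113.08406; avg["L"] = 113.1594
      mono["I"] = 113.08406; avg["I"] = 113.1594
      mono["N"] = 114.04293; avg["N"] = 114.1038
      mono["D"] = 115.02694; avg["D"] = 115.0886
      mono["Q"] = 128.05858; avg["Q"] = 128.1307
      mono["K"] = 128.09496; avg["K"] = 128.1741
      mono["E"] = 129.04259; avg["E"] = 129.1155
      mono["M"] = 131.04049; avg["M"] = 131.1926
      mono["H"] = 137.05891; avg["H"] = 137.1411
      mono["F"] = 147.06841; avg["F"] = 147.1766
      mono["R"] = 156.10111; avg["R"] = 156.1875
      mono["Y"] = 163.06333; avg["Y"] = 163.1760
      mono["W"] = 186.07931; avg["W"] = 186.2132
      water = (mode == "avg") ? 18.01528 : 18.01056
    }
    {
      seq = toupper($0)
      gsub(/[ \t\r]/, "", seq)              # tolerate stray whitespace
      total = water
      for (i = 1; i <= length(seq); i++) {
        r = substr(seq, i, 1)
        if (!(r in mono)) {                  # reject non-standard residues
          print "peptide_mass: unknown residue " r > "/dev/stderr"
          exit 1
        }
        total += (mode == "avg") ? avg[r] : mono[r]
      }
      printf "%.5f\n", total
    }'
}
```

For example, peptide_mass GA prints 146.06913 (monoisotopic), and peptide_mass GA avg gives the average mass for comparison with pepstats.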


Thomas
	
On Thu 05/25/06 09:07, john seers (IFR) wrote:
> 
> 
> Hi Thomas
> 
> Thank you very much for your reply.
> 
> There are some functions in the packages "seqinr" and "Biostrings", in
> fact quite a lot, but not one to calculate the mass of a peptide that I
> can find. So I was being forced down the route of having to call an
> EMBOSS program and parse the results. The problem with that is the
> interface is not easy - often needs a file as input in some standard
> format - not just passing in a string on the command line.
> 
> The other way I thought might be possible was to use the online
> facilities of something like Expasy's "PeptideMass" but I cannot get
> that to work. Does anybody have any idea if that is possible?
> 
> Regards
> 
> John Seers
> 
> ---
> 
> John Seers
> Institute of Food Research
> Norwich Research Park
> Colney
> Norwich
> NR4 7UA
>  
> 
> tel +44 (0)1603 251490 
> fax +44 (0)1603 255167
> e-mail john.seers at bbsrc.ac.uk                         
> e-disclaimer at http://www.ifr.ac.uk/edisclaimer/ 
>  
> Web sites:
> 
> www.ifr.ac.uk   
> www.foodandhealthnetwork.com
> 
> 
> -----Original Message-----
> From: Thomas Girke [mailto:thomas.girke at ucr.edu] 
> Sent: 24 May 2006 18:35
> To: john seers (IFR)
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Protein/peptide mass
> 
> 
> John,
> Allow me to post some comments on your question rather than providing an
> immediate answer.
> 	
> On UNIX-type OSs, like Linux or Mac OS X, I usually run EMBOSS
> command-line programs directly from R with the system("myemboss_program")
> command and slurp the results into R data frames with its standard data
> import functions (e.g. read.table, readLines). The import step often
> requires some knowledge of R's regular expression utilities for
> reformatting the results as needed. Knowledge of BioPerl is often very
> helpful as well. The advantage of this approach is that one can
> post-analyze and plot the output of almost any bio- or drug-informatics
> program in R. However, it requires some basic knowledge of R, mostly for
> importing the very variable data structures these programs produce.
> 
> For the future, it would be very useful to have some BioC utilities that
> allow a more user-friendly data import from EMBOSS, BLAST and hundreds of
> other non-R-based bioinformatics programs.
> 
> I would be interested to know whether members of this list are working
> on packages that will facilitate this integration with external sequence
> analysis tools.
> 
> Thomas
> 
> 
> On Wed 05/24/06 16:31, john seers (IFR) wrote:
> > Hello All
> >  
> > Apologies in advance if this is an obvious question, but I have searched
> > and cannot find an answer or a straightforward way to do it.
> >  
> > Is there a way to calculate the mass of a protein/peptide using
> > R/Bioconductor, i.e. like the Expasy "PeptideMass" web page or like
> > the EMBOSS pepstats?
> >  
> > Regards
> >  
> > John Seers
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > 
> 
> -- 
> Thomas Girke, Ph.D.
> 1008 Noel T. Keen Hall
> Center for Plant Cell Biology (CEPCEB)
> University of California
> Riverside, CA 92521
> 
> E-mail: thomas.girke at ucr.edu
> Website: http://faculty.ucr.edu/~tgirke
> Ph: 951-827-2469
> Fax: 951-827-4437
> 
> 
