[BioC] retrieving mRNA sequences via biomaRt

Simon simon212 at gmx.de
Thu Aug 6 16:35:46 CEST 2009


Hello everybody,

As a first project with Bioconductor, I am trying to solve the
following tasks:

# Task 1:
# find:
#   * mRNA sequence (5'UTR, Coding region, 3'UTR)
#   * position of start codon in sequence
#   * position of stop codon in sequence
#   * ID (Which ID(s) should I use to reference my
#     sequence hits? EMBL, Ensembl transcript ID,
#     Entrez Gene ID, RefSeq, etc.?)
#   * name of associated protein product
#
# where:
#   * origin is human
#     * Entrez Search would be: human[ORGN]
#   * sequence is mRNA transcript
#     * Entrez Search for Molecule Type: biomol_mRNA[PROP]?
#   * mRNA sequence length is 3000 to 5000 nts
#     * Entrez Search for Sequence Length: 3000:5000[SLEN]
#   * coding region of mRNA length is 2000 to 3000 nts
#     * Entrez Search Field for start and stop of
#       coding region: start:stop[CDS]
#
#
# Task 2:
# store the retrieved information for the first 200 hits in a file
# (Which file format would be suitable?)
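
The criteria of Task 1 and the output of Task 2 can be sketched as follows. This is only an illustration in plain Python on made-up records; it does not query Ensembl or Entrez and is not biomaRt code. The names `Transcript`, `matches_task1` and `write_hits`, the toy IDs, and the tab-separated layout are all assumptions made for the example. Positions are 1-based and refer to the first base of each codon.

```python
import csv
import io
from dataclasses import dataclass
from itertools import islice


@dataclass
class Transcript:
    """One mRNA record. All field names are hypothetical."""
    transcript_id: str
    protein_name: str
    utr5: str  # 5'UTR sequence
    cds: str   # coding region, start codon through stop codon
    utr3: str  # 3'UTR sequence

    @property
    def mrna(self) -> str:
        # Full transcript: 5'UTR + coding region + 3'UTR
        return self.utr5 + self.cds + self.utr3

    @property
    def start_codon_pos(self) -> int:
        # 1-based position of the first base of the start codon
        return len(self.utr5) + 1

    @property
    def stop_codon_pos(self) -> int:
        # 1-based position of the first base of the stop codon
        return len(self.utr5) + len(self.cds) - 2


def matches_task1(t, mrna_len=(3000, 5000), cds_len=(2000, 3000)):
    """Apply the two length criteria of Task 1 (bounds inclusive)."""
    return (mrna_len[0] <= len(t.mrna) <= mrna_len[1]
            and cds_len[0] <= len(t.cds) <= cds_len[1])


def write_hits(transcripts, handle, max_hits=200):
    """Write at most max_hits matching records as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "protein", "start_codon", "stop_codon",
                     "mrna_length", "sequence"])
    hits = islice((t for t in transcripts if matches_task1(t)), max_hits)
    count = 0
    for t in hits:
        writer.writerow([t.transcript_id, t.protein_name,
                         t.start_codon_pos, t.stop_codon_pos,
                         len(t.mrna), t.mrna])
        count += 1
    return count


# Made-up records: one satisfies both length criteria, one is too short.
hit = Transcript("TOY0001", "toy protein",
                 "G" * 500, "ATG" + "GCC" * 700 + "TAA", "A" * 900)
short = Transcript("TOY0002", "tiny protein", "G" * 10, "ATGTAA", "A" * 10)

buffer = io.StringIO()  # stands in for an open file
n_written = write_hits([hit, short], buffer)
```

A tab-separated table keeps the IDs, positions and sequence on one line per hit and is easy to read back into R; FASTA would be another common choice if only the sequences and a header line are needed.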

I started by experimenting with the biomaRt package for R,
but I was overwhelmed by the number of options it offers.

I would be glad to get any feedback on how to start on, or even solve, these tasks.

Best regards,
Simon


