[BioC] locate a target species in Refseq ftp directory

heyi xiao xiaoheyiyh at yahoo.com
Fri Oct 4 23:07:52 CEST 2013


Hi Dr. Iles,
Thanks for the link. It is very helpful!
Heyi

--------------------------------------------
On Fri, 10/4/13, David Iles <D.E.Iles at leeds.ac.uk> wrote:

 Subject: Re: [BioC] locate a target species in Refseq ftp directory

 Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
 Date: Friday, October 4, 2013, 2:59 PM

 Hi Heyi,

 You could try the following link to the sheep sequencing
 consortium web site. You'll find links to GFF files there
 with known and predicted mRNAs, along with the latest
 draft assembly of the sheep genome sequence (plus
 thousands of unmapped scaffolds and contigs).

 http://www.livestockgenomics.csiro.au/sheep/oar3.1.php

 Hope this helps.

 Dr David Iles
 Visiting Fellow
 School of Biology
 University of Leeds
 Leeds LS2 9JT
 UK
 d.e.iles at leeds.ac.uk


 wrote:

 Thanks, Jim, for the hint.
 That’s even worse: I will have to download and work on all
 the files now.
 Heyi

 --------------------------------------------
 On Fri, 10/4/13, James W. MacDonald <jmacdon at uw.edu> wrote:

 Subject: Re: [BioC] locate a target species in Refseq ftp directory

 Cc: bioconductor at r-project.org
 Date: Friday, October 4, 2013, 11:53 AM

 Hi Heyi,

 ftp://ftp.ncbi.nih.gov/refseq/release/release-notes/RefSeq-release61.txt

 And NCBI says 'Ha ha on you - it's not by species!' For
 example:

 zcat vertebrate_mammalian.1.1.genomic.fna.gz | grep \> | head
 gi|62867015|ref|NT_112066.2|NT_112066 Callithrix jacchus genomic sequence, ENCODE region ENr231
 gi|62871432|ref|NT_108597.2|NT_108597 Papio anubis genomic sequence, ENCODE region ENm002
 gi|62903504|ref|NT_086517.2|NT_086517 Callithrix jacchus genomic sequence, ENCODE region ENm014
 gi|62903506|ref|NT_113343.1|NT_113343 Dasypus novemcinctus genomic sequence, ENCODE region ENr231
 gi|62946791|ref|NT_113349.1|NT_113349 Papio anubis genomic sequence, ENCODE region ENr323, part 2 of 2
 gi|63025534|ref|NT_091694.3|NT_091694 Otolemur garnettii genomic sequence, ENCODE region ENm010
 gi|63145882|ref|NT_106990.3|NT_106990 Otolemur garnettii genomic sequence, ENCODE region ENr322
 gi|64724026|ref|NT_107822.2|NT_107822 Bos taurus genomic sequence, ENCODE region ENm002
 gi|64724078|ref|NT_107825.2|NT_107825 Bos taurus genomic sequence, ENCODE region ENm003
 gi|64724166|ref|NT_107827.2|NT_107827 Bos taurus genomic sequence, ENCODE region ENm004
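
Because each release file mixes records from many species, one possible approach is to filter the decompressed FASTA stream by the species name in the header line. The following is only a minimal sketch: the function name fasta_by_species is hypothetical, and it assumes the species name appears verbatim in each header, as it does in the listing above.

```shell
# Hypothetical helper: print FASTA records whose header line contains
# the given species name. Reads FASTA text on standard input.
fasta_by_species() {
    awk -v sp="$1" '
        /^>/ { keep = (index($0, sp) > 0) }  # header line: decide whether to keep this record
        keep                                 # print header and sequence lines of kept records
    '
}
```

It could be used on one file as: zcat vertebrate_mammalian.154.rna.fna.gz | fasta_by_species "Ovis aries" > sheep.rna.fna (the output file name is just an example). Matching is a plain substring test, so a name that is a prefix of another species name would also match that other species.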


 Best,

 Jim



 On Friday, October 04, 2013 11:29:23 AM, heyi xiao wrote:
 Hi all,
 I am trying to extract the RNA sequences for sheep (Ovis
 aries) from the RefSeq FTP site. The right directory should
 be vertebrate_mammalian: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/vertebrate_mammalian/
 But there are so many *rna* files there, all named with
 numbers, like vertebrate_mammalian.154.rna.fna.gz, that I am
 not sure which one is for my target species. The README files
 don’t really help with this. Does anyone know how to locate
 the right file for a target species there?
 Heyi
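
Since the file numbers carry no species information, one way to see which of the downloaded *rna* files mention a given species is to count matching header lines per file. This is a rough sketch under assumptions: count_species_headers is a hypothetical name, the files are assumed to be already downloaded locally, and the species name is assumed to appear verbatim in the FASTA headers.

```shell
# Hypothetical helper: for each gzipped FASTA file given after the
# species name, print the file name, a tab, and the number of header
# lines that mention the species.
# The body runs in a subshell ( ... ) so its variables do not leak.
count_species_headers() (
    sp="$1"
    shift
    for f in "$@"; do
        # awk counts header lines containing the species string;
        # "n + 0" prints 0 when nothing matched
        n=$(gzip -dc "$f" | awk -v sp="$sp" '/^>/ && index($0, sp) { n++ } END { print n + 0 }')
        printf '%s\t%s\n' "$f" "$n"
    done
)
```

For example: count_species_headers "Ovis aries" vertebrate_mammalian.*.rna.fna.gz would list every file with its count, and files showing 0 can be skipped.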

 _______________________________________________
 Bioconductor mailing list
 Bioconductor at r-project.org
 https://stat.ethz.ch/mailman/listinfo/bioconductor
 Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

 --
 James W. MacDonald, M.S.
 Biostatistician
 University of Washington
 Environmental and Occupational Health Sciences
 4225 Roosevelt Way NE, # 100
 Seattle WA 98105-6099



