[BioC] locate a target species in Refseq ftp directory

heyi xiao xiaoheyiyh at yahoo.com
Fri Oct 4 18:12:40 CEST 2013


Thanks Jim, for the hint.
That’s even worse: I will have to download and work through all the files now.
Heyi
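
A rough sketch of what working through all the files could look like in Python, assuming the *rna* files have already been downloaded into the current directory. The function name `count_species_headers` is hypothetical, and the matching relies on the species name appearing verbatim in the FASTA header lines, which is an assumption based on the headers quoted in this thread.

```python
import gzip
from typing import Dict, Iterable


def count_species_headers(paths: Iterable[str], species: str) -> Dict[str, int]:
    """Map each gzipped FASTA file to the number of headers mentioning `species`.

    Only header lines (those starting with '>') are inspected, so sequence
    data is streamed past without being kept in memory.
    """
    counts: Dict[str, int] = {}
    for path in paths:
        n = 0
        # "rt" decompresses on the fly and yields text lines.
        with gzip.open(path, "rt") as handle:
            for line in handle:
                if line.startswith(">") and species in line:
                    n += 1
        counts[path] = n
    return counts
```

Calling it as `count_species_headers(glob.glob("vertebrate_mammalian.*.rna.fna.gz"), "Ovis aries")`, with `glob` from the standard library, would show which files contain any sheep records and are worth keeping.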

--------------------------------------------
On Fri, 10/4/13, James W. MacDonald <jmacdon at uw.edu> wrote:

 Subject: Re: [BioC] locate a target species in Refseq ftp directory

 Cc: bioconductor at r-project.org
 Date: Friday, October 4, 2013, 11:53 AM

 Hi Heyi,

 ftp://ftp.ncbi.nih.gov/refseq/release/release-notes/RefSeq-release61.txt

 And NCBI says 'Ha ha on you - it's not by species!' For
 example:

  zcat vertebrate_mammalian.1.1.genomic.fna.gz | grep \> | head
 >gi|62867015|ref|NT_112066.2|NT_112066 Callithrix jacchus genomic sequence, ENCODE region ENr231
 >gi|62871432|ref|NT_108597.2|NT_108597 Papio anubis genomic sequence, ENCODE region ENm002
 >gi|62903504|ref|NT_086517.2|NT_086517 Callithrix jacchus genomic sequence, ENCODE region ENm014
 >gi|62903506|ref|NT_113343.1|NT_113343 Dasypus novemcinctus genomic sequence, ENCODE region ENr231
 >gi|62946791|ref|NT_113349.1|NT_113349 Papio anubis genomic sequence, ENCODE region ENr323, part 2 of 2
 >gi|63025534|ref|NT_091694.3|NT_091694 Otolemur garnettii genomic sequence, ENCODE region ENm010
 >gi|63145882|ref|NT_106990.3|NT_106990 Otolemur garnettii genomic sequence, ENCODE region ENr322
 >gi|64724026|ref|NT_107822.2|NT_107822 Bos taurus genomic sequence, ENCODE region ENm002
 >gi|64724078|ref|NT_107825.2|NT_107825 Bos taurus genomic sequence, ENCODE region ENm003
 >gi|64724166|ref|NT_107827.2|NT_107827 Bos taurus genomic sequence, ENCODE region ENm004
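
The same header check can be sketched in Python for a file that has already been downloaded. This is only an illustration, not NCBI tooling: the function name `fasta_records_for_species` is made up here, and it assumes the organism name appears verbatim in each FASTA header line, as in the output above.

```python
import gzip
from typing import Iterator, List, Optional, Tuple


def fasta_records_for_species(path: str, species: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for records whose header mentions `species`.

    The file is read as a stream, so a mixed-species file does not need to
    fit in memory; only the record currently being read is held.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record; emit it if it matched.
                if header is not None and species in header:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line.strip())
    # Emit the final record, which has no following header to close it.
    if header is not None and species in header:
        yield header, "".join(chunks)
```

For example, `fasta_records_for_species("vertebrate_mammalian.154.rna.fna.gz", "Ovis aries")` would yield only the sheep records, if that file contains any.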


 Best,

 Jim



 On Friday, October 04, 2013 11:29:23 AM, heyi xiao wrote:
 > Hi all,
 > I am trying to extract the RNA sequences for sheep (Ovis aries) from
 > the RefSeq ftp site. The right directory should be vertebrate_mammalian:
 > ftp://ftp.ncbi.nlm.nih.gov/refseq/release/vertebrate_mammalian/
 > But there are so many *rna* files there, all named with numbers, like
 > vertebrate_mammalian.154.rna.fna.gz, and I am not sure which one is for
 > my target species. The README files don’t really help with this. Does
 > anyone know how to locate the right file for a target species there?
 > Heyi
 >
 > _______________________________________________
 > Bioconductor mailing list
 > Bioconductor at r-project.org
 > https://stat.ethz.ch/mailman/listinfo/bioconductor
 > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

 --
 James W. MacDonald, M.S.
 Biostatistician
 University of Washington
 Environmental and Occupational Health Sciences
 4225 Roosevelt Way NE, # 100
 Seattle WA 98105-6099


