[BioC] SRAmetadb Bioconductor package; study record count low for 2013

Jack Zhu zhujack at mail.nih.gov
Sun Jun 8 17:11:44 CEST 2014


Hi all,

Regarding missing studies by submission_date for 2013 and 2014 in the
SRAdb SQLite database, I did some investigation and found the reason.
The metadata in SRAdb is mainly parsed from the XML files of the
SRA submissions, and this is also true of the submission table.  But
quite a few submission XML files don't have a submission date, e.g.

ftp://ftp-trace.ncbi.nih.gov/sra/Submissions/SRA157/SRA157949/

  SRA157949.experiment.xml
  SRA157949.submission.xml

So it seems all the study and submission records are there, but some
submission records just don't have a submission date.  I am looking into
the possibility of adding dates for those records.
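
A minimal sketch of how one might check this, assuming the study and
submission tables share a submission_accession column and that
submission_date, when present, starts with a four-digit year. The toy
rows and the helper names (build_toy_db, studies_per_year,
count_undated_submissions) are made up for illustration and are not
part of SRAdb. The missing year is kept as its own group, so undated
records stay visible in the per-year counts.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with made-up rows (not real SRA records)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE submission (submission_accession TEXT, submission_date TEXT);
        CREATE TABLE study (study_accession TEXT, submission_accession TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO submission VALUES (?, ?)",
        [
            ("SRA000001", "2012-05-01 10:00:00"),
            ("SRA000002", "2013-02-11 09:30:00"),
            ("SRA000003", None),  # stands in for a submission XML without a date
        ],
    )
    conn.executemany(
        "INSERT INTO study VALUES (?, ?)",
        [
            ("SRP000001", "SRA000001"),
            ("SRP000002", "SRA000002"),
            ("SRP000003", "SRA000003"),
            ("SRP000004", "SRA000003"),
        ],
    )
    return conn


def studies_per_year(conn):
    """Count studies per submission year; undated studies get year None."""
    query = """
        SELECT substr(NULLIF(sub.submission_date, ''), 1, 4) AS submission_year,
               COUNT(*) AS n_studies
        FROM study AS st
        LEFT JOIN submission AS sub
               ON sub.submission_accession = st.submission_accession
        GROUP BY submission_year
        ORDER BY submission_year
    """
    return conn.execute(query).fetchall()


def count_undated_submissions(conn):
    """Count submission rows whose submission_date is NULL or empty."""
    query = """
        SELECT COUNT(*) FROM submission
        WHERE submission_date IS NULL OR submission_date = ''
    """
    return conn.execute(query).fetchone()[0]


if __name__ == "__main__":
    conn = build_toy_db()
    for year, n in studies_per_year(conn):
        print(year if year is not None else "(no date)", n)
    print("undated submissions:", count_undated_submissions(conn))
```

To try the same queries on a real download, one could replace
build_toy_db() with sqlite3.connect() on the local SQLite file (for
example SRAmetadb.sqlite) and adjust the column names if they differ.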

Jamie, thanks for the finding and I will keep you updated.

Jack


On Fri, Jun 6, 2014 at 3:49 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> Hi, Jack.
>
> I took a look at this and it does appear that the number of
> submissions is very low for 2013.  Also, there are no 2014 submissions
> listed that I could find.  This was using the June 1, 2014 sqlite
> file.
>
> Sean
>
>
>
> ---------- Forwarded message ----------
> From: Al-Nasir, Jamie (2012) <Jamie.Al-Nasir.2012 at live.rhul.ac.uk>
> Date: Thu, Jun 5, 2014 at 2:20 PM
> Subject: [BioC] SRAmetadb Bioconductor package; study record count low for 2013
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Cc: "Shanahan, Hugh" <Hugh.Shanahan at rhul.ac.uk>
>
>
> Hello,
>
> I have been looking at the SRA (Sequence Read Archive) SQLite database
> provided as a Bioconductor package for R.
>
> My question concerns top-level studies, which are found in the study table
> and dated in the submission table.
>
> Why are there so few entries for top-level studies for 2013
> as compared with 2011 and 2012?
>
> The SQL queries I have written, which join the submission table and the
> study table in order to obtain the submission_date, yield the following
> counts of top-level studies by year:
>
> 2005|64
> 2006|38
> 2007|94
> 2008|269
> 2009|893
> 2010|2631
> 2011|4077
> 2012|5208
> 2013|724
>
> As one can see, the number of studies in the metadata falls off in 2013.
> I have been using the SRAdb Bioconductor SQLite database, which has
> the creation timestamp 2013-12-03 08:29:26 in the metaInfo table.
>
> I would really appreciate any useful thoughts on this.
>
> Best regards,
>
> Jamie
>
> Jamie Al-Nasir MPharm (Hons)
> Department of Computer Science
> Centre for Systems and Synthetic Biology
> Mobile: +44 (0)759 4800 229
> Web: http://jamie.al-nasir.com/