[BioC] Probeset/Transcript cluster definitions for HTA2.0 using pdInfoBuilder

Guilherme Rocha gvrocha at gmail.com
Fri Aug 29 14:28:47 CEST 2014


  Thank you.
  Your reply helps a lot in letting me know where to look for things. :)
  Best,
  G


On Wed, Aug 27, 2014 at 11:08 AM, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Guilherme,
>
>
> On Tue, Aug 26, 2014 at 10:00 AM, Guilherme Rocha <gvrocha at gmail.com>
> wrote:
>
>>   Hi all,
>>
>>   I have constructed a package information file for Affy's HTA 2.0 chip
>> using pdInfoBuilder as shown below.
>>   It appears that the annotation files have been upgraded to na34 (from
>> na33 in probeFile and transFile).
>>
>>   Specific question: do the annotation files affect which probes are
>> included in each probeset/transcript cluster?
>>
>
> They can. It depends on changes between the current genome build and the
> one on which the original probeset/transcript clusters were based. Given
> the maturity of the Human Genome, I wouldn't expect massive changes.
>
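
A minimal sketch of how one might check this directly, assuming the probeset annotation tables of both releases are already loaded as data frames with `probeset_id` and `transcript_cluster_id` columns. The IDs below are made up for illustration, and this only compares the annotated probeset-to-cluster assignments, not the probe-level content of the PGF file:

```r
# Toy stand-ins for the probeset annotation tables of two releases
# (hypothetical IDs; real tables would come from the na33 and na34 CSV files).
na33 <- data.frame(
  probeset_id           = c("PSR01", "PSR02", "PSR03"),
  transcript_cluster_id = c("TC01", "TC01", "TC02"),
  stringsAsFactors      = FALSE
)
na34 <- data.frame(
  probeset_id           = c("PSR01", "PSR02", "PSR03"),
  transcript_cluster_id = c("TC01", "TC02", "TC02"),
  stringsAsFactors      = FALSE
)

# Join the two releases on probeset_id; all = TRUE keeps probesets that
# appear in only one release (their other column becomes NA).
both <- merge(na33, na34, by = "probeset_id", all = TRUE,
              suffixes = c(".na33", ".na34"))

# A probeset is unchanged only if it exists in both releases and maps to
# the same transcript cluster in each.
same <- !is.na(both$transcript_cluster_id.na33) &
  !is.na(both$transcript_cluster_id.na34) &
  both$transcript_cluster_id.na33 == both$transcript_cluster_id.na34

changed <- both[!same, ]
print(changed)
```

An empty `changed` table would suggest the cluster assignments are identical between the two releases, at least at the annotation level.
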
>
>>   Broader question: what information from the annotation files is actually
>> used by pdInfoBuilder?
>>
>
> This is something you could explore for yourself. If you go to the svn (
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks), using
> readonly for both the password and user name, and look at the source for
> pdBuilderV2HTA2.R, you can see this near the top, in the function
> parseHtaProbesetCSV():
>
>
>  cols <- c("probeset_id", "seqname", "strand", "start", "stop",
>             "transcript_cluster_id", "exon_id",
>             "crosshyb_type", "level", "probeset_type",
>             "junction_start_edge", "junction_stop_edge",
>             "junction_sequence", "has_cds")
>
> So all of this information is parsed out of the probeset CSV file. If
> there are changes to the current human genome that would imply that a
> particular probe or probeset no longer measures what Affy originally
> intended (or if the strand, start, or stop position change), then the
> changes would be reflected here, and would then be passed to the pd.hta.2.0
> package that you built.
>
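
To look at those columns without building a package, something like the following may work. It is a rough sketch in base R, not the code pdInfoBuilder itself uses. It assumes the probeset CSV starts with comment lines beginning with "#" and marks missing values as "---"; the inline text and the `extra_column` field are made-up stand-ins for a real file:

```r
# A tiny stand-in for a NetAffx probeset CSV (made-up values).
csv_text <- '#%example header line (assumed format)
"probeset_id","seqname","strand","start","stop","transcript_cluster_id","extra_column"
"PSR01","chr1","+","100","200","TC01","ignored"
"PSR02","chr1","-","300","400","TC01","ignored"'

# A subset of the columns listed in parseHtaProbesetCSV().
cols <- c("probeset_id", "seqname", "strand", "start", "stop",
          "transcript_cluster_id")

# Skip the "#" comment lines, treat "---" as NA, and keep only the
# columns of interest.
probesets <- read.csv(text = csv_text, comment.char = "#",
                      na.strings = "---",
                      stringsAsFactors = FALSE)[, cols]
str(probesets)
```

For a real file, `text = csv_text` would be replaced by the path to the probeset CSV.
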
> The transcript CSV file is used for much less. AFAIK, that file is just
> parsed and put into the extdata directory of the package:
>
>
> #######################################################################
> ## Part vi) Save NetAffx Annotation to extdata
> #######################################################################
> if (!quiet) message("Saving NetAffx Annotation... ", appendLF=FALSE)
> netaffxProbeset <- annot2fdata(object@probeFile)
> save(netaffxProbeset, file=file.path(extdataDir, 'netaffxProbeset.rda'),
>      compress='xz')
> netaffxTranscript <- annot2fdata(object@transFile)
> save(netaffxTranscript, file=file.path(extdataDir, 'netaffxTranscript.rda'),
>      compress='xz')
>
> And you can see what that looks like by doing:
>
> load(paste0(path.package("pd.hta.2.0"), "/extdata/netaffxTranscript.rda"))
>
> and then
>
> head(pData(netaffxTranscript))
>
> but I don't think these data are currently used for anything.
>
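
A slightly more defensive variant of the same inspection, as a sketch: it assumes the package built above is installed under the name pd.hta.2.0 and that Biobase is available, and it prints a message instead of failing when either is missing:

```r
# Locate the file inside an installed pd.hta.2.0 package; system.file()
# returns "" when the package or the file is not available.
rda <- system.file("extdata", "netaffxTranscript.rda", package = "pd.hta.2.0")

if (nzchar(rda) && requireNamespace("Biobase", quietly = TRUE)) {
  # Load into a separate environment so nothing in the workspace is
  # overwritten; load() returns the names of the restored objects.
  env <- new.env()
  loaded <- load(rda, envir = env)
  annot <- get(loaded[1], envir = env)
  print(head(Biobase::pData(annot)))
} else {
  message("pd.hta.2.0 (or Biobase) not installed; nothing to inspect")
}
```
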
> Best,
>
> Jim
>
>
>
>
>>
>>   Any help appreciated.
>>
>>   Thanks,
>>
>>   Guilherme Rocha
>>
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Construction of the package:
>>
>> library(pdInfoBuilder)
>>
>> setwd("/my_bioc_packages/")
>>
>> seed <- new("AffyHTAPDInfoPkgSeed",
>>             version     = "3.8.0",
>>             license     = "Artistic-2.0",
>>             pgfFile     = ".../HTA-2_0.r1.pgf",
>>             clfFile     = ".../HTA-2_0.r1.clf",
>>             probeFile   = ".../HTA-2_0.na33.hg19.probeset.csv",
>>             transFile   = ".../HTA-2_0.na33.1.hg19.transcript.csv",
>>             coreMps     = ".../HTA-2_0.r1.Psrs.mps",
>>             geneArray   = TRUE,
>>             author      = "gvrocha",
>>             email       = "gvrocha at gmail.com",
>>             biocViews   = "AnnotationData",
>>             genomebuild = "hg19",
>>             organism    = "Homo sapiens",
>>             species     = "Homo sapiens",
>>             url         = "http://about.me/gvrocha")
>>
>> makePdInfoPackage(seed, destDir=".")
>>
>>
>> --
>> Guilherme V. Rocha
>> gvrocha at gmail.com
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>



-- 
Guilherme V. Rocha
gvrocha at gmail.com
