[BioC] blast probe clusters when using Affymetrix Gene Array Strips

Wed Sep 4 16:08:12 CEST 2013

Hi Joao,

On Wednesday, September 04, 2013 6:11:12 AM, Joao Sollari Lopes wrote:
> Hi,
>
> I am using the Zebrafish Gene 1.1 ST Array Strip. I have found some
> transcript clusters that are differentially expressed but are not
> annotated (although they belong to the "main" design of the array). I
> would like to blast them, but I am not sure what to blast as each
> transcript cluster has various probes associated. Should I blast them
> all individually? I have read about "probe set target sequence"
> (https://stat.ethz.ch/pipermail/bioconductor/2004-March/004250.html),
> but I am not sure if it applies to the Gene Array Strip. If it does,
> how can I obtain these sequences?

That depends on what you want to do. You can download the transcript 
cluster sequences here:

http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.transcript_cluster.fa.zip

and then get the FASTA sequences you want to blast. This might not be 
exactly what you want, as the transcripts in that file correspond to 
very long sequences that a given probeset is designed to interrogate. 
As an example, probeset 12943944 is intended to interrogate a 2500 nt 
transcript, but uses 19 probes (25-mers) to do so. If you blast the 
transcript, you will see where that 2500 nt transcript is in the 
genome, but you won't know anything about the individual probes.
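
As a rough illustration of pulling one record out of that file, here is a 
minimal base R sketch. The helper name extractFastaRecord() and the toy 
header layout are assumptions made for illustration. The real header format 
in the Affymetrix file may differ, so check a few header lines first.

```r
## Hypothetical helper (not from the original post): return the FASTA
## record(s) whose header line mentions a given ID.
extractFastaRecord <- function(lines, id){
    isHeader <- startsWith(lines, ">")
    ## Label every line with the index of the record it belongs to
    recordIndex <- cumsum(isHeader)
    ## Match the ID as a whole word within header lines only
    pattern <- paste0("\\b", id, "\\b")
    hits <- recordIndex[isHeader & grepl(pattern, lines, perl = TRUE)]
    lines[recordIndex %in% hits]
}

## Toy data standing in for the unzipped transcript cluster file.
## With the real file you would use readLines() on the unzipped .fa file.
fa <- c(">12943944 toy header", "ACGTACGTAC", "GGCCTTAAGG",
        ">12943945 toy header", "TTTTGGGGCC")
rec <- extractFastaRecord(fa, "12943944")
writeLines(rec)
```

The result is a character vector holding the header and sequence lines, 
which can be pasted into a blast or blat query box as is.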

You could alternatively use the probe tab file, found here:

http://www.affymetrix.com/Auth/analysis/downloads/lf/wt/ZebGene-1_1-st-v1/ZebGene-1_1-st-v1.zv9.probe.tab.zip

and extract the 19 probes for that particular probeset and then use Jim 
Kent's blat program at the UCSC genome browser to align. I have a small 
function I have used in the past to convert these data to FASTA format 
that you can then upload to blat. But this requires the probe tab data 
to be in a probe package.
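
If you only want to look at the probes for one probeset before building a 
package, something like the following sketch may be enough. The helper 
getProbeSeqs() is hypothetical, and the column names in the toy data are 
assumptions. Check names(probes) after reading the real file and pass 
whatever the ID and sequence columns are actually called.

```r
## Hypothetical helper (not from the original post): return the probe
## sequences for one or more IDs from a probe tab data frame.
getProbeSeqs <- function(probes, ids, idCol, seqCol){
    stopifnot(idCol %in% names(probes), seqCol %in% names(probes))
    as.character(probes[[seqCol]][probes[[idCol]] %in% ids])
}

## With the real data, read the unzipped probe tab file first, e.g.
## probes <- read.delim(<path to probe.tab>, stringsAsFactors = FALSE)

## Toy data; the column names here are assumed, not taken from the file
probes <- data.frame(
    Transcript.Cluster.ID = c(12943944, 12943944, 12943945),
    probe.sequence = c("ACGTACGTACGTACGTACGTACGTA",
                       "TTGCATTGCATTGCATTGCATTGCA",
                       "GGATCGGATCGGATCGGATCGGATC"),
    stringsAsFactors = FALSE)

seqs <- getProbeSeqs(probes, 12943944,
                     idCol = "Transcript.Cluster.ID",
                     seqCol = "probe.sequence")
print(seqs)
```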

I will give you the code, but you will have to make your own probe 
package. You will need to use makeProbePackage() in the AnnotationForge 
package. There is a vignette here:

http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/makeProbePackage.pdf

as well as a help page, so you shouldn't have any problems with that.

If you decide to go that direction, here is the function you will need 
to make FASTA files:

blatGene <- function(affyid, probe, filename){
    ## affyid == Affy probeset ID
    ## probe == BioC probe package name
    ## filename == output file name
    require(probe, quietly = TRUE, character.only = TRUE)
    tmp <- data.frame(get(probe))
    if(length(affyid) > 1){
        seqnc <- vector()
        for(i in seq(along = affyid))
            seqnc <- c(seqnc, tmp[tmp$Probe.Set.Name == affyid[i], 1])
    }else{
        seqnc <- tmp[tmp$Probe.Set.Name == affyid,1]
    }
    out <- vector()
    if(length(seqnc) > 25)
        warning("Blat will only return values for 25 or fewer sequences!",
                call. = FALSE)
    for(i in seq(along = seqnc))
        out <- rbind(out, rbind(paste("> Probe", i, sep=""), seqnc[i]))
    write.table(out, filename, sep="\t", quote=FALSE, row.names=FALSE,
                col.names=FALSE)
}
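
If you have more than 25 probes, one way around the limit mentioned in the 
warning is to split the sequences over several files. The sketch below is a 
variant for illustration, not the function above: writeProbeFasta() is a 
hypothetical name, it takes a plain character vector instead of a probe 
package, and it writes headers as ">ProbeN" with no space after the ">".

```r
## Hypothetical helper (not from the original post): write sequences to
## one or more FASTA files, at most `batchSize` sequences per file.
writeProbeFasta <- function(seqnc, prefix, batchSize = 25){
    ## Assign each sequence to a batch: 1, 1, ..., 2, 2, ...
    batch <- ceiling(seq_along(seqnc) / batchSize)
    files <- character(0)
    for(b in unique(batch)){
        idx <- which(batch == b)
        ## rbind() then as.vector() interleaves header and sequence lines
        lines <- as.vector(rbind(paste0(">Probe", idx), seqnc[idx]))
        f <- paste0(prefix, "_", b, ".fa")
        writeLines(lines, f)
        files <- c(files, f)
    }
    files
}

## Toy example: 30 identical 25-mers, written to a temporary directory
seqs <- rep("ACGTACGTACGTACGTACGTACGTA", 30)
out <- writeProbeFasta(seqs, file.path(tempdir(), "probes"))
print(basename(out))
```

Each file can then be uploaded to blat separately. Probe numbering carries 
across files, so the second file starts at Probe26.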

Best,

Jim

>
> Thanks,
> Joao
> Instituto Gulbenkian de Ciencia

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099