[BioC] altcdfenvs

Hee Siew Wan g0203658 at nus.edu.sg
Mon Nov 8 13:16:05 CET 2004

Thanks, Laurent, for the tip, but I ran into another problem once I created enough identifiers. When I create one unique identifier for each probe pair that I want in the new cdf, I get 99112 probe *sets*, because
> length(unique(ind))
[1] 99112
This isn't exactly what I have in mind. If, say, I want 4 probe pairs (those nearest to the 5-prime end) from each set, how should I proceed to create this new cdf?
From what I did below, I realised that I get only one probe pair per set, the one furthest from the 5-prime end, because that pair is at the *bottom* of the probe set. The probe table is arranged in increasing order, so it seems to me that each entry overwrites the previous one and the earlier ones are not kept.
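To make the goal concrete, here is a minimal base-R sketch of picking the 4 probe pairs nearest the 5-prime end within each probe set. It runs on a toy table, not on hgu95av2probe; the column names `Probe.Set.Name` and `Probe.Interrogation.Position`, and the idea that a smaller position means closer to the 5-prime end, are assumptions for illustration. Whether `buildCdfEnv.matchprobes` accepts a `match` list shaped like this (one vector of indices per identifier) is also an assumption and should be checked against the package documentation.

```r
# Toy stand-in for a probe table (column names are assumed, data are made up).
probes <- data.frame(
  Probe.Set.Name = rep(c("setA", "setB", "setC"), times = c(6, 5, 3)),
  Probe.Interrogation.Position = c(10, 55, 120, 200, 310, 450,
                                   20, 90, 150, 260, 400,
                                   15, 75, 130),
  stringsAsFactors = FALSE
)

n.keep <- 4  # probe pairs wanted per set

# Row indices grouped by probe set: one list element per set.
idx.by.set <- split(seq_len(nrow(probes)), probes$Probe.Set.Name)

# Within each set, keep the n.keep rows with the smallest position.
# Sets with fewer than n.keep probes keep all of them.
match.list <- lapply(idx.by.set, function(i) {
  o <- order(probes$Probe.Interrogation.Position[i])
  i[head(o, n.keep)]
})

# One identifier per probe set, and one vector of probe indices per identifier.
ids <- names(match.list)
m <- list(match = unname(match.list))
```

The point of this layout is that the number of identifiers equals the number of probe sets, while each set still carries several probe indices, instead of one identifier per probe.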
Please advise, and thanks for the help.

	-----Original Message----- 
	From: Laurent Gautier [mailto:lgautier at altern.org] 
	Sent: Mon 08-Nov-04 1:10 PM 
	To: Hee Siew Wan 
	Cc: bioconductor at stat.math.ethz.ch 
	Subject: Re: [BioC] altcdfenvs

	Hee Siew Wan wrote:
	> Dear All
	> I was trying to use a trial data set (Dilution) to create a new cdf using "altcdfenvs". Instead of using "matchprobes", I created the "m":
	...let's see how 'the "m"' was made then...
	> ind <- c(seq(1,199084,by=11), seq(1,199084,by=10), seq(1,199084,by=9),
	>  seq(1,199084,by=8), seq(1,199084,by=7), seq(1,199084,by=6))
	> m.dil <- new.env()
	> m.dil$match <- list(ind[1])
	> m.dil$match <- c(m.dil$match, ind[2:length(ind)])
	> m.dil <- as.list(m.dil)
	> length(m.dil$match)    # [1] 146637
	> id.dil <- hgu95av2probe$Probe.Set.Name[ind]
	> dil.cdf <- buildCdfEnv.matchprobes(m.dil, id.dil, nrow.chip=640, ncol.chip=640,
	>  chiptype="HG-U95Av2", probes.pack="hgu95av2probe")
	> new.dil <- Dilution[,1:2]
	> validAffyBatch(new.dil, dil.cdf)    # [1] TRUE
	> new.dil.cdfenv <- dil.cdf@envir
	> new.dil@cdfName <- "new.dil.cdfenv"
	> AffyBatch object
	> size of arrays=640x640 features (6405 kb)
	> cdf=new.dil.cdfenv (12453 affyids)
	> number of samples=2
	> number of genes=12453
	> annotation=hgu95av2
	> [1] 12453
	> As noted above, I have 12453 probe sets with my new cdf, but I also have only 12453 probe pairs when in fact I want 146637 probe pairs. The new cdf returns only 1 probe pair per set. Is there a way to get the 146637 probe pairs?
	...then you may want to actually provide enough _identifiers_ (i.e.,
	unique strings) to achieve this. On my side, having made the variable
	'id.dil' the way you did, I have:
	 > length(unique(id.dil))
	[1] 12453
	(I did not anticipate this could be a 'gotcha'; a warning will be added
	to 'buildCdfEnv.matchprobes')
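The check above can be sketched on a toy vector of identifiers. This is only an illustration in base R, with made-up names; `make.unique` is one possible way to turn repeated probe set names into distinct strings, and it is an assumption that such suffixed names are what one would want as identifiers in practice.

```r
# Toy identifiers: six probes spread over three probe sets (made-up names).
ids <- c("setA", "setA", "setA", "setB", "setB", "setC")

# Compare the number of probes with the number of distinct identifiers.
n.probes <- length(ids)
n.unique <- length(unique(ids))

# How many probes share each identifier.
per.id <- table(ids)

# One way to get a distinct string per probe: suffix the repeats.
probe.ids <- make.unique(ids)
```

If `n.unique` is smaller than `n.probes`, several probes share an identifier, which is the situation described above.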
	> I tried doing the same thing for the ath1121501 array. For this case, I created a data.frame from "ath1121501probe" with the following columns:
	> [1] "sequence" "probe" "X" "Y" "position"
	> However, when I run
	> m <- matchprobes(newath$sequence, ath1121501probe$sequence)
	> I found out that for some sequences, I have more than 1 match. For example,
	> Hence, the probe that I'm interested in may not be matched to the correct one.
	...I am not certain I completely follow what you mean...
	> The versions I'm using:
	> R: 1.9.0
	> altcdfenvs: 1.0.0
	> affy: 1.4.31
	> ath1121501probe: 1.01
	You may want to upgrade to a more recent version of R and of the packages.
	Hoping it helps,
	> on Windows XP Professional Version 2002.
	> Did I do something wrong along the way for both methods? I'd appreciate any help or advice regarding how to get the selected probe pairs for analysis. Also, how do I cite the package "altcdfenvs"? Thank you.
	> Regards
	> Hee, Siew Wan
	> _______________________________________________
	> Bioconductor mailing list
	> Bioconductor at stat.math.ethz.ch
	> https://stat.ethz.ch/mailman/listinfo/bioconductor
