[BioC] altcdfenvs

Holger Schwender holger.schw at gmx.de
Mon Nov 8 14:58:30 CET 2004


Hi,

I have written some functions for building your own cdf environment for one
of my colleagues who is interested in this. One of these functions takes a
list as input. Each element of this list represents a gene, and each of these
elements must be a vector of the perfect match (PM) IDs you are interested
in. So this list should look something like this

$Gene1
[1] 12123 1412414 12231 4421233
$Gene2
[1] 342352 12312 1234112 412211

and so on, where 12123, 1412414, ... are the PM IDs. Using this list as your
argument you will get a cdf environment that contains only the probe sets
specified in this list, and only the probe pairs of the probe sets in this
list that correspond to the PM IDs (you get both PMs and MMs corresponding
to the PM ID).
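
For the case of keeping the 4 probe pairs nearest to the 5-prime end of each
set, such a list could be put together roughly as follows. This is only a
sketch on a toy probe table: the data frame `probe.tab`, its columns
`position` and `pm.id`, and the helper `first.n` are made up for
illustration, and it assumes that a smaller position means nearer to the
5-prime end. Real PM IDs would have to come from your own cdf environment or
probe package.

```r
# Toy probe table (hypothetical data): one row per probe pair.
probe.tab <- data.frame(
  Probe.Set.Name = rep(c("Gene1", "Gene2"), each = 6),
  position       = c(50, 10, 30, 70, 20, 90,  5, 45, 25, 65, 15, 85),
  pm.id          = 101:112,
  stringsAsFactors = FALSE
)

# Keep the n probe pairs with the smallest position within each probe set
# (assumption: smaller position = nearer to the 5-prime end).
first.n <- function(tab, n = 4) {
  by.set <- split(tab, tab$Probe.Set.Name)
  lapply(by.set, function(d) {
    d$pm.id[order(d$position)][seq_len(min(n, nrow(d)))]
  })
}

# Named list: one element per probe set, each a vector of PM IDs.
pm.list <- first.n(probe.tab, n = 4)
# pm.list$Gene1 is 102 105 103 101
# pm.list$Gene2 is 107 111 109 108
```

Because the list is named by probe set, each set keeps all of its selected
probe pairs under one identifier instead of one identifier per probe pair.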

Would this function solve your problem? 

Best,
Holger

> Thanks Laurent for the tip but I encountered other problems if I create
> enough identifiers. When I created one unique identifier for each probe
> pair I want to be inside the new cdf, then I would get 99112 probe *sets*
> because
> > length(unique(ind))
> [1] 99112
>  
> This isn't exactly what I have in mind. If, say, I want to have 4 probe
> pairs (nearest to the 5-prime end) from each set, how can I proceed to
> create this new cdf?
>  
> What I realised from what I've done below is that I will get the one probe
> pair that's furthest from the 5-prime end for each set, because the
> furthest pair is at the *bottom* of the probe set. The probe table is
> arranged in increasing order, so it seems to me that it updates itself and
> does not keep the earlier ones.
>  
> Please advise and thanks for the help.
>  
> Cheers
> sw
> 
> 	-----Original Message----- 
> 	From: Laurent Gautier [mailto:lgautier at altern.org] 
> 	Sent: Mon 08-Nov-04 1:10 PM 
> 	To: Hee Siew Wan 
> 	Cc: bioconductor at stat.math.ethz.ch 
> 	Subject: Re: [BioC] altcdfenvs
> 	
> 	
> 
> 	Hee Siew Wan wrote:
> 	> Dear All
> 	> 
> 	> I was trying to use a trial data set (Dilution) to create a new cdf
> 	> using "altcdfenvs". Instead of using "matchprobes", I created the "m":
> 	
> 	...let's see how 'the "m"' was made then...
> 	
> 	> ind <- c(seq(1,199084,by=11), seq(1,199084,by=10), seq(1,199084,by=9),
> 	>  seq(1,199084,by=8), seq(1,199084,by=7), seq(1,199084,by=6))
> 	> 
> 	> m.dil <- new.env()
> 	> m.dil$match <- list(ind[1])
> 	> m.dil$match <- c(m.dil$match, ind[2:length(ind)])
> 	> m.dil <- as.list(m.dil)
> 	> length(m.dil$match)    # [1] 146637
> 	> 
> 	> id.dil <- hgu95av2probe$Probe.Set.Name[ind]
> 	> 
> 	> dil.cdf <- buildCdfEnv.matchprobes(m.dil, id.dil, nrow.chip=640,
> 	>  ncol.chip=640, chiptype="HG-U95Av2", probes.pack="hgu95av2probe")
> 	> 
> 	> new.dil <- Dilution[,1:2]
> 	> validAffyBatch(new.dil, dil.cdf)    # [1] TRUE
> 	> new.dil.cdfenv <- dil.cdf@envir
> 	> new.dil@cdfName <- "new.dil.cdfenv"
> 	> 
> 	>
> 	> > new.dil
> 	>
> 	> AffyBatch object
> 	> size of arrays=640x640 features (6405 kb)
> 	> cdf=new.dil.cdfenv (12453 affyids)
> 	> number of samples=2
> 	> number of genes=12453
> 	> annotation=hgu95av2
> 	>
> 	>
> 	> > length(pm(new.dil[,1]))
> 	>
> 	> [1] 12453
> 	> 
> 	> As noted above, I have 12453 probe sets with my new cdf but I also have
> 	> 12453 probe pairs when in fact I want 146637 probe pairs. The new cdf
> 	> only returns 1 probe pair per set. Is there a way I can get the 146637
> 	> probe pairs?
> 	
> 	...then you may want to actually provide enough _identifiers_ (i.e.,
> 	unique strings) to achieve this. On my side, having made the variable
> 	'id.dil' the way you did, I have:
> 	 > length(unique(id.dil))
> 	[1] 12453
> 	
> 	(I did not anticipate this could be a 'gotcha'; a warning will be added
> 	to 'buildCdfEnv.matchprobes')
> 	
> 	
> 	> I tried doing the same thing for the ath1121501 array. For this case, I
> 	> created a data.frame from "ath1121501probe" with the following columns:
> 	>
> 	> > names(newath)
> 	>
> 	> [1] "sequence" "probe" "X" "Y" "position"
> 	> 
> 	> However, when I run
> 	> 
> 	> m <- matchprobes(newath$sequence, ath1121501probe$sequence)
> 	> 
> 	> I found out that for some sequences, I have more than 1 match. For
> 	> example,
> 	> 
> 	>
> 	> > ath1121501probe$sequence[16023]
> 	>
> 	> [1] "GAGTATGCAGTCGAGTGGTGTGATG"
> 	>
> 	> > ath1121501probe$sequence[16012]
> 	>
> 	> [1] "GAGTATGCAGTCGAGTGGTGTGATG"
> 	> 
> 	> Hence, the probe that I'm interested in may not be matched to the
> 	> correct one.
> 	
> 	
> 	...I am not certain I completely follow what you mean...
> 	
> 	
> 	> The versions I'm using:
> 	> R: 1.9.0
> 	> altcdfenvs: 1.0.0
> 	> affy: 1.4.31
> 	> ath1121501probe: 1.01
> 	
> 	You may want to upgrade to a more recent version of R and of the
> 	packages.
> 	
> 	
> 	
> 	Hoping it helps,
> 	
> 	
> 	L.
> 	
> 	
> 	> on Windows XP Professional Version 2002.
> 	> 
> 	> Did I do something wrong along the way for both methods? I'd appreciate
> 	> any help or advice regarding how to get the selected probe pairs for
> 	> analysis. Also, how do I cite the package "altcdfenvs"? Thank you.
> 	> 
> 	> Regards
> 	> Hee, Siew Wan
> 	>
> 	> _______________________________________________
> 	> Bioconductor mailing list
> 	> Bioconductor at stat.math.ethz.ch
> 	> https://stat.ethz.ch/mailman/listinfo/bioconductor
> 	>
> 
> 
