[BioC] How to write "matchprobes" result(list of vector) to a table?

Lingsheng Dong dong_lsh at hotmail.com
Sat Nov 12 19:18:07 CET 2005


Dr. Gautier,

I have two questions that I have not been able to answer:

In the output of the function "matchprobes", some probes match more than a 
hundred target sequences, just as you mentioned in your paper. I guess the 
"ALU" repeats are among them.
Based on this match result, how can I remove the probes in excess of 11 in 
each probe set to get the Alt1 mapping described in your paper? Once I have 
Alt1, is there a function in the package to further remove the probes that 
match several targets? Or should I export the result to database software 
to do that, and later reconstruct the "matchprobes" result object in R?
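
One way to approach the last part of this question entirely in R is sketched below. It assumes, as described elsewhere in this thread, that m[[1]] holds one vector of probe indices per target sequence. The toy list m_match and its values are invented for illustration and are not real "matchprobes" output. The sketch flattens the list into a long table and drops the probes that match more than one target; it does not attempt to reproduce how the paper limited probe sets to 11 probes.

```r
# Toy stand-in for m[[1]]: one integer vector of probe indices per target
# sequence (made-up values, not real matchprobes output).
m_match <- list(c(1L, 2L, 3L), integer(0), c(3L, 4L), 5L)

# Long table: one row per (target, probe) match.
tab <- data.frame(
  target = rep(seq_along(m_match), lengths(m_match)),
  probe  = unlist(m_match)
)

# Count, for each probe, the number of distinct targets it matches.
hits  <- table(unlist(lapply(m_match, unique)))
multi <- as.integer(names(hits)[hits > 1])

# Drop multi-target probes from every target, keeping the list shape.
m_unique <- lapply(m_match, function(idx) idx[!idx %in% multi])
```

Because m_unique keeps the same list-of-vectors shape as the input, no round trip through a database is needed for this step. If a flat file is still wanted, tab could be written with write.table(tab, "matches.txt", sep = "\t", row.names = FALSE, quote = FALSE), where the file name is only a placeholder.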

I tried to set these problems aside and build alt.cdf from the match result 
anyway, but I got this error:

> alt.cdf <- buildCdfEnv.matchprobes(m, ids, nrow.chip = 640, ncol.chip = 640,
+ chiptype = "HG-U95av2", probes.pack = "hgu95av2probe")

Error in buildCdfEnv.matchprobes(m, ids, nrow.chip = 640, ncol.chip = 640,  :
        Some elements in 'ids' are not unique. You probably do not want this.

I checked "ids" and found many empty elements in it. I guess the program 
treats these empty strings as identical. I checked the FASTA reference 
sequence file and found some sequences whose header has "XMxxxxxxx.x" 
instead of "NMxxxxxx.x". So it seems the function below did not extract an 
RNA ID for the sequences without an "NMxxxxxx.x" ID.

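get.RNA.IDs <- function(x) {
	# Find the first ID starting with "Hs#" or "NM" in each header.
	reg <- regexpr("(Hs#|NM)[^[:blank:]|]+", x)
	# Use the argument x here, not the global my.entries$headers.
	# Headers without a match give reg == -1, so the result is "".
	r <- substr(x, reg, reg + attr(reg, "match.length") - 1)
	return(r)
}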

My question is whether my observation is correct and, if so, how to fix 
this error.
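
A minimal sketch of one possible fix, assuming the empty IDs really do come from headers that carry an "XM" accession: extend the pattern with an extra prefix and check the result before building the CDF environment. The function name get.RNA.IDs.ext, its prefixes argument, and the three headers are hypothetical and only serve as an illustration.

```r
# Hypothetical FASTA headers, made up for illustration.
headers <- c("gi|000001|ref|NM_000001.1| example mRNA",
             "gi|000002|ref|XM_000002.1| example predicted mRNA",
             "some header without a recognised ID")

# Variant of get.RNA.IDs with a configurable set of ID prefixes.
# Headers without any match yield NA instead of "".
get.RNA.IDs.ext <- function(x, prefixes = c("Hs#", "NM", "XM")) {
  pattern <- paste0("(", paste(prefixes, collapse = "|"), ")[^[:blank:]|]+")
  reg <- regexpr(pattern, x)
  ids <- rep(NA_character_, length(x))
  # regmatches() returns only the matched substrings, in order.
  ids[reg > 0] <- regmatches(x, reg)
  ids
}

ids <- get.RNA.IDs.ext(headers)

# Sanity checks before using the IDs further.
n_missing   <- sum(is.na(ids))                    # headers with no ID found
n_duplicate <- sum(duplicated(ids[!is.na(ids)]))  # repeated IDs
```

Returning NA rather than "" makes unmatched headers easy to count and to drop. Any header that still gives NA would need its own prefix added or would have to be excluded, since missing or repeated IDs are what the "not unique" error complains about.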

Thanks a lot.
Lingsheng

The fear of the LORD is the beginning of wisdom, and knowledge of the Holy 
One is understanding.
--Proverbs 10:10





>From: lgautier at altern.org
>To: "Lingsheng Dong" <dong_lsh at hotmail.com>
>CC: lgautier at altern.org, bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] How to write "matchprobes" result(list of vector) to a table?
>Date: Wed, 9 Nov 2005 13:57:45 +0100 (CET)
>
> > Dr. Gautier,
> > I found there were two versions of the alternative mapping in your
> > original paper. I want to know which version "matchprobes" creates. I
> > guess the output contains all possible matches.
> > My plan was to write the "matchprobes" output object into a database
> > table in which each row is a match, use MS Excel or Access to figure out
> > the logic of the function "matchprobes", and then eliminate some probes
> > if necessary.
> > I did figure out a way to do that. Here is the code (m is the
> > "matchprobes" output object):
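> > sink("probe match table.txt")
> > # seq_along() is safe when a list is empty, unlike 1:length().
> > # cat() writes plain fields, without the "[1]" prefix print() adds.
> > for (i in seq_along(m[[1]])) {
> > 	if (length(m[[1]][[i]]) == 0) {
> > 		# No match for element i.
> > 		cat(i, 0, "\n")
> > 	}
> > 	else {
> > 		for (j in seq_along(m[[1]][[i]])) {
> > 			cat(i, m[[1]][[i]][[j]], m[[2]][[i]][[j]], "\n")
> > 		}
> > 	}
> > }
> > sink()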
> > I have not yet had a chance to import the text file into a database
> > table.
> > If you can show me how you got the two versions of the alternative
> > mapping and explain the "matchprobes" function in detail, that would be
> > great.
>
>The data files you refer to were obtained exactly as described in the
>vignette.
>If the documentation is not sufficient, following step by step what a
>function is doing is achieved easily with the command 'debug'.
>Note that in the case of 'matchprobes' C code is called, and this will
>look like a black box until you read the source for it.
>
>
>Hoping this helps,
>
>
>L.
>
>
> > By the way, I did save the object "three times" as soon as the probe
> > matching was done.
> > Thanks.
> >
> > Lingsheng
> >
> >>From: lgautier at altern.org
> >>To: "Lingsheng Dong" <dong_lsh at hotmail.com>
> >>CC: bioconductor at stat.math.ethz.ch
> >>Subject: Re: [BioC] How to write "matchprobes" result(list of vector) to a table?
> >>Date: Mon, 7 Nov 2005 15:36:18 +0100 (CET)
> >>
> >> > Hi, all,
> >> > I want to use the most recent RefSeq to map Affymetrix probes and
> >> > eliminate cross-hybridizing probes, as Dr. Gautier did in
> >> > http://www.biomedcentral.com/1471-2105/5/111.
> >> >
> >> > The probe matching has now finally finished, after the function
> >> > "matchprobes" ran for 150 hours.
> >>
> >>Not completely unexpected. At the time, I remember parallelizing the job
> >>to make use of several processors...
> >>
> >> > I want to export the matchprobes result to a table. I
> >> > checked the documentation, which says the result is a list of vectors.
> >> > Is there a special function to do this, or do I need to loop through
> >> > the list and write the content to a text file?
> >>
> >>This is most likely the way to go.
> >>If you want everything in one table, you may need some refinements,
> >>since the output list (or the two lists if 'probepos=TRUE') has one
> >>element per reference sequence in the input, but can have 0 to many
> >>matches for each one of these elements.
> >>
> >>If you are unsure, first save your list (R command 'save'), as it
> >>represents 150 hours of (computer) work.
> >>
> >>
> >>Hoping this helps,
> >>
> >>
> >>
> >>Laurent
> >>
> >>
> >>
> >> > Thanks a lot.
> >> >
> >> > Lingsheng