[BioC] How to make a CDF package from an alternative CDF environment

Sat Nov 26 08:13:47 CET 2005

> Hi Lingsheng,
>
> Thanks for asking! Unfortunately I have not tried those functions yet,
> because the 'matchprobes' routine only delivered its result today! It
> took roughly 300 hours on my single-processor Mac G4 and 150 hours on
> a dual-processor Linux machine, which sounds quite sensible.
>
> I will start playing around with my brand new alternative CDF in the
> coming days and let you know... In the meantime, I suggest you post a
> more specific question directly to the maintainer of the 'altcdfenvs'
> package, i.e. Laurent Gautier <lgautier at altern.org>. He has always
> been very kind in his replies...
>
> Meanwhile, I have a new question for Laurent (or any other BioC
> contributor): "How can I make a CDF package from my brand new
> alternative CDF environment?"

You may have to write your own code to do so (see the second comment below).
However, you can also save your R object to a file (command "save"),
then load it and run the few lines on page 5 of the vignette
http://www.bioconductor.org/repository/devel/vignette/altcdfenvs.pdf
whenever needed.
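
A minimal sketch of the save/load route, using base R only. The object
name my_alt_cdf, its contents, and the file name are hypothetical
stand-ins; a real alternative CDF object built with 'altcdfenvs' would take
the place of the plain environment used here.

```r
# Stand-in for an alternative CDF environment: a plain R environment that
# maps one made-up probe-set ID to a matrix of PM/MM probe indices.
my_alt_cdf <- new.env()
assign("ENST0001_at",
       cbind(pm = c(1L, 2L), mm = c(3L, 4L)),
       envir = my_alt_cdf)

# Save the object to a file (a temporary location in this sketch).
rda_file <- file.path(tempdir(), "my_alt_cdf.rda")
save(my_alt_cdf, file = rda_file)

# Later, or in another session: load() restores the object under the
# same name it was saved with.
rm(my_alt_cdf)
load(rda_file)
ls(my_alt_cdf)
```

The .rda file can be passed around as is; whoever receives it only needs
load() before running the vignette lines.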

> I saw in the vignette of package 'makecdfenv' how to make a CDF
> environment or a CDF package starting from an Affy-provided CDF file,
> but not how to convert an already existing CDF environment into a CDF
> package... This would make it easier for me to share it and also to
> track important information, such as package version, genome build, etc.

In fact, hacking the function makecdfenv would not be difficult,
but then what about version number, etc... ? The current
"automagic" loading of the CDF-package (after downloading whenever
necessary) calls for a package name, and very likely will pick
the first one in the library path.

If you want to archive/share your CDF environment, you can always
attach an attribute with comments (note that the class CdfEnvAffy
already has slots to help with that).
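
As a rough illustration of the attribute idea, here is a base R sketch. It
uses a generic attribute on a plain environment, not the slots of the
class mentioned above; the attribute name "info" and every value in it are
hypothetical placeholders.

```r
# Stand-in for the CDF environment (hypothetical, empty here).
alt_cdf <- new.env()

# Attach free-form bookkeeping as an attribute; all values are made up.
attr(alt_cdf, "info") <- list(
  version      = "0.1.0",
  genome_build = "placeholder build label",
  note         = "probes remapped with matchprobes"
)

# Read it back when needed.
info <- attr(alt_cdf, "info")
info$version
```

Note that attributes set on an environment are visible through every
reference to that environment, since environments are not copied on
assignment.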

Otherwise, the help for "package.skeleton" could give you hints on
how to proceed.
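
A sketch of what a first attempt with package.skeleton() might look like.
The package/object name "myaltcdf" and its contents are hypothetical, and
this only produces a source-package skeleton with the object stored as
data; whether the automatic CDF loading would accept such a package is not
checked here.

```r
# Hypothetical CDF environment; the package is given the same name.
myaltcdf <- new.env()
assign("ENST0001_at",
       cbind(pm = c(1L, 2L), mm = c(3L, 4L)),
       envir = myaltcdf)

# Use a fresh parent directory so package.skeleton() does not
# stumble over an existing folder.
pkg_parent <- tempfile("pkgs")
dir.create(pkg_parent)

# Non-function objects named in 'list' are saved under data/ of the skeleton.
utils::package.skeleton(name = "myaltcdf",
                        list = "myaltcdf",
                        environment = globalenv(),
                        path = pkg_parent)

list.files(file.path(pkg_parent, "myaltcdf"))
```

The generated DESCRIPTION file is where a version number and a note about
the genome build could be recorded by hand before building the package.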

Hoping this helps,

L.

> Does somebody have any idea that would be easy to implement with the
> existing packages? (For example, I hope I do not have to parse the
> content of my CDF environment into an Affy-like CDF file...)
>
> Thanks in advance for any advice,
> Norman
>
> Norman Pavelka
> Department of Biotechnology and Bioscience
> University of Milano-Bicocca
> Piazza della Scienza, 2
> 20126 Milan, Italy
>
> Phone: +39 02 6448 3556
> Fax: +39 02 6448 3552
>
> On 23 Nov 2005, at 17:59, Lingsheng Dong wrote:
>
>> Hi, Norman,
>> I still have difficulty using the functions "countduplicated",
>> "removeIndex" and "unique.CdfEnvAffy". The help files are not clear
>> either. Could you send me your script that calls these functions?
>> By the way, how are you doing with the "matchprobes" function? Thanks.
>> Lingsheng
>> The fear of the LORD is the beginning of wisdom, and knowledge of the
>> Holy One is understanding.
>> --Proverbs 10:10
>>> From: Norman Pavelka <norman.pavelka at unimib.it>
>>> To: lgautier at altern.org
>>> CC: Lingsheng Dong
>>> <dong_lsh at hotmail.com>,bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] Small bug in function 'countskip.FASTA.entries'
>>> from package altcdfenvs
>>> Date: Wed, 16 Nov 2005 16:14:46 +0100
>>> Dear Laurent,
>>> On 16 Nov 2005, at 16:55, lgautier at altern.org wrote:
>>>>> Hi Lingsheng,
>>>>> On 15 Nov 2005, at 19:05, Lingsheng Dong wrote:
>>>> <snip>
>>>>>> Still another problem you may want to consider:
>>>>>> The "matchprobes" function gives all possible matches. In my case, a
>>>>>> lot of probes match hundreds of target sequences. It means there will
>>>>>> be too many cross-hybridizing probes if you put all probes
>>>>>> matching a target sequence into one probe set.
>>>>>> I couldn't find a ready-to-use function to solve this problem yet. I
>>>>>> am thinking of exporting the matching result into database software and
>>>>>> manually deleting the cross-hybridizing probes.
>>>>>> Not sure if this is a quick solution.
>>>>>> Hope you can give some suggestions.
>>>>> I also thought of that problem, but Laurent Gautier already gave some
>>>>> clues in his BMC Bioinformatics paper on how to handle this
>>>>> situation.
>>>>> Though I still didn't try, I guess that everything could be done very
>>>>> quickly inside R, without the need of exporting into an external
>>>>> database. If you like, I can share my experience with you as soon
>>>>> as I have done some trials...
>>>> The functions "countduplicated", "removeIndex", and
>>>> "unique.CdfEnvAffy"
>>>> are your friends.
>>>> Hoping this helps,
>>>> Laurent
>>> Thanks for pointing to these functions! I will give them a try as soon as
>>> the 'matchprobes' routine is over...
>>> BTW, I launched the script 150 hours ago, but it's still not
>>> finished. How much computational time should I expect to need on my
>>> standard Mac G4 machine (OS X Panther)? Here are some numbers to give
>>> an idea: I'm remapping the MOE430 v2.0 arrays (approximately 1 million
>>> probes) against roughly 38000 unique EnsEMBL transcripts... Thank you
>>> in advance for your feedback!
>>> Best,
>>> Norman
>>> Norman Pavelka
>>> Department of Biotechnology and Bioscience
>>> University of Milano-Bicocca
>>> Piazza della Scienza, 2
>>> 20126 Milan, Italy
>>> Phone: +39 02 6448 3556
>>> Fax: +39 02 6448 3552