[BioC] matching transcription factor binding sites

Hans-Ulrich Klein h.klein at uni-muenster.de
Sat Apr 12 20:30:12 CEST 2008


Hi Herve,

Herve Pages wrote:
> Hans-Ulrich Klein wrote:
>> I want to locate transcription factor binding sites (tfbs) within a 
>> given sequence. The tfbs are derived from databases like transfac or 
>> jaspar and are described by matrices. Are there algorithms for 
>> locating tfbs matches (e.g. "matinspector") implemented in 
>> bioconductor? I could not find one.
> 
> I assume that your matrices are Position Weight Matrices?

Yes, I meant position weight matrices.

> There is no facility in the Biostrings package for matching PWM to
> a DNA sequence but that would be easy to add. In fact, I've
> already fully described how to implement such facility
> in a separate package and on top of Biostrings basic containers (i.e. 
> DNAString objects) during the lab I gave for the "Advanced R
> for Bioinformatics" course back in February this year:
> 
>   http://bioconductor.org/workshops/2008
> 
> Follow "Advanced R for Bioinformatics" ->  "Interfaces to C (Lab)"
> 
> The simpleMatchPWM_0.99.0.tar.gz package contains the matchPWM() 
> function for finding all matches of a PWM in a given sequence.
> Unfortunately, the package was depending on a devel version
> of Biostrings that has changed since then, and
> those changes broke simpleMatchPWM 0.99.0. Let me know if this is what 
> you are looking for and I'll fix the package (this should
> be straightforward).

It is quite close to what I am looking for. I have access to the 
transfac database including a web based tool for finding PWM matches. I 
am looking for an alternative to the web tool in R for two reasons:

1. I did the preceding analysis in R and will do the follow-up analysis 
in R as well. It would be nice to avoid the effort of exporting and 
importing data.
2. I have not found a detailed description of the algorithm used by the 
web tool.

So simpleMatchPWM is at least a good starting point, as it does all the 
basic score computations. Why not integrate the matchPWM function into 
the Biostrings package? I would appreciate it.
However, most algorithms (like MatInspector or the transfac tool) 
implement some heuristics to improve results. E.g., they suggest 
individual cut-off values depending on the length of the PWMs. I am not 
sure whether I have enough time and knowledge to add such functionality.
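
To make the "basic score computation plus cut-off" idea concrete, here is 
a minimal sketch in plain Python. It is not the simpleMatchPWM code and 
not what MatInspector or the transfac tool do. The toy matrix, the 
function names and the rule "cut-off = fraction of the maximal possible 
score" are all assumptions chosen for illustration. A relative cut-off 
of this kind is one simple way to get a threshold that adapts to the 
length of the PWM.

```python
# Hypothetical toy PWM: one dict of per-base weights per motif position.
# The values are made up for illustration; they are not from TRANSFAC or JASPAR.
TOY_PWM = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def max_score(pwm):
    """Highest score any window can reach: sum of the column maxima."""
    return sum(max(column.values()) for column in pwm)


def window_score(pwm, window):
    """Sum the weight of each base at its position (unknown bases count 0)."""
    return sum(column.get(base, 0) for column, base in zip(pwm, window))


def match_pwm(pwm, sequence, min_fraction=0.8):
    """Return (1-based start, score) for every window scoring at least
    min_fraction * max_score(pwm). The relative cut-off is an assumption."""
    width = len(pwm)
    cutoff = min_fraction * max_score(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = window_score(pwm, sequence[start:start + width])
        if score >= cutoff:
            hits.append((start + 1, score))
    return hits


if __name__ == "__main__":
    print(match_pwm(TOY_PWM, "TTACGTAACGA"))
    print(match_pwm(TOY_PWM, "TTACGTAACGA", min_fraction=0.7))
```

Lowering min_fraction admits weaker matches. The sketch scans one strand 
only; a real tool would also have to scan the reverse complement.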

Best wishes,
Hans-Ulrich

PS: Does anyone have experience with the BioPerl package "TFBS"?


