[BioC] PWMmatch: position weight matrix or position frequency matrix?

Hervé Pagès hpages at fhcrc.org
Thu Feb 24 21:14:10 CET 2011


Hi Zuzanna,

On 02/22/2011 10:59 AM, Hervé Pagès wrote:
[...]
> Finally note that the Biostrings package doesn't provide a tool
> to convert a position frequency matrix (that can be obtained with
> consensusMatrix) into a position weight matrix.

Here is more on this, to clarify the role of the PWM() function
mentioned by Val.

PWM() can be used on a set of short sequences to compute the associated
Position Weight Matrix using Wasserman & Sandelin's approach.
As its name suggests, PWM() will always return a PWM, not a PFM.
The 'type' argument controls the type of Position Weight Matrix that
is returned.
The 'prior.params' argument controls the Dirichlet conjugate prior.
By default this argument is set to c(A=0.25, C=0.25, G=0.25, T=0.25).

In the example given by Val, PWM(sset, type="prob") returns a PWM
that is just the PFM divided by a constant (this constant being
the number of short sequences in the input). So, in that particular
case, matchPWM() will give the same result whether you pass it the
PFM or the PWM obtained with PWM(sset, type="prob"). (Multiplying
the PWM by a constant doesn't affect the output of matchPWM().)
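
To make the scaling argument concrete, here is a minimal base R sketch.
It does not call matchPWM(); window_scores() and hits() are hypothetical
helpers written for this illustration, and the toy PFM and subject are
made up. It assumes the threshold is expressed as a percentage of the
highest possible score, which is how matchPWM() interprets a min.score
such as "80%".

```r
# Hypothetical helper (not part of Biostrings): score every window of
# 'subject' against a 4-row matrix 'm' with rownames A, C, G, T.
window_scores <- function(m, subject) {
  bases <- strsplit(subject, "")[[1]]
  w <- ncol(m)
  starts <- seq_len(length(bases) - w + 1L)
  sapply(starts, function(s) {
    # (row, column) index pairs for the bases in this window
    idx <- cbind(match(bases[s:(s + w - 1L)], rownames(m)), seq_len(w))
    sum(m[idx])
  })
}

# Hypothetical helper: start positions whose score reaches 'pct' percent
# of the highest possible score (the sum of the column maxima).
hits <- function(m, subject, pct = 80) {
  max_score <- sum(apply(m, 2, max))
  which(window_scores(m, subject) >= pct / 100 * max_score)
}

# Toy PFM built from 10 imaginary sequences of width 3 (one column per
# position, every column sums to 10).
pfm <- matrix(c(8, 0, 1, 1,
                0, 9, 0, 1,
                1, 1, 8, 0),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))
prob <- pfm / 10   # the PFM divided by the number of sequences

subject <- "TTACGACGTAAGT"
hits_pfm  <- hits(pfm, subject)
hits_prob <- hits(prob, subject)
identical(hits_pfm, hits_prob)
```

Dividing the matrix by a constant divides every window score and the
highest possible score by that same constant, so a percentage-based
cutoff selects the same positions.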

But this is only a particular situation. It's not true in general
that PWM(..., type="prob") will return a PWM that is just the
PFM divided by a constant. For example it would no longer be the
case if you were using a 'prior.params' vector that contains
values that are not all the same.
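
A small base R sketch of why a non-uniform prior matters. The
conversion used here, (count + prior) / (n + sum(prior)), is the usual
posterior mean under a Dirichlet prior and is an assumption of this
sketch, not code taken from Biostrings. posterior() and site_score() are
hypothetical helpers, and the 'skewed' values are made up.

```r
# Assumed conversion (not taken from the Biostrings source):
# posterior probability = (count + prior) / (n + sum(prior)).
posterior <- function(pfm, prior) {
  n <- sum(pfm[, 1])   # number of sequences behind the PFM
  (pfm + prior[rownames(pfm)]) / (n + sum(prior))
}

# Hypothetical helper: score of one candidate site against matrix 'm'.
site_score <- function(m, site) {
  b <- strsplit(site, "")[[1]]
  sum(m[cbind(match(b, rownames(m)), seq_along(b))])
}

# Toy PFM: 10 imaginary sequences of width 3.
pfm <- matrix(c(8, 0, 1, 1,
                0, 9, 0, 1,
                1, 1, 8, 0),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

uniform <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
skewed  <- c(A = 3, C = 0.1, G = 0.1, T = 0.1)   # made-up values

score_both <- function(m) {
  c(AAG = site_score(m, "AAG"), TCG = site_score(m, "TCG"))
}

raw <- score_both(pfm)                      # scores on the raw counts
uni <- score_both(posterior(pfm, uniform))  # uniform prior
skw <- score_both(posterior(pfm, skewed))   # non-uniform prior
```

With the uniform prior every site of a given width receives the same
shift, so "TCG" still outranks "AAG" as it does on the raw counts. With
the skewed prior the shift depends on which bases the site contains, and
in this toy case the ranking flips.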

Internally PWM() proceeds in 2 steps:
   1. Computes the PFM of the input (i.e. of the set of short sequences).
      It uses consensusMatrix() for this.
   2. Converts this PFM into a PWM using Wasserman & Sandelin's
      approach.
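
The two steps can be sketched in base R as follows. This is only an
illustration of the idea (counts, then a pseudocount-based log2 ratio
against the prior), not the code PWM() actually runs; in particular it
ignores any rescaling PWM() may apply to its result. The input sequences
are made up.

```r
seqs  <- c("ACG", "ACG", "AAG", "TCG", "ACT")   # made-up short sequences
bases <- c("A", "C", "G", "T")

# Step 1: count each base at each position to get the PFM.
chars <- do.call(rbind, strsplit(seqs, ""))     # one row per sequence
pfm <- sapply(seq_len(ncol(chars)), function(j) {
  table(factor(chars[, j], levels = bases))
})
rownames(pfm) <- bases

# Step 2: turn counts into probabilities using the prior as pseudocounts,
# then take the log2 ratio against the prior probabilities.
prior <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
n     <- length(seqs)
post  <- (pfm + prior) / (n + sum(prior))       # each column sums to 1
pwm   <- log2(post / (prior / sum(prior)))
```

Bases seen more often than the prior predicts get a positive weight
(e.g. A at position 1), and bases seen less often get a negative one
(e.g. C at position 1).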

Until now it was not possible for the user to use PWM() to do just step 2.
I've just added this capability to the devel version of Biostrings
(version 2.19.11, will become available via biocLite() in the next
12 hours or so). So now you can do:

   library(Biostrings)
   data(HNF4alpha)
   pfm <- consensusMatrix(HNF4alpha)
   pwm <- PWM(pfm)

which is equivalent to doing PWM(HNF4alpha). This means you can
use PWM() on a PFM. You don't need to have access to the short
sequences that were used to generate this PFM anymore. Also, having
this conversion from PFM to PWM isolated allows the user to have a
closer look at it.

This is explained in the man page of the updated Biostrings package.
Hope this helps.

Cheers,
H.

 > sessionInfo()
R version 2.13.0 Under development (unstable) (2011-01-08 r53945)
Platform: i686-pc-linux-gnu (32-bit)

locale:
  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
  [7] LC_PAPER=en_US.utf8       LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.19.11 IRanges_1.9.25

loaded via a namespace (and not attached):
[1] Biobase_2.11.8 tools_2.13.0

>
> Hope this helps,
> H.
>
>>
>> Could somebody clarify what is the expected input for this function?
>>
>> Thanks in advance,
>>
>> Zuzanna Makowska
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


