[R] Prosodic/phonetic analysis with R

Sun Dec 26 15:02:13 CET 2004

On 26-Dec-04 Uwe Ligges wrote:
> (Ted Harding) wrote:
>> So I would like to ask R people for their recommendations
>> for a program which would
>> 
>> a) Take as input a sound file in one of the common formats
>>    (".wav", ".au")
> 
> Ted,
> 
> see package tuneR for reading Wave files.
> 
> 
>> b) perform at least basic phonetic analysis (formants, F0,
>>    spectrograms, ... )
> 
> For F0 and spectrograms see also tuneR.

Thanks for the pointer to tuneR, Uwe. I've had a look at the
reference manual, and it does seem to be primarily oriented
towards analysis of musical data. I'm not so much interested
in getting the raw sound file into R and then doing basic
frequency-type analysis on this, as in working on the output
of a program which can apply phonetic expertise to the file
and then present the characteristics of the phonetic analysis
to R for further analysis.

> "Formants" is a bit more tricky. We tried some analyses, but
> since the definition of a formant is still not completely clear
> to me, we haven't provided anything for formants in the package
> yet.
> 
> Do you know some good literature that gives a somewhat precise
> definition? At least musicians only talk about something like
> "raised" areas in the periodogram, which is not very helpful
> given the missing definition of "raised".

Well, I'm only a beginner! I could agree with your summary from
what I have read so far. The account I have seen which best
combines general accessibility with apparent technical
thoroughness is the on-line Britannica article "Phonetics":

http://www.britannica.com/eb/print?tocId=9108587&fullArticle=true

and the following is a relevant quote:

   In summary, speech sounds are fairly well defined by nine
   acoustic factors. The first three factors include the
   frequencies of the first three formants; these are responsible
   for the major part of the information in speech. Characterizing
   the vocal tract shape, these formant frequencies specify vowels,
   nasals, laterals, and the transitional movements in voiced
   consonants. The frequencies of the fourth and higher formants
   do not vary significantly. The fourth factor is the fundamental
   frequency--roughly speaking, the pitch--of the larynx pulse in
   voiced sounds, and the fifth, the amplitude--roughly speaking,
   the loudness--of the larynx pulse. These last two factors
   account for suprasegmental information; e.g., variations in
   stress and intonation. They also distinguish between voiced
   and voiceless sounds, in that the latter have no larynx pulse
   amplitude. The centre frequency of the high-frequency hissing
   noises in voiceless sounds constitutes the sixth acoustic factor,
   and the seventh is the amplitude of these high-frequency noises.
   These two factors characterize the major differences among
   voiceless sounds. In more accurate descriptions it would be
   necessary to specify more than just the centre frequency of
   the noise in fricative sounds. The eighth and ninth factors
   include the amplitudes of the second and third formants relative
   to the first formant; the amplitudes of the formants as a whole
   are determined by the larynx pulse amplitude. These latter
   factors are the least important in that they convey only
   supplementary information about nasals and laterals.
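
As a rough illustration of how the nine factors listed in the quote
might be handed over to R, here is a minimal sketch in Python that
stores one record per analysis frame and writes the records out as
CSV text. The class name, the field names, the units and the two
example frames are all hypothetical placeholders invented for this
sketch; they are not taken from the article or from any phonetics
program.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class AcousticFrame:
    """One analysis frame; the nine fields mirror the nine factors (names are hypothetical)."""
    f1_hz: float            # factor 1: first formant frequency
    f2_hz: float            # factor 2: second formant frequency
    f3_hz: float            # factor 3: third formant frequency
    f0_hz: float            # factor 4: fundamental frequency of the larynx pulse
    larynx_amp: float       # factor 5: larynx pulse amplitude (0 for voiceless sounds)
    noise_centre_hz: float  # factor 6: centre frequency of high-frequency noise
    noise_amp: float        # factor 7: amplitude of that noise
    a2_rel_db: float        # factor 8: amplitude of formant 2 relative to formant 1
    a3_rel_db: float        # factor 9: amplitude of formant 3 relative to formant 1


def frames_to_csv(frames):
    """Return CSV text with a header row followed by one row per frame."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(AcousticFrame)])
    for frame in frames:
        writer.writerow(astuple(frame))
    return buf.getvalue()


# Made-up placeholder values, for illustration only (not measurements).
example_frames = [
    AcousticFrame(700.0, 1200.0, 2500.0, 120.0, 1.0, 0.0, 0.0, -6.0, -12.0),
    AcousticFrame(400.0, 1800.0, 2600.0, 0.0, 0.0, 5000.0, 0.5, -10.0, -15.0),
]
csv_text = frames_to_csv(example_frames)
print(csv_text)
```

Text in this shape, once saved to a file, could be read into R with
read.csv(), giving a data frame with one column per factor.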

Earlier in the article it is stated that

  "The resonant frequencies of the vocal tract are known as
   the formants."

but one has to read through the whole thing before the richer
implications of this start to become apparent.
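
To make the "resonant frequency" idea a little more concrete, here
is a toy sketch in Python. It synthesises the impulse response of a
single two-pole resonator and then recovers the resonance from the
signal with a second-order linear-prediction fit (autocorrelation
method). The 700 Hz centre frequency, 100 Hz bandwidth, 8000 Hz
sampling rate and all function names are assumptions chosen for the
sketch. Real formant analysis of speech is considerably more
involved (several resonances at once, short windowed frames, a
higher prediction order), so this should be read as an illustration
of the principle only, not as what 'praat' or any other package
actually does.

```python
import cmath
import math


def resonator_impulse_response(freq_hz, bandwidth_hz, fs, n):
    """Impulse response of y[i] = x[i] - a1*y[i-1] - a2*y[i-2] (one resonance)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle from centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    y = []
    for i in range(n):
        x = 1.0 if i == 0 else 0.0
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y.append(x - a1 * y1 - a2 * y2)
    return y


def autocorr(x, lag):
    """Raw (unnormalised) autocorrelation of x at the given lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def estimate_single_resonance(x, fs):
    """Fit a 2nd-order predictor and return (frequency_hz, bandwidth_hz)."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    # Normal equations:  r0*a1 + r1*a2 = -r1 ;  r1*a1 + r0*a2 = -r2
    det = r0 * r0 - r1 * r1
    a1 = (-r0 * r1 + r1 * r2) / det
    a2 = (-r0 * r2 + r1 * r1) / det
    # Pole of 1 / (1 + a1*z^-1 + a2*z^-2) in the upper half plane
    pole = (-a1 + cmath.sqrt(a1 * a1 - 4.0 * a2)) / 2.0
    freq = cmath.phase(pole) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth


fs = 8000.0
signal = resonator_impulse_response(700.0, 100.0, fs, 2000)
est_freq, est_bw = estimate_single_resonance(signal, fs)
print(round(est_freq, 1), round(est_bw, 1))
```

The estimated frequency is the angle of the fitted pole converted
to Hz, and the bandwidth comes from its distance to the unit circle;
on this noise-free synthetic signal both should come back very close
to the values used to generate it.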

The advantage of software like 'praat' is that phonetic experts
have incorporated their understanding -- much clearer than I'm
likely to achieve from the above -- into the software!

I'm also grateful to Shravan Vasishth for responding with the
suggestion of EMU. This seems at first sight to be less sophisticated
than 'praat', though with what looks like a useful repertoire
of "primitives" -- from its description:

  "EMU is a collection of software tools for the creation,
   manipulation and analysis of speech databases. At the core
   of EMU is a database search engine which allows queries based
   on the sequential and hierarchical structure of the annotations."

It has the immediate advantage that it comes with facilities
for direct linkage to S-Plus and R. Clearly worth looking into,
but I don't know yet whether it would do enough of the dirty
work for me!

Thanks, Uwe and Shravan! If I get anywhere useful, I'll report
back to the list.

All best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 26-Dec-04                                       Time: 14:02:13
------------------------------ XFMail ------------------------------