Thursday, May 19, 2011 | By: 六便士之歌

Spectrogram review

Spectrograms are basic tools for speech and audio analysis.
  • formant: formants are the resonant frequencies of the vocal tract, visible as peaks in the speech spectrum.
  • speech elements:
    • vowels: voiced sounds, such as a, e, etc.
    • fricatives: phonemes produced by a constriction in the vocal tract. They usually contain few resonant frequencies (formants), but have energy spread across the frequency spectrum.
    • plosives: transient bursts created by closure of the vocal tract followed by release; they can be voiced or unvoiced.
    • diphthongs / glides: characterised by movement of the formants over time.
    • nasals: resonant sounds produced by vibration within the nasal cavity.
  • identifying these elements in a spectrogram:
    • formants: horizontal bands
    • fricatives: vertical bands of flat-spectrum 'noise'
    • vowels: several formants sustained over a long period
    • glides / diphthongs: moving formants; a horizontal formant band that drifts gradually up or down
    • plosives: a stop (gap) in the spectrum followed by a burst, before vowels etc.
    • nasals: two or more formants, usually with a fairly large gap between them where a formant is missing; like vowels but with a 'hole' in the lower spectrum
  • signal processing stages to produce a spectrogram:
    • signal segmentation and windowing
    • transformation to the frequency domain via DFT, with zero padding
    • log magnitude spectrum, with the resulting vectors stacked in a matrix
    • magnitude to colour mapping and display
  • four main parameters / choices in producing a spectrogram:
    • sampling rate
    • DFT length
    • segmentation window length / zero padding length
    • overlap
  • spectral resolution
    • narrowband (long window) spectrogram makes harmonic structure clear
      • bandwidth of roughly 45 to 50 Hz; e.g. at Fs = 44.1 kHz the FFT size should be about 1024 (44100 / 1024 ≈ 43 Hz)
      • the harmonics are associated with the glottal source
    • wideband (short window) spectrogram makes formant structure clear
      • bandwidth of roughly 300 to 500 Hz; e.g. at Fs = 44.1 kHz the FFT size should be about 128 (44100 / 128 ≈ 345 Hz)
      • shows dark formant bands that change with the vowel, not with pitch
      • formants are associated with the 'filter properties' of the vocal tract above the larynx
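
The four processing stages and the FFT-size arithmetic above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the function names (hann, dft, spectrogram, to_gray, bin_spacing_hz) are made up for this note, the Hann window and the 60 dB dynamic range are assumed choices, and the "colour" mapping is simplified to gray levels. Note that with zero padding the DFT length only sets the spacing of the frequency bins; the actual analysis bandwidth is set by the window length.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) DFT; returns bins 0..N/2 only (input is real)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, win_len, hop, n_fft):
    """Return a list of log-magnitude spectra in dB, one per segment."""
    window = hann(win_len)
    columns = []
    for start in range(0, len(signal) - win_len + 1, hop):
        # Stage 1: segmentation and windowing
        seg = [s * w for s, w in zip(signal[start:start + win_len], window)]
        # Stage 2: zero padding up to the DFT length, then DFT
        seg += [0.0] * (n_fft - win_len)
        spectrum = dft(seg)
        # Stage 3: log magnitude (small floor avoids log of zero), stacked in a matrix
        columns.append([20 * math.log10(abs(c) + 1e-12) for c in spectrum])
    return columns


def to_gray(columns, dyn_range_db=60.0):
    """Stage 4: map dB values to 0..255 gray levels below the global peak."""
    peak = max(max(col) for col in columns)
    floor = peak - dyn_range_db
    return [
        [round(255 * (min(max(v, floor), peak) - floor) / dyn_range_db) for v in col]
        for col in columns
    ]


def bin_spacing_hz(fs, n):
    """Fs / N: bin spacing for DFT length N, or rough bandwidth for window length N."""
    return fs / n


# Example: 1 kHz tone sampled at 8 kHz, 64-sample window, 50% overlap, DFT length 128
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
spec = spectrogram(tone, win_len=64, hop=32, n_fft=128)
image = to_gray(spec)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print("strongest bin at", peak_bin * bin_spacing_hz(fs, 128), "Hz")
print("narrowband:", bin_spacing_hz(44100, 1024), "Hz; wideband:", bin_spacing_hz(44100, 128), "Hz")
```

The pure-Python DFT is slow and is used here only to keep the sketch dependency-free; in practice an FFT routine would replace it. The overlap is controlled by hop: a hop of half the window length gives 50% overlap.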
