Wednesday, May 18, 2011 | By: 六便士之歌

Automatic Speech Recognition (ASR) review

1、Problem Formulation: Frequency representations need to be invariant to pitch changes
When building an ASR system, the following factors must be taken into account:
- timing variation
- loud / quiet speech
- speaker effects, such as gender, accents and vocal mannerisms
- contextual effects

2、Speech feature extraction
① Speech features are compact representations, extracted from the audio signal, that highlight the information needed to distinguish between sounds.
② Source-filter theory: the source is the excitation signal, such as the oscillation of the glottis; the filter is the effect of the time-varying vocal tract. A speech signal can be modeled as the convolution of the source and the filter.
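To make the convolution concrete, here is a minimal NumPy sketch assuming an idealized voiced source (one impulse per pitch period) and a single damped resonance standing in for the vocal tract. The function names, the 8 kHz sampling rate, the 100 Hz pitch and the resonance parameters are illustrative assumptions; real glottal pulses and vocal tracts are considerably more complex.

```python
import numpy as np

def glottal_pulse_train(n_samples, period):
    """Idealized voiced source (assumption): a unit impulse every `period` samples."""
    source = np.zeros(n_samples)
    source[::period] = 1.0
    return source

def vocal_tract_impulse_response(length, decay=0.9, freq=0.1):
    """Hypothetical vocal-tract filter: one damped resonance.
    `decay` (per sample) and `freq` (cycles per sample) are illustrative values."""
    n = np.arange(length)
    return (decay ** n) * np.cos(2 * np.pi * freq * n)

fs = 8000                 # assumed sampling rate in Hz
period = fs // 100        # 100 Hz pitch -> 80-sample pitch period
source = glottal_pulse_train(800, period)
h = vocal_tract_impulse_response(64)

# Source-filter model: speech = source convolved with the filter's impulse response.
speech = np.convolve(source, h)[: len(source)]
```

Because each impulse excites the filter once, every pitch period of `speech` begins with a copy of the impulse response `h`: the period is set by the source, the waveform shape by the filter.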
③ A typical speech encoder: how the source and filter are estimated
The source can be generated by a white-noise generator for an unvoiced sound, or by a pitch detector plus a pulse generator for a voiced sound. The filter can be estimated using an LPC filter or a suitably defined filter bank.
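The two halves of such an encoder (in its decoding direction) can be sketched in NumPy as follows. This assumes a binary voiced/unvoiced decision per frame, a known pitch period in samples, and an all-pole synthesis filter with the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, so H(z) = gain / A(z). The function names, the default 80-sample period (100 Hz at an assumed 8 kHz rate) and the fixed random seed are hypothetical choices for illustration.

```python
import numpy as np

def excitation(n_samples, voiced, pitch_period=80, rng=None):
    """Source model (sketch): pulse train if voiced, white noise if unvoiced."""
    if voiced:
        e = np.zeros(n_samples)
        e[::pitch_period] = 1.0          # one pulse per detected pitch period
        return e
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.standard_normal(n_samples)

def all_pole_synthesis(e, a, gain=1.0):
    """Filter model (sketch): y[n] = gain * e[n] - sum_{k=1..p} a[k] * y[n-k].
    `a` holds a1..ap; the leading coefficient a0 = 1 is implicit."""
    p = len(a)
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = gain * e[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * y[n - k]
        y[n] = acc
    return y
```

For example, `all_pole_synthesis(excitation(160, True), [-1.3, 0.8])` produces a decaying resonance restarted every 80 samples, while passing `voiced=False` yields a noise-excited (fricative-like) frame through the same filter.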
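④ Cepstrum analysis: separates the source and filter components of speech using spectral methods. Because speech is the convolution of source and filter, its spectrum is their product; the logarithm turns that product into a sum, and the inverse DFT maps the slowly varying filter envelope to low quefrencies and the source's pitch harmonics to higher quefrencies:
     c(n) = IDFT( log |DFT(s(n))| )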
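A minimal NumPy sketch of this formula, plus a simple "lifter" that splits the cepstrum at a chosen quefrency. The cut-off `n_low`, the `eps` floor and the function names are assumptions for illustration; in practice the frame would first be windowed, and for a voiced frame the pitch period typically appears as a peak in the high-quefrency part.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum c = IDFT(log |DFT(s)|).
    `eps` is an assumed small floor so that log() never sees an exact zero."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + eps)
    # log|S| is real and even for a real frame, so the IDFT is real up to rounding.
    return np.real(np.fft.ifft(log_mag))

def lifter(cep, n_low):
    """Split a cepstrum into low quefrencies (|q| < n_low, attributed to the
    filter's spectral envelope) and the remainder (attributed to the source)."""
    w = np.zeros(len(cep))
    w[:n_low] = 1.0
    if n_low > 1:
        w[-(n_low - 1):] = 1.0   # mirror the window: c[N - q] == c[q]
    low = cep * w
    return low, cep - low
```

Since the log turns multiplication into addition, the cepstrum of a (circular) convolution of two frames equals the sum of their individual cepstra, which is why a window in quefrency can pull the two components apart.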
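⑤ LPC (linear predictive coding) features: the vocal tract is modeled as an all-pole filter, and its parameters (the predictor coefficients) are estimated from the signal.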
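One standard way to estimate the all-pole parameters is the autocorrelation method solved with the Levinson-Durbin recursion; the post does not say which method it has in mind, so treat this NumPy sketch as an assumption. It uses the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the Hamming window and function names are illustrative choices.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (windowed) frame."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Returns (a, err): a[0] == 1, err is the final prediction-error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                   # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc(frame, order):
    """LPC coefficients of one frame: Hamming window, then autocorrelation method."""
    windowed = frame * np.hamming(len(frame))
    return levinson_durbin(autocorrelation(windowed, order), order)
```

The recursion exploits the Toeplitz structure of the normal equations, so it costs O(p²) per frame instead of the O(p³) of a general linear solve.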

3、Linguistic categories for speech recognition
- phone and phoneme
- IPA (International Phonetic Alphabet)
- allophone: in phonetics, an allophone is one of a set of possible spoken sounds (phones) used to pronounce a single phoneme. For example, [pʰ] (as in pin) and [p] (as in cap) are allophones of the phoneme /p/ in English. Speakers treat them as the same phoneme, even though they are pronounced differently.

4、Statistical sequence recognition: Hidden Markov Models
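The post stops at this heading, so the following is only a generic sketch, not the author's material: the forward algorithm, which scores an observation sequence under a discrete-output HMM, and a recognizer that picks the model (for example, one HMM per word) with the highest likelihood. The function names and toy parameters are hypothetical, and real ASR systems use continuous emission densities (such as Gaussian mixtures) over feature vectors rather than a discrete symbol table.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | HMM) via the forward algorithm with per-step scaling.

    pi  : (S,) initial state probabilities
    A   : (S, S) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B   : (S, V) discrete emissions, B[i, v] = P(symbol v | state i)
    obs : sequence of symbol indices
    """
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()        # equals P(o_t | o_1..o_{t-1})
        alpha = alpha / scale      # renormalize to avoid numerical underflow
        log_lik += np.log(scale)
    return log_lik

def recognize(models, obs):
    """Return the name of the model (e.g. a word) that best explains obs."""
    return max(models, key=lambda name: forward_log_likelihood(*models[name], obs))
```

Summing the log scale factors gives log P(obs) without ever multiplying many small probabilities together, which matters for sequences of hundreds of frames.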