Automatic Speech Recognition (ASR) review

1、Problem Formulation: Frequency representations need to be invariant to pitch changes
Factors that must be considered when building an ASR system:
- timing variation
- loud / quiet speech
- speaker effects, such as gender, accents and vocal mannerisms
- contextual effects

2、Speech feature extraction
① A speech feature is a compact representation, extracted from the audio signal, that highlights the distinguishing information.
② source-filter theory
The source is the excitation signal, such as the oscillation of the glottis. The filter is the effect of the time-varying vocal tract. A speech signal can be considered as the convolution of the source and the filter.
③ a typical speech encoder: how source and filter are estimated
The source can be generated using a white noise generator for an unvoiced sound, or a pitch detector and pulse generator for a voiced sound. The filter can be estimated using an LPC filter or a suitably defined filter bank.
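The source and filter stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`pulse_train`, `white_noise`, `all_pole_filter`), the fixed pitch period, and the single filter coefficient are all hypothetical, and a real encoder would estimate the pitch and the filter from the recorded speech.

```python
import random

def pulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples (assumed fixed pitch)."""
    return [1.0 if t % period == 0 else 0.0 for t in range(n_samples)]

def white_noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(excitation, coeffs):
    """All-pole (LPC-style) filter: y[t] = x[t] + sum_k coeffs[k] * y[t-1-k]."""
    y = []
    for t, x in enumerate(excitation):
        acc = x
        for k, c in enumerate(coeffs):
            if t - 1 - k >= 0:
                acc += c * y[t - 1 - k]
        y.append(acc)
    return y

# Hypothetical one-pole "vocal tract" driven by each kind of source.
voiced = all_pole_filter(pulse_train(16, period=8), coeffs=[0.5])
unvoiced = all_pole_filter(white_noise(16), coeffs=[0.5])
```

Filtering the excitation this way is the same as convolving it with the impulse response of the filter, which is the source-filter view described in ②.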
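④ cepstrum analysis: separates the source and filter elements of speech using spectral methods. Convolution in time becomes multiplication in the frequency domain, and the logarithm turns that product into a sum, so the two components can be separated:
idft(log |dft(s(t))|)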
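The formula can be written out directly as a minimal sketch. The helper names `dft` and `real_cepstrum` and the small constant `eps` (added to avoid taking the log of zero) are assumptions made for this example. The DFT here is the naive O(N²) form, used only to keep the code free of dependencies; a practical system would use an FFT library and apply the transform to short windowed frames.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def real_cepstrum(signal, eps=1e-12):
    """Compute idft(log|dft(s)|); eps guards against log(0)."""
    spectrum = dft(signal)
    log_magnitude = [math.log(abs(v) + eps) for v in spectrum]
    # The log-magnitude spectrum of a real signal is symmetric,
    # so the inverse transform is real up to rounding error.
    return [c.real for c in dft(log_magnitude, inverse=True)]

frame = [0.5 ** t for t in range(8)]   # toy decaying signal
cepstrum = real_cepstrum(frame)
```

One consequence visible in this sketch: scaling the signal by a constant gain adds the log of that gain to coefficient 0 only, and leaves the other coefficients unchanged.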
⑤ LPC features: all-pole parameter estimation
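One common way to estimate all-pole parameters is the autocorrelation method solved with the Levinson-Durbin recursion. The notes do not say which estimation method is intended, so the sketch below is an assumption, and the names `autocorrelation` and `lpc` are hypothetical. It assumes a non-silent frame (a frame of all zeros would give zero energy and a division by zero).

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum_t signal[t] * signal[t - lag], for lag = 0..max_lag."""
    n = len(signal)
    return [sum(signal[t] * signal[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]

def lpc(signal, order):
    """Return (coeffs, error) for the predictor s_hat[t] = sum_k coeffs[k] * s[t-1-k]."""
    r = autocorrelation(signal, order)
    coeffs = []
    error = r[0]  # prediction error energy of the order-0 model
    for i in range(order):
        # Reflection coefficient for raising the model order from i to i + 1.
        k = (r[i + 1] - sum(coeffs[j] * r[i - j] for j in range(i))) / error
        # Update the lower-order coefficients, then append the new one.
        coeffs = [coeffs[j] - k * coeffs[i - 1 - j] for j in range(i)] + [k]
        error *= 1.0 - k * k
    return coeffs, error

# Toy frame generated by a one-pole filter with coefficient 0.5.
frame = [0.5 ** t for t in range(64)]
coeffs, error = lpc(frame, order=2)
```

For this toy frame the first coefficient comes out close to 0.5 and the second close to 0, matching the one-pole filter that generated it.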

3、Linguistic categories for speech recognition
- phone and phoneme
- IPA (International Phonetic Alphabet)
- allophone: in phonetics, an allophone is one of a set of multiple possible spoken sounds (or phones) used to pronounce a single phoneme. For example, [pʰ] (as in pin) and [p] (as in cap) are allophones of the phoneme /p/ in English. Speakers treat them as the same phoneme, even though they are pronounced differently.

4、Statistical sequence recognition: Hidden Markov Models