Alexandria Digital Research Library

Low Delay, Low Complexity Multimode Tree Coding and Practical Rate Distortion Bounds for Speech

Author:
Li, Ying-Yi
Degree Grantor:
University of California, Santa Barbara. Electrical & Computer Engineering
Degree Supervisor:
Jerry D. Gibson
Place of Publication:
[Santa Barbara, Calif.]
Publisher:
University of California, Santa Barbara
Creation Date:
2013
Issued Date:
2013
Topics:
Engineering, Electronics and Electrical
Keywords:
Multimode Coding
Backward adaptive prediction
Tree Coding
Rate Distortion Bounds
Genres:
Online resources and Dissertations, Academic
Dissertation:
Ph.D.--University of California, Santa Barbara, 2013
Description:

A low-delay and low-complexity Multimode Tree coder with perceptual pre- and post-weighting and backward pitch prediction for both narrowband and wideband speech is developed. In addition, we develop composite source models for both narrowband and wideband speech, and apply these to classical rate distortion theory. Since classical rate distortion theory is based on MSE distortion, we generate mapping functions by calculating the MSE and PESQ/WPESQ pairs from ADPCM coders. As a result, the performance of a standardized speech codec can be compared with rate distortion bounds based on PESQ/WPESQ distortion.

Our experimental results show that perceptual pre- and post-weighting filters and backward pitch prediction improve speech quality for voiced speech without increasing bit rate or delay. Compared with standardized narrowband speech codecs, the worst-case complexity of the Multimode Tree coder is one-third that of AMR-NB and one-eighth that of G.728, and its delay is a quarter that of AMR-NB. Compared with standardized wideband speech codecs, the worst-case computational complexity of the Multimode Tree coder is one-third that of AMR-WB, and its delay is half that of AMR-WB and one-third that of G.722.1.

In addition, composite source models for both narrowband and wideband speech are developed. To generate the mapping functions between MSE and PESQ/WPESQ, we use G.726/G.727 for the narrowband mapping and construct a wideband ADPCM coder based on G.726 and G.727 for the wideband mapping. The rate distortion bounds calculated from the composite source models under MSE distortion are then converted to PESQ/WPESQ distortion using these mapping functions. This allows the performance of standardized speech codecs to be compared with rate distortion bounds expressed in terms of PESQ/WPESQ.
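
The two-step procedure described above (an MSE-based bound for a composite source, then a mapping from MSE to a perceptual score) can be illustrated with a minimal Python sketch. It is not the dissertation's method: this record does not give the actual source models or the form of the fitted mapping functions. The sketch assumes memoryless Gaussian modes, the same distortion target in every mode, and a piecewise-linear mapping, and all function names are hypothetical. The (MSE, score) pairs are left as an input, since they would have to come from measurements on an ADPCM coder.

```python
import math
from bisect import bisect_left


def gaussian_rate(variance, distortion):
    """MSE rate distortion function of a memoryless Gaussian source, in bits/sample."""
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    # R(D) = max(0, 0.5 * log2(variance / D))
    return max(0.0, 0.5 * math.log2(variance / distortion))


def composite_rate(mode_probs, mode_variances, distortion):
    """Probability-weighted average of the per-mode rates.

    Assumption: every mode is coded to the same distortion target. A more
    careful bound would optimize how distortion is split across modes.
    """
    return sum(p * gaussian_rate(v, distortion)
               for p, v in zip(mode_probs, mode_variances))


def make_mse_to_quality_map(pairs):
    """Build a piecewise-linear map from measured (mse, score) pairs.

    Assumption: the pairs come from running a reference coder at several
    rates; values outside the measured range are clamped to the end points.
    """
    points = sorted(pairs)
    xs = [mse for mse, _ in points]
    ys = [score for _, score in points]

    def mapping(mse):
        if mse <= xs[0]:
            return ys[0]
        if mse >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, mse)  # xs[i-1] < mse <= xs[i]
        t = (mse - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return mapping
```

Evaluating `composite_rate` over a range of MSE values and passing each value through the function returned by `make_mse_to_quality_map` gives a rate versus perceptual-score curve, against which a codec's operating point could be plotted.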

Physical Description:
1 online resource (148 pages)
Format:
Text
Collection(s):
UCSB electronic theses and dissertations
ARK:
ark:/48907/f33n21cc
ISBN:
9781303052439
Catalog System Number:
990039788070203776
Rights:
In Copyright
Copyright Holder:
Ying-Yi Li
Access: This item is restricted to on-campus access only. Please check our FAQs or contact UCSB Library staff if you need additional assistance.