Alexandria Digital Research Library

Advances in audio coding and networking by effective exploitation of long term correlations

Author:
Nanjundaswamy, Tejaswi
Degree Grantor:
University of California, Santa Barbara. Electrical & Computer Engineering
Degree Supervisor:
Kenneth Rose
Place of Publication:
[Santa Barbara, Calif.]
Publisher:
University of California, Santa Barbara
Creation Date:
2013
Issued Date:
2013
Topics:
Engineering, Electronics and Electrical
Keywords:
Perceptual optimization
Long term prediction
Audio concealment
Time varying pitch
Audio coding
Polyphonic signals
Genres:
Online resources and Dissertations, Academic
Dissertation:
Ph.D.--University of California, Santa Barbara, 2013
Description:

This dissertation tackles challenges in the efficient transmission of all varieties of audio signals over networks, chiefly compression under delay constraints acceptable for communication applications, and recovery from loss of content due to noisy channels. Efficiently exploiting long term correlations is key to addressing these challenges.

For audio compression, while there are many well known techniques that effectively exploit redundancies within a frame, the only known solution for inter-frame redundancy removal is the naive technique of a simple long term prediction (LTP) filter, which provides a segment of previously reconstructed samples as the prediction for the current frame. Although this technique can be effective for audio signals with a single stationary periodic component (i.e., monophonic), its parameters are typically selected by minimizing the mean squared error rather than the perceptual distortion criteria of audio coding, which hinders the performance of LTP. This drawback is first addressed by employing a novel two-stage parameter estimation technique which jointly optimizes LTP parameters along with quantization and coding parameters, while explicitly accounting for the perceptual distortion and rate tradeoffs.
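The baseline described above can be illustrated with a minimal sketch of a single-tap LTP filter whose lag and gain are chosen by minimizing the mean squared error. This is the conventional selection that the abstract criticizes, not the two-stage perceptual method of the dissertation. The function names, the exhaustive lag search, and the periodic repetition used when the lag is shorter than the frame are assumptions made for illustration.

```python
import math


def ltp_predict(history, frame_len, lag, gain):
    """Predict frame_len samples as gain times the signal delayed by lag.

    Assumption: when lag < frame_len, the last `lag` reconstructed samples
    are repeated periodically to fill the frame.
    """
    start = len(history) - lag
    return [gain * history[start + (n % lag)] for n in range(frame_len)]


def estimate_ltp_mse(history, frame, min_lag, max_lag):
    """Exhaustive search for the (lag, gain) pair that minimizes squared error.

    Returns (lag, gain, squared_error). With no usable lag, the zero
    prediction (gain 0.0) is returned.
    """
    best = (min_lag, 0.0, sum(x * x for x in frame))
    for lag in range(min_lag, min(max_lag, len(history)) + 1):
        pred = ltp_predict(history, len(frame), lag, 1.0)
        energy = sum(p * p for p in pred)
        if energy == 0.0:
            continue
        # Closed-form least-squares gain for this lag.
        gain = sum(x * p for x, p in zip(frame, pred)) / energy
        err = sum((x - gain * p) ** 2 for x, p in zip(frame, pred))
        if err < best[2]:
            best = (lag, gain, err)
    return best


# Usage: a single sinusoid with a period of 50 samples (monophonic case).
signal = [math.sin(2 * math.pi * n / 50) for n in range(464)]
lag, gain, err = estimate_ltp_mse(signal[:400], signal[400:], 30, 70)
print(lag, round(gain, 6))
```

For this single periodic component, the search recovers the period as the lag with a gain close to one, which is the case where simple LTP works well.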

However, since most audio signals are polyphonic in nature, containing a mixture of several periodic components, the LTP tool is well known to be ineffective due to its simplistic structure. This major drawback is addressed by employing a more sophisticated filter structure that cascades multiple LTP filters, each corresponding to an individual periodic component. A recursive "divide and conquer" technique is also introduced to estimate the parameters of all the LTP filters in the cascade. The effectiveness of cascaded LTP for compression is demonstrated in two distinct settings: the ultra low delay Bluetooth Subband Codec and the MPEG Advanced Audio Coding (AAC) standard. In MPEG AAC, we specifically adapt the cascaded LTP parameter estimation to take into account the perceptual distortion criteria, and also propose a low decoder complexity variant.
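To see why a cascade helps with polyphonic signals, the following sketch applies LTP analysis stages of the form e[n] = x[n] - g x[n - N] one after another to a mixture of two sinusoids with unrelated periods. The lags and gains are assumed to be known here; the recursive "divide and conquer" estimation mentioned above is not reproduced, and all names are hypothetical.

```python
import math


def ltp_residual(signal, lag, gain):
    """One LTP analysis stage: e[n] = x[n] - gain * x[n - lag].

    Samples before the start of the signal are treated as zero.
    """
    return [x - gain * (signal[n - lag] if n >= lag else 0.0)
            for n, x in enumerate(signal)]


def cascaded_ltp_residual(signal, stages):
    """Apply several LTP stages in series; stages is a list of (lag, gain)."""
    out = list(signal)
    for lag, gain in stages:
        out = ltp_residual(out, lag, gain)
    return out


def energy(samples):
    return sum(s * s for s in samples)


# Usage: two periodic components with periods of 50 and 37 samples.
mixture = [math.sin(2 * math.pi * n / 50) + 0.7 * math.sin(2 * math.pi * n / 37)
           for n in range(600)]
skip = 50 + 37  # ignore the start-up transient of both stages

single = ltp_residual(mixture, 50, 1.0)[skip:]
cascade = cascaded_ltp_residual(mixture, [(50, 1.0), (37, 1.0)])[skip:]
print(energy(single) > 1.0, energy(cascade) < 1e-12)
```

A single stage removes only the component that matches its lag and leaves the other in the residual, while the two-stage cascade removes both, one stage per periodic component.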

Another shortcoming of the LTP tool used in audio coders is its subpar performance for speech and vocal content, which is well known to be quasi-periodic and to involve small variations in pitch period. This drawback is addressed by employing a novel technique of introducing a single parameter of 'geometric' warping in the LTP filter, whereby past periodicity is geometrically warped to provide an adjusted prediction for the current samples. Again, the parameter estimation for this modified LTP filter is adapted to take the perceptual distortion criteria into account. Objective and subjective results for all the settings validate the effectiveness of the proposals on a variety of audio signals.

For dealing with loss of content due to noisy channels, concealment techniques based on LTP filtering are well known and are suitable for audio signals with a single periodic component. However, none of the existing techniques is designed to overcome the main challenge posed by the polyphonic nature of most music signals. This shortcoming is addressed by employing cascaded LTP filtering to effectively estimate every periodic component from all the available information. Objective and subjective evaluation results for the proposed approach, in comparison with existing techniques, all incorporated within an MPEG AAC low delay decoder, provide strong evidence for considerable gains across a variety of polyphonic signals.
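As a rough illustration of LTP-based concealment, the sketch below extrapolates a lost frame by running the cascade as a synthesis filter with zero excitation, so that each concealed sample is the value that makes the cascade residual zero. It is a simplified assumption, not the dissertation's method: it uses only past samples, assumes the lags and gains are already known, and uses hypothetical function names.

```python
import math


def cascade_coefficients(stages):
    """Expand the product of (1 - g_i z^-N_i) into a {delay: coefficient} map."""
    poly = {0: 1.0}
    for lag, gain in stages:
        expanded = dict(poly)
        for delay, coef in poly.items():
            expanded[delay + lag] = expanded.get(delay + lag, 0.0) - gain * coef
        poly = expanded
    return poly


def conceal(history, num_lost, stages):
    """Extrapolate num_lost samples so that the cascade residual stays zero."""
    poly = cascade_coefficients(stages)
    if len(history) < max(poly):
        raise ValueError("history is shorter than the total cascade delay")
    out = list(history)
    for _ in range(num_lost):
        n = len(out)
        # Zero residual: x[n] = -sum over d > 0 of c_d * x[n - d].
        out.append(-sum(c * out[n - d] for d, c in poly.items() if d > 0))
    return out[len(history):]


# Usage: conceal 64 lost samples of a two-component (polyphonic) mixture.
mixture = [math.sin(2 * math.pi * n / 50) + 0.7 * math.sin(2 * math.pi * n / 37)
           for n in range(600)]
history, lost = mixture[:500], mixture[500:564]

single = conceal(history, 64, [(50, 1.0)])
cascade = conceal(history, 64, [(50, 1.0), (37, 1.0)])
print(max(abs(a - b) for a, b in zip(single, lost)) > 0.5)
print(max(abs(a - b) for a, b in zip(cascade, lost)) < 1e-9)
```

With a single stage, the extrapolation simply repeats the last 50 samples and mispredicts the second component; with one stage per component, this idealized stationary mixture is recovered almost exactly.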

Physical Description:
1 online resource (144 pages)
Format:
Text
Collection(s):
UCSB electronic theses and dissertations
ARK:
ark:/48907/f34747xz
ISBN:
9781303052569
Catalog System Number:
990039788190203776
Rights:
In Copyright
Copyright Holder:
Tejaswi Nanjundaswamy
Access: This item is restricted to on-campus access only.