Signal processing

Phase unwrapping by loopy belief propagation

Phase unwrapping in 2 dimensions

We have invented a new suite of algorithms for 2-D phase unwrapping, based on iterative probability propagation (the sum-product algorithm) and variational techniques. Phase unwrapping in 2-dimensional topologies is a signal processing problem that has been extensively studied over the past 20 years and has many important applications, including medical imaging, radar imaging, and satellite imaging.

The phase unwrapping problem is simply stated: From a 2-dimensional image of scalar values, we measure each value modulus 1. A value of 1.3 is measured as 0.3, a value of 2.3 is measured as 0.3, etc. (More generally, the values are measured modulus some known wavelength, but we assume the data is normalized to this wavelength.) Given these wrapped measurements, reconstruct the original image, taking into account prior information such as smoothness in the unwrapped values.

The following video shows a surface being wrapped and then unwrapped using our algorithm. Notice that the wrapped surface can be viewed as a grayscale image, where a bright pixel corresponds to a wrapped value near 1 and a dark pixel corresponds to a wrapped value near 0.

Sum-Product
click to see video (prepared by Nemanja Petrovic)

Practical applications include unwrapping MRI images, such as the MRI image of the human head shown below, and unwrapping synthetic aperture radar (SAR) topolographic maps, such as the map from Sandia National Laboratories, New Mexico, shown below.

MRI  Sandian 3

Although phase unwrapping in 1 dimension is tractable, phase unwrapping in 2 dimensions is widely considered to be a very difficult, perhaps unsolvable problem. In fact, if phase unwrapping is formulated as a ``minimum L^0 norm problem'' in integer programming, it turns out to NP-hard (Chen and Zebker, 2000).

Our research is inspired by the following conjectures:

  • For practical purposes, phase unwrapping is not NP-hard and there exists an algorithm that is optimal, in the sense of finding the most probable unwrapped image, assuming a Gaussian process prior.

  • Iterative probabilistic inference in a graphical model for the phase unwrapping problem will provide a linear-time algorithm that is optimal, in the above sense.

The above conjectures are motivated by the recent discovery that linear-time iterative inference in a graphical model can be used to solve the NP-hard problem of error-correcting decoding, when the channel noise is Gaussian (Wiberg, Loeliger and Koetter 1995; MacKay and Neal 1996; Frey and Kschischang 1996; Frey and MacKay 1998; McEliece, MacKay et al. 1998).

We formulate phase unwrapping as probabilistic inference in a graphical model, such as the factor graph in (a) of the figure below. We are exploring the use of probability propagation (b,c,d), factorized mean field methods (e) and structured variational techniques (f) for phase unwrapping.

graphs

References:

  • B. J. Frey, R. Koetter and N. Petrovic 2001 Very Loopy Belief Propagation for Unwrapping Phase Images, Neural Information Processing Systems 14 (NIPS 01) [PS.gz | PDF]

  • K. Achan, B. J. Frey, and R. Koetter 2001 A Factorized Variational Technique for Phase Unwrapping in Markov Random Fields, In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), Morgan Kaufmann; Seattle, WA, 2001. [PS.gz | PDF]

Probabilistic inference of speech signals from spectrograms

Working with a time-frequency representation of speech can have many advantages over processing the raw amplitude samples of the signal directly. In fact, many promising techniques operate on sequences of short-time power spectra (spectrograms), a representation which is well-suited to many speech processing tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality.

sp1

In this work, we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram. In contrast to techniques that alternate between estimating the phase and a spectrally-consistent signal, our technique directly infers the speech signal, thus jointly optimizing the phase and a spectrally-consistent signal.

References:

  • K. Achan, S. T. Roweis, and B. J. Frey, 2003 Probabilistic Inference of Speech Signals from Phaseless Spectrograms. in Proceedings of Neural Information Processing Systems 16 (NIPS 03) [PS | PDF] [BibTeX]

Segmental time-domain generative model of speech

Processing of speech signals directly in the time domain is commonly regarded to be difficult and unstable, due to fact that perceptually very similar utterances exhibit very large variability in their raw waveforms. Here, we propose a purely time domain approach to speech processing in which we identify the samples at the boundaries between glottal pulse periods (in voiced speech) or at the boundaries between unvoiced segments of similar spectral shape.

seg2

Once these segment boundaries are identified, a variety of important low level speech analysis operations can be performed directly and conveniently. For example, we can estimate the pitch as the reciprocal of the segment length. Timescale modification without pitch or format distortion can be achieved by stochastically eliminating or replicating segments in the time domain directly. More sophisticated operations, such as pitch modification, gender and voice conversion, and companding (volume equalization) are also naturally performed without the need for a cepstral or other such representation.

In this work, we introduce a segmental Hidden Markov Model, defined on variable length sections of the time domain waveform, and show that performing inference in this model allows us to identify segment boundaries and achieve excellent results on the speech processing tasks described above.

seg1
Segmentation obtained using our approach: voicing/unvoicing decision is indicated by bars above the signal. Segmentation is indicated by the upward arrows.

References:

  • K. Achan, S. T. Roweis, A. Hertzmann, and B. J. Frey, 2004 A Segmental HMM for Speech Waveforms. Technical Report UTML-TR-2004-001, University of Toronto, (Revised May 2004). [PS.gz | PDF]