We have invented a new suite of algorithms for 2-D phase unwrapping,
based on iterative probability propagation (the sum-product algorithm)
and variational techniques.
Phase unwrapping in 2-dimensional topologies
is a signal processing problem that has been
extensively studied over the past 20 years and has many important applications,
including medical imaging, radar imaging, and
The phase unwrapping problem is simply stated:
From a 2-dimensional image of scalar
values, we measure each value modulus 1. A value of 1.3 is measured
as 0.3, a value of 2.3 is measured as 0.3, etc.
(More generally, the values are measured modulus some known wavelength,
but we assume the data is normalized to this wavelength.)
Given these wrapped measurements, reconstruct the original image, taking into
account prior information such as smoothness in the unwrapped values.
The following video shows a surface being wrapped and then unwrapped using
our algorithm. Notice that the wrapped surface can
be viewed as a grayscale image, where a bright pixel corresponds to
a wrapped value near 1 and a dark pixel corresponds to
a wrapped value near 0.
click to see video (prepared by Nemanja Petrovic)
Practical applications include unwrapping MRI images, such as the MRI
image of the human head shown below, and unwrapping synthetic aperture
radar (SAR) topolographic maps, such as the map from
Sandia National Laboratories, New Mexico, shown below.
Although phase unwrapping in 1 dimension is tractable,
phase unwrapping in 2 dimensions is widely considered to be a
very difficult, perhaps unsolvable problem.
In fact, if phase unwrapping is formulated as a
``minimum L^0 norm problem'' in integer programming, it turns out to
NP-hard (Chen and Zebker, 2000).
Our research is inspired by the following conjectures:
For practical purposes, phase unwrapping is not NP-hard
and there exists an algorithm that is optimal, in the sense
of finding the most probable unwrapped image, assuming a Gaussian process
Iterative probabilistic inference in a graphical model for
the phase unwrapping problem will provide a linear-time algorithm that is
optimal, in the above sense.
The above conjectures are motivated by the recent discovery that
linear-time iterative inference in a graphical model can be used to solve the
NP-hard problem of error-correcting decoding, when the channel noise
is Gaussian (Wiberg, Loeliger and Koetter 1995;
MacKay and Neal 1996; Frey and Kschischang 1996;
Frey and MacKay 1998; McEliece, MacKay et al. 1998).
We formulate phase unwrapping as probabilistic inference in a graphical
model, such as the factor graph in (a) of the figure below. We are exploring
the use of probability propagation (b,c,d), factorized mean field methods (e)
and structured variational techniques (f) for phase unwrapping.
B. J. Frey, R. Koetter and N. Petrovic 2001
Very Loopy Belief Propagation for Unwrapping Phase Images,
Neural Information Processing Systems 14 (NIPS 01)
K. Achan, B. J. Frey, and R. Koetter 2001
A Factorized Variational Technique for Phase Unwrapping in Markov Random Fields,
In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI),
Morgan Kaufmann; Seattle, WA, 2001.
Probabilistic inference of speech signals from spectrograms
Working with a time-frequency representation of speech can have many advantages
over processing the raw amplitude samples of the signal directly. In fact,
many promising techniques operate on sequences of short-time power spectra
(spectrograms), a representation which is well-suited to many speech processing
tasks. However, a significant problem with algorithms that manipulate spectrograms
is that the output spectrogram does not include a phase component, which
is needed to create a time-domain signal that has good perceptual quality.
In this work,
we describe a generative model of time-domain speech signals and their spectrograms,
and show how an efficient optimizer can be used to find the maximum a posteriori
speech signal, given the spectrogram. In contrast to techniques that alternate
between estimating the phase and a spectrally-consistent signal, our technique
directly infers the speech signal, thus jointly optimizing the phase and a
K. Achan, S. T. Roweis, and B. J. Frey, 2003 Probabilistic
Inference of Speech Signals from Phaseless Spectrograms.in Proceedings of Neural Information Processing Systems 16 (NIPS 03)
Segmental time-domain generative model of speech
Processing of speech signals directly
in the time domain is commonly regarded to be difficult and unstable, due
to fact that perceptually very similar utterances exhibit very large variability
in their raw waveforms. Here, we propose a purely time domain approach
to speech processing in which we identify the samples at the boundaries between
glottal pulse periods (in voiced speech) or at the boundaries between unvoiced
segments of similar spectral shape.
segment boundaries are identified, a variety of important low level speech
analysis operations can be performed directly and conveniently. For example,
we can estimate the pitch as the reciprocal of the segment length. Timescale
modification without pitch or format distortion can be achieved by stochastically
eliminating or replicating segments in the time domain directly. More sophisticated
operations, such as pitch modification, gender and voice conversion, and
companding (volume equalization) are also naturally performed without the
need for a cepstral or other such representation.
In this work,
we introduce a segmental Hidden Markov Model, defined on variable length
sections of the time domain waveform, and show that performing inference
in this model allows us to identify segment boundaries and achieve excellent
results on the speech processing tasks described above.
Segmentation obtained using our approach: voicing/unvoicing decision is indicated by bars
above the signal. Segmentation is indicated by the upward arrows.
K. Achan, S. T. Roweis, A. Hertzmann, and B. J. Frey, 2004
A Segmental HMM for Speech Waveforms.Technical Report UTML-TR-2004-001, University of Toronto, (Revised May 2004).