BAYESIAN REGULARIZATION OF DIFFUSION TENSOR IMAGES USING HIERARCHICAL MCMC AND LOOPY BELIEF PROPAGATION

Siming Wei†    Jing Hua‡    Jiajun Bu†    Chun Chen†    Yizhou Yu♦†

† College of Computer Science, Zhejiang University
‡ Wayne State University
♦ University of Illinois at Urbana-Champaign
ABSTRACT

Based on the theory of Markov random fields, a Bayesian regularization model for diffusion tensor images (DTI) is proposed in this paper. The low-degree parameterization of diffusion tensors in our model makes it less computationally intensive to obtain a maximum a posteriori (MAP) estimation. An approximate solution to the problem is achieved efficiently using hierarchical Markov chain Monte Carlo (HMCMC), and a loopy belief propagation algorithm is applied to a coarse grid to obtain a good initial solution for hierarchical MCMC. Experiments on synthetic and real data demonstrate the effectiveness of our methods.

Index Terms — Diffusion Tensor Images, Image Restoration, Bayesian Models, Markov Chain Monte Carlo
1. INTRODUCTION
Diffusion tensor imaging (DTI) enables the indirect inference of white matter microstructure by reconstructing local diffusion displacement probability density functions for water molecules from local measurements. When this probability density function follows a 3D Gaussian distribution, it can be described with a diffusion tensor: a 3×3 symmetric positive semidefinite matrix whose eigenvalues and eigenvectors characterize the underlying fiber orientation and anisotropy. A diffusion tensor is inherently related to the covariance matrix of the Gaussian distribution, and can be reconstructed from diffusion coefficients measured locally along several gradient directions. Due to the inherent noise of DTI measurements, the reconstructed tensors are inaccurate, giving rise to possibly erroneous results in the derived white matter fiber orientation, which further affects the accuracy of fiber tracking. There is an extensive literature on different regularization techniques [1, 8, 10].

We would like to focus on the statistically sound Bayesian framework for tensor field regularization. Prior distributions in the Bayesian framework are often modeled using Markov random fields (MRFs) with pairwise interactions. Bayesian regularization by means of the maximum a posteriori (MAP) estimation is well known in the statistical literature, initiated in the 1980s by Geman and Geman [4]. Bayesian regularization of the primary directions of diffusion tensors was developed by Poupon et al. [10]. Similar work for full diffusion tensors following a multivariate Gaussian distribution was proposed by Martin-Fernandez et al. [7]. The generalization of such work to Markov random tensor fields with more generic distributions was presented by Frandsen et al. [3]. It should be noted that the MAP estimation in previous work was achieved using either locally iterative optimization, which can easily be trapped in locally optimal solutions, or generic Markov chain Monte Carlo sampling [5], which needs a large number of iterations to converge.

In this paper, we introduce more advanced techniques for computing the MAP estimation of the tensor field. We first develop a Bayesian model based on a low-degree parameterization of diffusion tensors. This model makes MAP estimation less computationally intensive. We further develop a hierarchical MCMC technique that dynamically partitions the original state space into a hierarchy of nested smaller state spaces with increasing resolution. It is able to converge to better approximate solutions than conventional MCMC
in a relatively small number of iterations. Loopy belief propagation (LBP) [9, 6] is applied to a coarse grid of diffusion tensors to quickly obtain a good initial solution for hierarchical MCMC. LBP is an efficient deterministic technique for MRF-based optimization. Experimental results confirm that our revised Bayesian model and our MAP estimation techniques are both efficient and effective.
2. BASIC MODEL
Let W be a finite set of voxels in the white matter. We denote the diffusion tensor field reconstructed from measured diffusion coefficients by Σ = {Σ_w : w ∈ W}, where Σ_w is a 3×3 positive semidefinite matrix at voxel w. Let λ_{w1} ≥ λ_{w2} ≥ λ_{w3} ≥ 0 be the eigenvalues of Σ_w with corresponding orthonormal eigenvectors u_{w1}, u_{w2}, and u_{w3}. The fractional anisotropy (FA) index is defined as

    FA_w = \sqrt{ \frac{\frac{1}{2}\sum_{i=1}^{3} (\lambda_{wi} - \bar\lambda_w)^2}{\frac{1}{3}\sum_{i=1}^{3} \lambda_{wi}^2} },
and the normalized diffusion tensor at w is Σ̄_w = Σ_w / λ̄_w, where λ̄_w = (1/3)(λ_{w1} + λ_{w2} + λ_{w3}). Since most white matter voxels do not have neural fiber crossings, the regularized tensors should most likely be of "cigar" type, which means λ_{w2} = λ_{w3}. For this type of tensor, we define σ_w = λ_{w2}/λ_{w1} as the eigenratio of Σ̄_w. It is obvious that there is a one-to-one mapping between eigenratios and FA values, and both of their ranges are (0,1]. The two quantities are closely related: the larger the eigenratio, the smaller the FA value. Moreover, a normalized cigar-type tensor at w is uniquely determined by the primary direction of the tensor, Md_w = u_{w1}, and the eigenratio σ_w, i.e. Σ̄_w = Σ̄_w(Md_w, σ_w). These two variables are crucial to fiber tracking. We further define the primary direction field Md = {Md_w : w ∈ W} and the eigenratio field σ = {σ_w : w ∈ W}.
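To make the relation between FA and the eigenratio concrete, here is a small sketch (not part of the paper; the helper names are ours) that evaluates the FA formula above and applies it to a normalized cigar-type tensor with eigenvalues (1, σ, σ):

```python
import numpy as np

def fa_from_eigenvalues(lams):
    """FA_w = sqrt( (1/2) sum (lam_i - mean)^2 / ((1/3) sum lam_i^2) )."""
    lams = np.asarray(lams, dtype=float)
    num = 0.5 * np.sum((lams - lams.mean()) ** 2)
    den = np.sum(lams ** 2) / 3.0
    return float(np.sqrt(num / den))

def fa_from_eigenratio(sigma):
    """FA of a normalized cigar-type tensor with eigenvalues (1, sigma, sigma)."""
    return fa_from_eigenvalues([1.0, sigma, sigma])
```

An isotropic tensor (σ = 1) gives FA = 0, and FA grows monotonically toward 1 as the eigenratio shrinks, matching the inverse relation between the two quantities noted above.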
The diffusion function at w is denoted f_w. For a given direction u on the unit sphere, f_w(u) = λ̄_w u^T Σ̄_w(Md_w, σ_w) u. Let \mathcal{F} = {f_w : w ∈ W} be the field of diffusion functions. Also denote the set of measured diffusion coefficients by F = {F_w(u_i) : i = 1, ..., k, w ∈ W}, where u_1, ..., u_k are the directions in which the signal intensity is measured. F is determined by the equation S_w(u_i) = S_{w0} · exp(−b F_w(u_i)) and estimated by least-squares approximation. Here S_{w0} is the signal intensity without gradient, S_w(u_i) is the measured intensity in direction u_i, and b is the diffusion-encoding strength factor. To reduce the noise level of a
[978-1-4244-7993-1/10/$26.00 ©2010 IEEE. Proceedings of the 2010 IEEE 17th International Conference on Image Processing (ICIP 2010), September 26–29, 2010, Hong Kong.]
given data set, we aim to achieve the MAP estimation of \mathcal{F} given F, i.e. to maximize p(\mathcal{F} | F). According to Bayes' theorem, we have p(\mathcal{F} | F) ∝ p(F | \mathcal{F}) p(\mathcal{F}). Therefore, performing regularization is equivalent to solving the following optimization problem:

    \arg\max_{Md, \sigma} \; p(F \mid \mathcal{F})\, p(\mathcal{F}). \quad (1)

Our prior and likelihood models are revised versions of those in Frandsen et al. [3]. The prior distribution p(\mathcal{F}) is defined to be

    p(\mathcal{F}) = \frac{1}{Z_\alpha} \exp\Big( -\alpha \sum_{w \sim w'} g\big( \|\bar\Sigma_w(Md_w, \sigma_w) - \bar\Sigma_{w'}(Md_{w'}, \sigma_{w'})\| \big) \Big), \quad (2)

where Z_α is a normalizing constant and α > 0, ‖·‖ represents the Frobenius norm of a matrix, and w ∼ w' means w and w' are direct neighbors. To choose an appropriate function g, outlier suppression should be taken into account, so we set g(x) = c − c · exp(−x²/K) with constant parameters c and K.
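To see why this choice of g suppresses outliers, note that it is approximately quadratic near zero but saturates at c for large arguments, so a single badly corrupted neighbor contributes at most a bounded penalty. A minimal sketch (the values of c and K here are illustrative, not the paper's settings):

```python
import numpy as np

# Robust interaction penalty from the prior (2); c and K are illustrative.
c, K = 1.0, 3.0

def g(x):
    """g(x) = c - c*exp(-x^2/K): ~ (c/K)*x^2 near zero, bounded above by c."""
    return c - c * np.exp(-x * x / K)
```

Unlike a quadratic penalty, whose influence grows without limit, g never exceeds c, so distant (outlier) neighbors cannot dominate the prior energy.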
A well-known assumption is that the raw signal intensity follows a Rician distribution. Thus the noise on the measured diffusion coefficients at each voxel is independently and normally distributed [3]. The covariance of these distributions may vary across voxels. We define this covariance as

    h_w = \frac{\exp(2 b \bar\lambda_w) + 1}{(b \cdot SNR_0)^2},

where SNR_0 is the signal-to-noise ratio of the signal intensity without any gradient. This is a revised version of the function h(f_w(µ)) in [3]. Their choice of h may result in a preference for spherical (isotropic) tensors during MAP estimation, while ours, using the average eigenvalue, does not have such a bias. Therefore, at each voxel w and direction µ, we have

    p(F_w(\mu) \mid \mathcal{F}) = \frac{1}{\sqrt{2\pi h_w}} \exp\Big( -\frac{(F_w(\mu) - f_w(\mu))^2}{2 h_w} \Big). \quad (3)

The likelihood p(F | \mathcal{F}) is then formulated as

    p(F \mid \mathcal{F}) = \prod_{w \in W} \Big( \frac{1}{\sqrt{2\pi h_w}} \Big)^{k} \exp\Big( -\frac{1}{2} \sum_{i=1}^{k} \frac{(F_w(u_i) - f_w(u_i))^2}{h_w} \Big). \quad (4)
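Putting the measurement model together, the sketch below shows how a diffusion coefficient F_w(u_i) could be recovered by inverting the signal equation, how the variance h_w is evaluated, and how the log of the per-voxel likelihood factor in (4) is computed. The acquisition parameters (b, SNR_0, S_{w0}) are illustrative assumptions, not the paper's data:

```python
import numpy as np

b, SNR0, S0 = 1000.0, 20.0, 100.0   # illustrative acquisition parameters

def diffusion_coefficient(S):
    """Invert S_w(u_i) = S_w0 * exp(-b * F_w(u_i)) for F_w(u_i)."""
    return np.log(S0 / S) / b

def h_w(lam_bar):
    """Per-voxel noise variance h_w = (exp(2*b*lam_bar) + 1) / (b*SNR0)^2."""
    return (np.exp(2.0 * b * lam_bar) + 1.0) / (b * SNR0) ** 2

def log_likelihood(F_meas, f_model, hw):
    """Log of the per-voxel factor in the likelihood (4): k Gaussian terms."""
    F_meas = np.asarray(F_meas, dtype=float)
    f_model = np.asarray(f_model, dtype=float)
    k = F_meas.size
    return (-0.5 * k * np.log(2.0 * np.pi * hw)
            - 0.5 * np.sum((F_meas - f_model) ** 2) / hw)
```

As expected, the log-likelihood is maximized when the modeled coefficients f_w(u_i) match the measured ones exactly.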
Now we are ready to solve the optimization problem (1) under the above prior and likelihood models. Set

    E^1_w(\bar\Sigma_w) = \sum_{i=1}^{k} \Big[ \ln(2\pi h_w) + \frac{(F_w(u_i) - \bar\lambda_w u_i^T \bar\Sigma_w u_i)^2}{h_w} \Big],

and

    E^2_{ww'}(\bar\Sigma_w, \bar\Sigma_{w'}) = 2\alpha\, g(\|\bar\Sigma_w - \bar\Sigma_{w'}\|).

Since Σ̄_w is uniquely determined by the primary direction and eigenratio, E^1_w is a function of both Md_w and σ_w. Similarly, E^2_{ww'} is a function of Md_w, Md_{w'}, σ_w and σ_{w'}. We also define

    E_{Total} = \sum_{w \in W} E^1_w + \sum_{w \sim w'} E^2_{ww'}.

Therefore, E_Total is a function of Md and σ at all voxels. The optimization problem in (1) becomes

    \arg\min_{Md, \sigma} E_{Total}. \quad (5)
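The total energy (5) decomposes into per-voxel data terms and pairwise smoothness terms over the 6-neighborhood. The sketch below (a stand-in scalar data term and illustrative α, c, K; none of it is the paper's code) evaluates E_Total on a dictionary of normalized tensors:

```python
import numpy as np

alpha, c, K = 3.0, 1.0, 3.0          # illustrative parameters

def g(x):
    return c - c * np.exp(-x * x / K)

def total_energy(tensors, data_term):
    """tensors: voxel (x,y,z) -> 3x3 normalized tensor; data_term: voxel -> E1_w.
    Returns sum of data terms plus 2*alpha*g(Frobenius distance) per edge."""
    E = sum(data_term[w] for w in tensors)
    for (x, y, z), T in tensors.items():
        for dx, dy, dz in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:  # count each edge once
            nb = (x + dx, y + dy, z + dz)
            if nb in tensors:
                E += 2.0 * alpha * g(np.linalg.norm(T - tensors[nb]))
    return E
```

For two identical neighboring tensors the pairwise term vanishes and the energy reduces to the sum of the data terms; any disagreement between neighbors adds a bounded positive penalty.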
3. MULTILEVEL TENSOR REGULARIZATION
Markov Chain Monte Carlo has traditionally been adopted for solving optimization problems such as (5). Each step of this method needs to sample a normalized cigar tensor from a proposal distribution. Sampling positive semidefinite matrices in the continuous tensor space is time-consuming, and MCMC has a slow convergence rate. Define the resolution index R_A of a set A with norm ‖·‖ as

    R_A = \min\{ \|a - a'\| : a \neq a' \text{ and } a, a' \in A \}. \quad (6)

We say a set is of high resolution when its resolution index is small. A higher resolution of the tensor state space gives rise to slower convergence of the Markov chain while performing MCMC sampling. Our effort is to overcome this obstacle by means of a multilevel technique for MCMC. As stated in Section 2, a diffusion tensor can be uniquely defined by its primary direction and eigenratio. The key idea is that a tensor space at a low level is generated by a low-resolution primary direction space and a low-resolution eigenratio space. At level l, we denote the state space of primary directions as M^l and that of eigenratios as ER^l. The optimization problem at this level is then formulated as

    \arg\min_{Md, \sigma} E_{Total}, \quad \forall w \in W,\; Md_w \in M^l_w,\; \sigma_w \in ER^l_w. \quad (7)

Our multilevel coarse-to-fine regularization is summarized as follows:

1) At Level = 1, let ICOS be an icosahedron with vertices on a unit sphere, two of which lie on the z-axis of the coordinate system. At any voxel w, M^1_w is set to {p : p is a vertex of ICOS and z(p) > 0}, and ER^1_w is defined as {1/8, 2/8, 3/8, 4/8, 5/8, 6/8, 7/8}. The resolution indices of this level are R_{M^1_w} ≈ 1.05 and R_{ER^1_w} = 0.125. Perform the loopy belief propagation algorithm detailed in Section 4 to obtain a good initial primary direction and eigenratio at every voxel.

2) Level update: Level = Level + 1. Suppose the new level is l, and that the current state at voxel w is (Md_w, σ_w). Define a circle on the unit sphere to be

    C_w = \{ u : u \in \text{unit sphere } S^2 \text{ and } u \in B(Md_w, s \cdot R_{M^{l-1}_w}) \},

where B(Md_w, s · R_{M^{l-1}_w}) is a ball centered at Md_w with radius s · R_{M^{l-1}_w}, and s is a scaling factor. It can be proven mathematically that s ≥ √3/3 is necessary. M^l_w and ER^l_w are defined as follows. Determine a set PT_w consisting of six points uniformly distributed on the circle C_w; then M^l_w = {Md_w} ∪ PT_w. ER^l_w is the set of seven points that divide [σ_w − (1/2)R_{ER^{l-1}_w}, σ_w + (1/2)R_{ER^{l-1}_w}] into eight uniform segments. Thus, the tensor state space for the current level l is

    T^l = \{ \bar\Sigma_w(Md_w, \sigma_w) \mid Md_w \in M^l_w,\; \sigma_w \in ER^l_w,\; w \in W \}.

3) Traverse the voxels at the current level in a sequential order. When updating tensor Σ̄_w, randomly select a normalized diffusion tensor Σ̄'_w from the constructed tensor space, and use it to replace Σ̄_w with probability β, where

    \beta = \exp(\min\{ E(\bar\Sigma_w) - E(\bar\Sigma'_w),\, 0 \}), \qquad E(\bar\Sigma_w) = E^1_w(\bar\Sigma_w) + \sum_{w \sim w'} E^2_{ww'}(\bar\Sigma_w, \bar\Sigma_{w'}).

4) If the stopping criteria (which could be a time limit, a number of sweeps, etc.) for the current level have been satisfied, go to 2) unless the highest level has been reached; otherwise, go to 3).

As seen in the above steps, the state space at every level contains very few candidates, which accelerates the movement of the Markov chain and saves both time and memory. In practice, we typically use 6-10 levels in the above multilevel regularization.
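Step 3 above is a standard Metropolis-style update over the small discrete state space of the current level. A minimal sketch (the `local_energy` callback stands in for E(Σ̄_w) = E¹_w plus the pairwise terms with the neighbors; all names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_update(current, candidates, local_energy):
    """Propose a random state from the level's discrete space and accept it
    with probability beta = exp(min(E(current) - E(proposal), 0))."""
    proposal = candidates[rng.integers(len(candidates))]
    beta = np.exp(min(local_energy(current) - local_energy(proposal), 0.0))
    return proposal if rng.random() < beta else current
```

A proposal that lowers the local energy is always accepted (β = 1); an uphill move is accepted with exponentially decaying probability, so the chain can still escape poor local configurations.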
4. LOOPY BELIEF PROPAGATION
Quickly obtaining a good initial solution is of great importance to the multilevel regularization in Section 3. We chose to achieve this goal through belief propagation [9, 6, 2], which is widely used for MAP estimation in MRF problems. This algorithm is iterative and delivers messages among neighboring nodes in parallel during each iteration. Each message is a vector whose dimension is equal to the number of distinct states in a discrete state space. For a graph without any loops, belief propagation guarantees the optimal solution in a finite number of iterations. For a graph with loops, the same algorithm, now called loopy belief propagation (LBP), converges to a good approximation of the optimal solution [6].

Parallel update of messages in BP leads to excessive memory usage, since all messages from one iteration need to be saved for the next iteration. To improve the space complexity, we reduce both the number of candidates in the state space and the number of messages. Since our prior distribution prefers similar primary directions at neighboring voxels, we take advantage of this to establish our coarsened LBP model. To reduce the number of messages, we divide the voxel set into 2×2×2 blocks and only maintain messages among blocks. We denote the block set by G. For any neighboring blocks g, g' ∈ G,

    E^{G1}_g = \sum_{w \in g} E^1_w + \sum_{w \sim w',\; w, w' \in g} E^2_{ww'}, \qquad E^{G2}_{gg'} = \sum_{w \sim w',\; w \in g,\; w' \in g'} E^2_{ww'}. \quad (8)

It is straightforward to prove that

    E_{Total} = \sum_{g \in G} E^{G1}_g + \sum_{g \sim g'} E^{G2}_{gg'}. \quad (9)

To reduce the size of the state space, we only optimize the primary directions of the tensors using LBP. Voxels in the same block have the same primary direction, but they still keep their original eigenratios computed from the raw data. From (9), we formulate the following new optimization problem,

    \arg\min_{Md_g} \Big( \sum_{g \in G} E^{G1}_g(Md_g) + \sum_{g \sim g'} E^{G2}_{gg'}(Md_g, Md_{g'}) \Big), \quad (10)

which can be solved approximately by LBP. An additional technique that saves both computation time and space by half is to color the blocks black and white, such that adjacent blocks have different colors. The messages from black and white blocks are then updated alternately in consecutive iterations.

Denote the block-wise messages by MG. The block-based LBP is summarized as follows:

1) Divide the dataset into 2×2×2 blocks, and color the blocks black and white. Initialize all messages MG^0_{gg'} = 0 for g ∼ g'.

2) Update MG^t_{gg'} iteratively from t = 1 to T as follows. The update is only performed on black blocks when t is even and on white blocks when t is odd:

    MG^t_{gg'} = \min_{Md_g} \Big( E^{G1}_g(Md_g) + E^{G2}_{gg'}(Md_g, Md_{g'}) + \sum_{k \neq g',\; k \sim g} MG^{t-1}_{kg} \Big).

3) Compute the optimal state Md^*_g for each block g:

    Md^*_g = \arg\min_{Md_g} \Big\{ E^{G1}_g(Md_g) + \sum_{k \sim g} MG^{current}_{kg} \Big\},

where MG^{current}_{kg} is the latest version of the message propagated on the edge between k and g.

4) For each voxel w ∈ g, set Md^*_g as its initial primary direction and choose the initial eigenratio closest to the raw tensor's eigenratio from {1/8, 2/8, 3/8, 4/8, 5/8, 6/8, 7/8}. Start the multilevel MCMC described in Section 3.
5. RESULTS
Our first experiment is based on synthetic data. Figure 1 contains three images corresponding to the initial, noisy, and regularized helical tensor field, whose anisotropic elements are colored in red. Figure 1(b) shows the tensor field after Gaussian noise has been added to both the primary direction and the smaller eigenvalue of the normalized cigar-type tensors. The covariances of the noises were set to 0.05 radian and 0.1, respectively. We further generated synthetic DTIs of this noise-corrupted tensor field for use in our Bayesian regularization model. Figure 1(c) demonstrates that our regularization method can remove more than 97% of the noise, and thus is very effective.

We have also applied our regularization algorithm to noisy real DTI data. The resolution of the DTI is 256x256x40, and the diffusion-encoding strength factor b = 1000. In our regularization model, we set α = 3.0, c = 1 and K = 3 (see eq. (2)). To verify our method's performance, we compare the results of tractography of well-known fibers from the original noisy data and from the data processed using our regularization method. Figures 3(a) and (b) show the fiber tracts that pass through a Region of Interest (ROI) defined on the center sagittal slice of the corpus callosum. In the regularized DTI data, the extracted fiber bundles correctly pass through the corpus callosum ROI laterally, making a U-shaped structure, and finally end at the cortex dorsoventrally along both sides of the hemispherical cleft, as shown in Figure 3(b). In the original noisy data, however, the same fiber tracking procedure largely fails to determine the correct tracts, as shown in Figure 3(a). Figures 3(c) and (d) show the cingulum fiber tracts extracted from the noisy and regularized DTI data, respectively. By defining ROIs, the cingulum fiber tracts can be cleanly extracted from the regularized data, as shown in Figure 3(d). In the original noisy data, however, there are many short spurious fibers along the entire tract, as shown in Figure 3(c). These comparisons clearly demonstrate the effectiveness of our regularization method.

Fig. 1. (a) A synthetic tensor field with anisotropic elements shown in red. (b) Noise-corrupted version of the tensor field in (a). (c) Regularized version of the tensor field in (b).

We further investigated the running time and convergence behavior of our regularization method on the aforementioned real DTI data. A comparison of the convergence behavior of conventional MCMC and our Hierarchy MCMC (HMCMC) is shown in Figure 2. Their initial tensor fields are provided by the same LBP algorithm. We maintain six levels in our HMCMC, and the number of sweeps at each level is respectively set to 100, 150, 50, 50, 20, 20. The curve for HMCMC shows that the objective function (minus log probability density) drops rapidly in the first few sweeps of each level, and it eventually converges to a better approximate solution than conventional MCMC.

Fig. 2. Convergence behavior of MCMC and Hierarchy MCMC. Minus log probability density is shown as a function of running time on an Intel Pentium D 3.0GHz processor.

# LBP iterations      5    10    15    20
LBP run time (s)      9    18    28    37
# MCMC sweeps        20    35    41    43
MCMC run time (s)    54    93   109   114
time saved (s)       45    75    81    77

Table 1. Performance comparison between LBP and pure MCMC on an Intel Pentium D 3.0GHz processor.

Table 1 justifies the use of LBP to generate an initial solution. It compares the performance of LBP and pure MCMC at the first level. In pure MCMC, we initialize the tensor at each voxel w with the one in the discrete state space that minimizes E^1_w. In this comparison, the number of iterations (sweeps) and the time to reach the same value of the objective function are given in the same row. We can see that running LBP for 15 iterations saves more than 80 seconds.
6. CONCLUSION
In this paper, we have presented a Bayesian regularization model for DTIs solved using MAP estimation. This model introduces a low-degree parameterization of diffusion tensors that makes MAP estimation less expensive. The hierarchical MCMC algorithm we presented is a much improved version of MCMC, able to converge to better approximate solutions with lower energies. We also use LBP on a coarse grid to provide a good initial solution for hierarchical MCMC. Experiments demonstrated the effectiveness of our methods.
Acknowledgments
We would like to thank the reviewers for their valuable comments. This work was partially supported by the National Science Foundation (IIS 0914631) and the National Natural Science Foundation of China (60728204/F020404).
References
[1] A.W. Anderson. Theoretical analysis of the effects of noise on diffusion tensor imaging. Magnetic Resonance in Medicine, 46:1174–1188, 2001.
[2] P.F. Felzenszwalb and D.P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1), 2006.
[3] J. Frandsen, A. Hobolth, L. Østergaard, P. Vestergaard-Poulsen, and E.B. Vedel Jensen. Bayesian regularization of diffusion tensor images. Biostatistics, 8(4):784–799, 2007.
[4] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
[5] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, 1996.
[6] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Technical report, Mitsubishi Electric Research Laboratories, MERL-TR-2001-22, 2002.
[7] M. Martín-Fernández, C.-F. Westin, and C. Alberola-López. 3D Bayesian regularization of diffusion tensor MRI using multivariate Gaussian Markov random fields. In MICCAI, Lecture Notes in Computer Science, volume 3216, pages 351–359, 2004.
[8] G.J.M. Parker, J.A. Schnabel, M.R. Symms, D.J. Werring, and G.J. Barker. Nonlinear smoothing for reduction of systematic and random errors in diffusion tensor imaging. Journal of Magnetic Resonance Imaging, 11:702–710, 2000.
[9] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, 1988.
[10] C. Poupon, C.A. Clark, V. Frouin, J. Regis, I. Bloch, D. Le Bihan, and J.-F. Mangin. Regularization of diffusion-based directional maps for the tracking of brain white matter fascicles. NeuroImage, 12:184–195, 2000.

Fig. 3. The results of fiber tracking on the original noisy DTI dataset and the regularized one. (a) The tracked fiber tracts originating from a ROI defined on the center image slice of the corpus callosum in the noisy DTI dataset; (b) the tracked fiber tracts from the same ROI as in (a) in the regularized data; (c) the result of tracking the cingulum fibers in the noisy data; and (d) the result of tracking the cingulum fiber tracts in the regularized data.