Parameter estimation of an acoustic
differential equation by optimal control

William S. Harlan

1984, 1986, 2002

Introduction

Usually, numerical methods of inversion and optimization begin with a discretized version of the simulation equations. Differential equations are often easier to manipulate directly, so discretization can be delayed until last. Here is an example of optimally perturbing acoustic parameters to fit recorded acoustic waves. This approach appears most often in the work of researchers in optimal control, such as J.L. Lions [3], who also publishes often in the field of geophysics.

This example derives from an explanation I received from Patrick Lailly in 1984 at the Institut Français du Pétrole. In 1986, I expanded on his example in my lab notebook. After all these years, I still find it a useful example of multi-dimensional seismic inversion, so I've finally typed it up. In a separate paper [1], I show a detailed application with the one-dimensional wave equation.

The methods in this particular paper should be much easier to generalize to other equations and model parameters. My preferred methods of optimization (Gauss-Newton) require linearizing simulation equations and calculating the adjoint of those linearized equations. Here is how you can do most of that work analytically.

Along the way, I will derive what is called reverse-time migration, with an image directly related to perturbations of physical parameters.

Example linear equations and objective function

Let us consider the experiment of a single exploration seismic field gather, with a seismic source placed at a surface point $\mathbf{x}_s$ and with receivers placed at points $\mathbf{x}_r$. Each trace is recorded for a finite time $0 \leq t \leq T$. Assume that land experiments record the derivative of pressure with depth. (Marine experiments record the pressure directly, which is simpler.) Let $c(\mathbf{x})$ be the acoustic velocity and $\rho(\mathbf{x})$ the density. Describe the pressure field $P_s(\mathbf{x},t)$ with the following two-dimensional acoustic equation.

  $\displaystyle
\left[ \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})} \frac{\partial^2}{\partial t^2}
 - \nabla\cdot\frac{1}{\rho(\mathbf{x})}\nabla \right] P_s(\mathbf{x},t)
 = f_s(t)\,\delta(\mathbf{x}-\mathbf{x}_s)\,\delta(t) ,
$ (1)
where the source term
  $\displaystyle
f_s(t) = \nabla\cdot\left[ \mathbf{f}(t) / \rho(\mathbf{x}_s) \right] ,
$ (2)
and $\mathbf{f}(t)$ contains the body forces.

Use the free surface and causal time boundary conditions:

  $\displaystyle
\left. P_s(\mathbf{x},t) \right|_{z \leq 0} = 0 , \qquad
\left. P_s(\mathbf{x},t) \right|_{t \leq 0} =
\left. \frac{\partial}{\partial t} P_s(\mathbf{x},t) \right|_{t \leq 0} =
\left. \frac{\partial^2}{\partial t^2} P_s(\mathbf{x},t) \right|_{t \leq 0} = 0 ,
$ (3)
where $z = x_1$ is the vertical dimension of $\mathbf{x}$.

The surface boundary condition on $P$ is not sufficient and must be extended before $P$ is well-determined. Choose a semi-spherical surface $\partial \Omega$ that cannot be reached by waves traveling at the maximum velocity ${V_{max}}$ for the maximum recorded time $T$:

  $\displaystyle
\mbox{the surface } \partial\Omega
 = \{\, \mathbf{x} \mid z = 0 ~~\mbox{or}~~ \|\mathbf{x}\| = T\, V_{\max} \,\} ,
$ (4)
  $\displaystyle
\mbox{for the region } \Omega
 = \{\, \mathbf{x} \mid z \geq 0 ~~\mbox{and}~~ \|\mathbf{x}\| \leq T\, V_{\max} \,\} .
$ (5)

The radius should be large enough so that no wave reaching this boundary $\partial \Omega$ can be recorded during the time $0 \leq t \leq T$. We can arbitrarily extend the free surface around this closed boundary without affecting the modeled pressures within the recorded time:
  $\displaystyle
\left. P_s(\mathbf{x},t) \right|_{\mathbf{x} \in \partial\Omega} = 0 .
$ (6)
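
As a concrete illustration, here is a minimal finite-difference sketch of equations (1), (3), and (6) in Python, assuming a regular grid with spacing dx, a second-order leapfrog scheme, and a point source whose delta(t) factor is absorbed into the discrete source time function; the function names are hypothetical.

    import numpy as np

    def div_rho_grad(P, rho, dx):
        # div( (1/rho) grad P ) with second-order differences; 1/rho is
        # averaged onto cell faces.  Each direction contributes only where
        # its three-point stencil fits inside the grid.
        b = 1.0 / rho
        out = np.zeros_like(P)
        bz = 0.5 * (b[1:, :] + b[:-1, :])            # 1/rho on z faces
        fz = bz * (P[1:, :] - P[:-1, :]) / dx        # (1/rho) dP/dz
        out[1:-1, :] += (fz[1:, :] - fz[:-1, :]) / dx
        by = 0.5 * (b[:, 1:] + b[:, :-1])            # 1/rho on y faces
        fy = by * (P[:, 1:] - P[:, :-1]) / dx        # (1/rho) dP/dy
        out[:, 1:-1] += (fy[:, 1:] - fy[:, :-1]) / dx
        return out

    def model_pressure(c, rho, f_s, src_ij, dt, dx, nt):
        # Leapfrog time stepping of equation (1):
        #   P(t+dt) = 2 P(t) - P(t-dt)
        #             + dt^2 c^2 rho [ div((1/rho) grad P) + source ] .
        # No stability (CFL) check and no absorbing boundary: waves reflect
        # off the grid edges, and the top row is reset to zero each step,
        # standing in for the free surface (6).
        nz, ny = c.shape
        P_old = np.zeros((nz, ny))
        P = np.zeros((nz, ny))
        movie = np.zeros((nt, nz, ny))
        for it in range(nt):
            rhs = div_rho_grad(P, rho, dx)
            rhs[src_ij] += f_s[it] / dx ** 2         # discrete delta(x - x_s)
            P_new = 2.0 * P - P_old + dt ** 2 * c ** 2 * rho * rhs
            P_new[0, :] = 0.0                        # free surface at z = 0
            P_old, P = P, P_new
            movie[it] = P
        return movie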

The variational form

The acoustic differential equation (1) can also be written in the following variational integral form, also called the weak form:
  $\displaystyle
\int_0^T\!\!\int_\Omega \left\{ \left[
 \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})} \frac{\partial^2}{\partial t^2}
 - \nabla\cdot\frac{1}{\rho(\mathbf{x})}\nabla \right] P_s(\mathbf{x},t)
 - f_s(t)\,\delta(\mathbf{x}-\mathbf{x}_s)\,\delta(t) \right\}
 q(\mathbf{x},t) \, d\mathbf{x}\, dt = 0 ,
$ (7)

for all $H^2$ functions $q(\mathbf{x},t)$. This integral is true for all $q$ (whose second derivatives are square integrable) if and only if the original equation (1) also holds.

Conveniently, the finite-element method [2] also uses the weak form. The zero-value boundary conditions (3) and (6) are implied by the weak form and are called natural boundary conditions.

The objective function

Let $D_{rs}(t)$ be the pressure field recorded within the volume at known receiver positions $\mathbf{x}_r$ for a source located at $\mathbf{x}_s$. Define an objective function to match a recorded pressure field $D_{rs}(t)$ with a modeled field $P_s(\mathbf{x},t)$ in a least-squares sense. The objective function is a functional of the three unknown functions of physical parameters $\rho$, $c$, and $f_s(t)$.
  $\displaystyle
J[\rho, c, f_s]
 = \frac{1}{2} \sum_{r,s} \int_0^T
 \left[ D_{rs}(t) - \frac{1}{\rho(\mathbf{x}_r)}
 \frac{\partial P_s(\mathbf{x}_r,t)}{\partial z} \right]^2 w(t)\, dt
$
  $\displaystyle
 = \frac{1}{2} \sum_{r,s} \int_0^T\!\!\int_\Omega
 \left[ D_{rs}(t) - \frac{1}{\rho(\mathbf{x})}
 \frac{\partial P_s(\mathbf{x},t)}{\partial z} \right]^2
 \delta(\mathbf{x}-\mathbf{x}_r) \, d\mathbf{x}\, w(t)\, dt .
$ (8)

Here $w(t)$ is a smooth weighting function that we will find convenient.

The optimization problem can be stated as follows: find the model parameters $\rho$, $c$, and $f_s$ so that the field $P_s(\mathbf{x},t)$ defined by (7), (6), and (3) minimizes the objective function (8). The objective function has a least-squares form. The weighting factor $w(t)$ compensates for geometric spreading and absorption to give approximately equal weight to all parts of the modeled data.
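
As an illustration of the objective function (8) for a single source, the following sketch computes the weighted least-squares misfit from a stored wavefield, assuming the modeled data are the depth derivative of pressure divided by density, sampled at receiver grid points away from the grid edges. The array names and shapes are assumptions, not part of the formulation above.

    import numpy as np

    def objective(D, P_movie, rho, rec_ij, w, dz):
        # Weighted least-squares misfit of equation (8) for one source.
        #   D        recorded traces, shape (nreceivers, nt)
        #   P_movie  modeled pressure P(t, z, y), shape (nt, nz, ny)
        #   rec_ij   list of (iz, iy) receiver grid indices
        #   w        weighting function w(t), shape (nt,)
        # The modeled trace is (1/rho) dP/dz sampled at each receiver,
        # matching the assumption that land data record the depth
        # derivative of pressure.  The time integral is approximated by
        # a plain sum (no dt factor).
        J = 0.0
        for r, (iz, iy) in enumerate(rec_ij):
            dPdz = (P_movie[:, iz + 1, iy] - P_movie[:, iz - 1, iy]) / (2.0 * dz)
            residual = D[r] - dPdz / rho[iz, iy]
            J += 0.5 * np.sum(w * residual ** 2)
        return J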

General variational and objective functions

To optimize the model parameters $\rho$, $c$, and $f_s$ according to equations (7), (6), and (8), one must discover how a perturbation of the model parameters will affect the objective function (8). Let us abbreviate the variational form (7) of our differential equations in the following form:

  $\displaystyle
\int_0^T\!\!\int_\Omega q(\mathbf{x},t)\,
 L_1[\mathbf{m}(\mathbf{x},t)]\, P_s(\mathbf{x},t)
 \, d\mathbf{x}\, dt = 0 , \quad \forall\, q(\mathbf{x},t) ,
$ (9)
  $\displaystyle
\mbox{where } \mathbf{m}(\mathbf{x},t)
 = [\, c(\mathbf{x}),\ \rho(\mathbf{x}),\ f_s \,] .
$ (10)
$P$ is also understood to be an implicit function of $\mathbf{m}(\mathbf{x},t)$, as determined by the acoustic equations.

The exact form of $L_1$ is unimportant for the remainder of this derivation, except that it must be expressible as a linear functional operator on $P$. $L_1$ is a nonlinear operator on $\mathbf{m}(\mathbf{x},t)$. By a different choice of model parameters ($1/c^2\rho$, $1/\rho$, and $f_s$), one could also linearize $L_1$ with respect to the model $\mathbf{m}$, but this is not necessary.

Similarly, we can rewrite our objective function (8) in a more general form, with a second operator $L_2[\mathbf{m}]$ that extracts the modeled data from the pressure field:

  $\displaystyle
J[\mathbf{m}(\mathbf{x},t)] = \frac{1}{2} \sum_{r,s} \int_0^T\!\!\int_\Omega
 \left\{ D_{rs}(t) - L_2[\mathbf{m}(\mathbf{x},t)]\, P_s(\mathbf{x},t) \right\}^2
 \delta(\mathbf{x}-\mathbf{x}_r)\, w(t) \, d\mathbf{x}\, dt .
$ (11)

Perturbations of the general formulation

First, let us use the variational equation (9) to relate perturbations of the model $\delta\mathbf{m}$ to perturbations of the pressure $\delta P_s$:
  $\displaystyle
\int_0^T\!\!\int_\Omega q\, L_1\, \delta P_s \, d\mathbf{x}\, dt
 = - \int_0^T\!\!\int_\Omega q\,
 \nabla_{\mathbf{m}} ( L_1 P ) \cdot \delta\mathbf{m} \, d\mathbf{x}\, dt ,
 \quad \forall\, q .
$ (12)
The gradient product can be understood as meaning
  $\displaystyle
\nabla_{\mathbf{m}} G(\mathbf{m}) \cdot \delta\mathbf{m}
 = \left. \frac{\partial}{\partial \epsilon}
 G(\mathbf{m} + \epsilon\, \delta\mathbf{m}) \right|_{\epsilon = 0} ,
 \quad \mbox{where } G(\mathbf{m}) = L_1 P ,
$ (13)
and both $L_1$ and $P$ are functions of $\mathbf{m}$.
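
Definition (13) is just a directional derivative, and any discretized implementation of it can be checked numerically. The sketch below, with a made-up pointwise operator, compares a one-sided finite difference in epsilon against a hand-linearized form; it is only meant to illustrate the definition.

    import numpy as np

    def directional_derivative(G, m, dm, eps=1.0e-6):
        # Numerical version of definition (13): the derivative of G at m
        # in the direction dm, by a one-sided finite difference in epsilon.
        return (G(m + eps * dm) - G(m)) / eps

    # Example with a made-up operator G(m) = m**2 applied pointwise,
    # whose exact linearization is 2 * m * dm.
    m = np.array([1.0, 2.0, 3.0])
    dm = np.array([0.1, -0.2, 0.3])
    numeric = directional_derivative(lambda x: x ** 2, m, dm)
    exact = 2.0 * m * dm
    print(np.max(np.abs(numeric - exact)))   # should be of order eps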

In this form the linear operators $L_1$ and $\nabla_{\mathbf{m}}(L_1 P)$ both operate on perturbed quantities. Let us replace this form by one that uses the adjoint operators on $q$. If $a$ and $b$ are functions, the adjoint $L^*$ of a linear operator $L$ is defined by $\langle a \mid L b \rangle_a = \langle L^* a \mid b \rangle_b$. The brackets indicate the scalar products in the domains of $a$ and $b$. For example,

  $\displaystyle
\langle q_1 \mid q_2 \rangle_q = \int_0^T\!\!\int_\Omega
 q_1(\mathbf{x},t)\, q_2(\mathbf{x},t) \, d\mathbf{x}\, dt ,
$ (14)
  $\displaystyle
\langle P_1 \mid P_2 \rangle_P = \int_0^T\!\!\int_\Omega
 P_1(\mathbf{x},t)\, P_2(\mathbf{x},t) \, d\mathbf{x}\, dt , \mbox{ and}
$ (15)
  $\displaystyle
\langle m_1 \mid m_2 \rangle_m = \int_\Omega
 m_1(\mathbf{x})\, m_2(\mathbf{x}) \, d\mathbf{x} .
$ (16)

So we can rewrite our perturbation equation (12) as follows:
  $\displaystyle
\int_0^T\!\!\int_\Omega \delta P_s\, L_1^* q \, d\mathbf{x}\, dt
 = - \int_\Omega \delta\mathbf{m} \cdot \int_0^T
 [ \nabla_{\mathbf{m}} ( L_1 P ) ]^* q \, dt \, d\mathbf{x} ,
 \quad \forall\, q .
$ (17)

If $L_1$ contains only linear differential operators, then we can obtain the adjoint $L_1^*$ by integrating by parts, as I will show in a later section. This makes explicit the implicit dependence of $P$ on $\mathbf{m}$.
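
Discretely, the defining relation $\langle a \mid L b \rangle = \langle L^* a \mid b \rangle$ is the usual dot-product test, and integration by parts corresponds to summation by parts. Here is a small sketch for a first-difference operator in one dimension, assuming zero values beyond the boundary as in the free-surface conditions; the operator is a stand-in for illustration, not one of the operators above.

    import numpy as np

    def L(b, dx):
        # Forward difference d/dz, with a zero value assumed past the
        # last sample (the analogue of the zero boundary conditions).
        out = np.empty_like(b)
        out[:-1] = (b[1:] - b[:-1]) / dx
        out[-1] = (0.0 - b[-1]) / dx
        return out

    def L_star(a, dx):
        # Adjoint of L: a negated backward difference, the discrete
        # counterpart of integration by parts with vanishing boundary terms.
        out = np.empty_like(a)
        out[0] = -a[0] / dx
        out[1:] = -(a[1:] - a[:-1]) / dx
        return out

    rng = np.random.default_rng(0)
    a = rng.standard_normal(100)
    b = rng.standard_normal(100)
    dx = 0.5
    # The two inner products agree to machine precision.
    print(np.dot(a, L(b, dx)), np.dot(L_star(a, dx), b))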

Gradient of the general objective function

Next let us relate perturbations in the objective function $J$ to perturbations in the pressure $P$ and modeling parameters used in $L_2$.
  $\displaystyle
- \delta J
 = \sum_{r,s} \int_0^T\!\!\int_\Omega \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P_s \right) L_2\, \delta P_s \, w \, d\mathbf{x}\, dt
$
  $\displaystyle
 \quad + \sum_{r,s} \int_0^T\!\!\int_\Omega \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P_s \right)
 \nabla_{\mathbf{m}} ( L_2 P_s ) \cdot \delta\mathbf{m} \, w \, d\mathbf{x}\, dt
$ (18)
  $\displaystyle
 = \sum_{r,s} \int_0^T\!\!\int_\Omega \delta P_s\,
 L_2^* \left[ \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P_s \right) w \right] \, d\mathbf{x}\, dt
$
  $\displaystyle
 \quad + \sum_{r,s} \int_\Omega \delta\mathbf{m} \cdot \int_0^T
 [ \nabla_{\mathbf{m}} ( L_2 P_s ) ]^*
 \left[ \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P_s \right) w \right] dt \, d\mathbf{x} .
$ (19)

Now for the clever part. The perturbed integral (17) must be true for all $q$. Choose $q$, then, so that (17) equals the first term of the gradient (19) for all $\delta P$:
  $\displaystyle
\int_0^T\!\!\int_\Omega \delta P\, L_1^* q \, d\mathbf{x}\, dt
 = \sum_{r,s} \int_0^T\!\!\int_\Omega \delta P\,
 L_2^* \left[ \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P \right) w \right] \, d\mathbf{x}\, dt .
$ (20)

For this equation to be true for all $\delta P$, $q$ must satisfy
  $\displaystyle
L_1^* q = \sum_{r,s} L_2^* \left[ \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P \right) w \right] .
$ (21)

This adjoint system of equations is very similar to the original set of equations for $P$, but with a new source term — the difference between the modeled and the actual data.

If $q$ satisfies this equation, then we can combine the variational equation (17) and the perturbation (19) of the objective function to remove intermediate dependence on $\delta P$.

  $\displaystyle
\delta J
 = \int_\Omega \delta\mathbf{m} \cdot \int_0^T
 [ \nabla_{\mathbf{m}} ( L_1 P ) ]^* q \, dt \, d\mathbf{x}
$
  $\displaystyle
 \quad - \sum_{r,s} \int_\Omega \delta\mathbf{m} \cdot \int_0^T
 [ \nabla_{\mathbf{m}} ( L_2 P ) ]^*
 \left[ \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P \right) w \right] dt \, d\mathbf{x} .
$ (22)

We can express this perturbation (22) as a gradient of the objective function with respect to the model parameters $\mathbf{m}(\mathbf{x})$ at a specific location $\mathbf{x}$.
  $\displaystyle
\nabla_{\mathbf{m}} J[\mathbf{m}](\mathbf{x})
 = \left. \frac{\delta J}{\delta \mathbf{m}} \right|_{\mathbf{m}(\mathbf{x})}
 = \int_0^T [ \nabla_{\mathbf{m}} ( L_1 P ) ]^* q \, dt
$
  $\displaystyle
 \quad - \sum_{r,s} \int_0^T [ \nabla_{\mathbf{m}} ( L_2 P ) ]^*
 \left[ \delta(\mathbf{x}-\mathbf{x}_r)
 \left( D_{rs}(t) - L_2 P \right) w \right] dt .
$ (23)

The first term of this gradient (23) is the most important because it provides information about $\mathbf{m}$ at all positions. The second term is non-zero only at points $\{\mathbf{x}_r\}$ where the data are recorded.

The simplest possible steepest descent optimization would perturb a reference value of $\mathbf{m}$ by a scaled version of its gradient. That is,

  $\displaystyle
\min_\alpha J[\, \mathbf{m}_n + \alpha\, \nabla_{\mathbf{m}} J(\mathbf{m}_n) \,] ,
 \qquad
 \mathbf{m}_{n+1} = \mathbf{m}_n + \alpha\, \nabla_{\mathbf{m}} J(\mathbf{m}_n) .
$ (24)

One would also sum the perturbations over all source positions to form a single perturbation of the model. After one perturbation, the new reference value $\mathbf{m}_{n+1}$ could be used to calculate a new reference wavefield and a new perturbation.
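
A sketch of the update (24): scan a few trial scale factors alpha and keep the one that most reduces the objective. The callable J and the gradient array are assumed to be supplied by the modeling and adjoint calculations described above; nothing here is specific to the acoustic equation.

    def steepest_descent_step(J, m, grad, alphas):
        # One update of the form (24): try several scale factors alpha and
        # keep the model m + alpha * grad with the smallest objective value.
        #   J       callable returning the scalar objective for a model array
        #   m       current reference model m_n
        #   grad    gradient of J at m, already summed over source positions
        #   alphas  trial scale factors (of either sign) to scan
        best_m, best_J = m, J(m)
        for alpha in alphas:
            trial = m + alpha * grad
            J_trial = J(trial)
            if J_trial < best_J:
                best_m, best_J = trial, J_trial
        return best_m, best_J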

Perturbation and adjoint of the acoustic equation

Let us now leave the general formulation and see how to handle the acoustic equation in particular.

Let us first perturb the objective function (8) as in the general perturbed form (19):

  $\displaystyle
- \delta J[\rho, c, f_s]
 = - \sum_{r,s} \int_0^T\!\!\int_\Omega \delta(\mathbf{x}-\mathbf{x}_r)
 \left[ \frac{1}{\rho(\mathbf{x})} \frac{\partial P_s(\mathbf{x},t)}{\partial z}
 - D_{rs}(t) \right] \frac{1}{\rho(\mathbf{x})}
 \frac{\partial}{\partial z}\, \delta P_s(\mathbf{x},t)
 \, d\mathbf{x}\, w(t)\, dt
$
  $\displaystyle
 \quad + \sum_{r,s} \int_0^T\!\!\int_\Omega \delta(\mathbf{x}-\mathbf{x}_r)
 \left[ \frac{1}{\rho(\mathbf{x})} \frac{\partial P_s(\mathbf{x},t)}{\partial z}
 - D_{rs}(t) \right] \frac{\partial P_s(\mathbf{x},t)}{\partial z}
 \frac{\delta\rho(\mathbf{x})}{\rho(\mathbf{x})^2}
 \, d\mathbf{x}\, w(t)\, dt .
$ (25)

Next we integrate by parts over $z$.
  $\displaystyle
- \delta J[\rho, c, f_s]
 = \sum_{r,s} \int_0^T\!\!\int_\Omega \delta P_s(\mathbf{x},t)\,
 \frac{\partial}{\partial z} \left\{
 \frac{\delta(\mathbf{x}-\mathbf{x}_r)}{\rho(\mathbf{x})}
 \left[ \frac{1}{\rho(\mathbf{x})} \frac{\partial P_s(\mathbf{x},t)}{\partial z}
 - D_{rs}(t) \right] \right\}
 \, d\mathbf{x}\, w(t)\, dt
$
  $\displaystyle
 \quad + \sum_{r,s} \int_0^T\!\!\int_\Omega \delta\rho(\mathbf{x}) \left\{
 \frac{\delta(\mathbf{x}-\mathbf{x}_r)}{\rho(\mathbf{x})^2}
 \left[ \frac{1}{\rho(\mathbf{x})} \frac{\partial P_s(\mathbf{x},t)}{\partial z}
 - D_{rs}(t) \right] \frac{\partial P_s(\mathbf{x},t)}{\partial z} \right\}
 \, d\mathbf{x}\, w(t)\, dt .
$ (26)

The boundary term disappears because we require $\delta P_s(\mathbf{x},t)$ to vanish at the boundary, as does $P_s(\mathbf{x},t)$.

The next step is to place the variational acoustic equation (7) as in our general perturbed form (17). First perturb:

  $\displaystyle
\int_0^T\!\!\int_\Omega q(\mathbf{x},t)
 \left[ \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})} \frac{\partial^2}{\partial t^2}
 - \nabla\cdot\left( \frac{1}{\rho(\mathbf{x})} \nabla \right) \right]
 \delta P_s(\mathbf{x},t) \, d\mathbf{x}\, dt
$
  $\displaystyle
 = \int_0^T\!\!\int_\Omega q(\mathbf{x},t)
 \left[ \frac{1}{c(\mathbf{x})^3\,\rho(\mathbf{x})}
 \frac{\partial^2 P_s(\mathbf{x},t)}{\partial t^2} \right]
 \delta c(\mathbf{x}) \, d\mathbf{x}\, dt
$
  $\displaystyle
 \quad + \int_0^T\!\!\int_\Omega q(\mathbf{x},t) \left\{
 \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})^2}
 \frac{\partial^2 P_s(\mathbf{x},t)}{\partial t^2}
 - \nabla\cdot\left[ \frac{1}{\rho(\mathbf{x})^2} \nabla P_s(\mathbf{x},t) \right]
 \right\} \delta\rho(\mathbf{x}) \, d\mathbf{x}\, dt
$
  $\displaystyle
 \quad - \int_0^T\!\!\int_\Omega q(\mathbf{x},t)\,
 \delta(\mathbf{x})\, \delta f_s(t) \, d\mathbf{x}\, dt .
$ (27)

We can integrate the time derivatives in (27) by parts twice over time. The divergence and gradient terms of (27) can be rewritten with the following chain rule and divergence theorem:
  $\displaystyle
\int_\Omega a\, \nabla\cdot\mathbf{b} \, d\mathbf{x}
 = \int_\Omega \nabla\cdot( a\, \mathbf{b} ) \, d\mathbf{x}
 - \int_\Omega \mathbf{b} \cdot \nabla a \, d\mathbf{x}
 = \int_{\partial\Omega} \hat{\mathbf{n}} \cdot \mathbf{b}\; a \, d\sigma
 - \int_\Omega \mathbf{b} \cdot \nabla a \, d\mathbf{x} .
$ (28)

Here $a$ is a scalar and $\mathbf{b}$ is a vector, and $\hat{\mathbf{n}}$ is the unit normal vector to the boundary $\partial\Omega$ of the volume $\Omega$.

So here is the rewritten form of the perturbed variational form (27):

  $\displaystyle
\left. \int_\Omega q(\mathbf{x},t)\,
 \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})}
 \frac{\partial\, \delta P_s(\mathbf{x},t)}{\partial t} \, d\mathbf{x}
 \right|_{t=T}
 - \left. \int_\Omega \frac{\partial q(\mathbf{x},t)}{\partial t}\,
 \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})}\, \delta P_s(\mathbf{x},t)
 \, d\mathbf{x} \right|_{t=T}
$
  $\displaystyle
 + \int_0^T\!\!\int_\Omega \delta P_s(\mathbf{x},t)\,
 \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})}
 \frac{\partial^2 q(\mathbf{x},t)}{\partial t^2} \, d\mathbf{x}\, dt
 - \int_0^T\!\!\int_{\partial\Omega} \frac{q(\mathbf{x},t)}{\rho(\mathbf{x})}
 \frac{\partial}{\partial n}\, \delta P_s(\mathbf{x},t) \, d\sigma\, dt
$
  $\displaystyle
 - \int_0^T\!\!\int_\Omega \delta P_s(\mathbf{x},t)
 \left[ \nabla\cdot\frac{1}{\rho(\mathbf{x})} \nabla\, q(\mathbf{x},t) \right]
 \, d\mathbf{x}\, dt
 \;=\;
$
  $\displaystyle
\int_0^T\!\!\int_\Omega \delta c(\mathbf{x})
 \left[ \frac{1}{c(\mathbf{x})^3\,\rho(\mathbf{x})}
 \frac{\partial^2 P_s(\mathbf{x},t)}{\partial t^2} \right]
 q(\mathbf{x},t) \, d\mathbf{x}\, dt
$
  $\displaystyle
 \quad - \int_0^T\!\!\int_\Omega \delta\rho(\mathbf{x})
 \left[ \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})^2}
 \frac{\partial^2 P_s(\mathbf{x},t)}{\partial t^2}
 + \frac{1}{\rho(\mathbf{x})^2} \nabla P_s(\mathbf{x},t) \cdot \nabla \right]
 q(\mathbf{x},t) \, d\mathbf{x}\, dt
$
  $\displaystyle
 \quad + \int_0^T\!\!\int_{\partial\Omega} \delta\rho(\mathbf{x})\,
 \frac{q(\mathbf{x},t)}{\rho(\mathbf{x})^2}
 \frac{\partial P_s(\mathbf{x},t)}{\partial n} \, d\sigma\, dt
 - \int_0^T\!\!\int_\Omega \delta f_s(t)\,
 \delta(\mathbf{x})\, q(\mathbf{x},t) \, d\mathbf{x}\, dt .
$ (29)

The boundary terms at $t=0$ disappear because $\delta P$ must observe the boundary conditions of $P$. The $\partial P/\partial n = \hat{\mathbf{n}}\cdot\nabla P$ term is the derivative in the direction of the normal to the surface. Such terms do not disappear. Other integrals that evaluate $\delta P$ on the surface $\partial\Omega$ do disappear. Now set the right side of the expanded (29) equal to the first term of the perturbation of the objective function (26) for all perturbations $\delta P$ of the wavefield. Then
  $\displaystyle
\left\{ \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})}
 \frac{\partial^2}{\partial t^2}
 - \nabla\cdot\left[ \frac{1}{\rho(\mathbf{x})} \nabla \right] \right\}
 q(\mathbf{x},t)
 = - \sum_{r,s} \frac{\partial}{\partial z} \left\{
 \frac{\delta(\mathbf{x}-\mathbf{x}_r)\, w(t)}{\rho(\mathbf{x})}
 \left[ D_{rs}(t) - \frac{1}{\rho(\mathbf{x})}
 \frac{\partial P_s(\mathbf{x},t)}{\partial z} \right] \right\}
$ (30)

in the interior. The boundary terms on $\partial \Omega$ and $t=T$ must vanish so
  $\displaystyle
\left. q(\mathbf{x},t) \right|_{t=T} =
\left. \frac{\partial}{\partial t} q(\mathbf{x},t) \right|_{t=T} =
\left. q(\mathbf{x},t) \right|_{\partial\Omega} = 0 .
$ (31)

These adjoint equations (30) and (31) require an extrapolation of $q$ that is very similar to that of $P$, except that the extrapolation must now proceed backwards in time from $t=T$. A single application of this extrapolation is called reverse-time migration. Errors between the modeled and recorded data act as sources through the extrapolation at the locations of the receivers.
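
The following sketch illustrates the reverse-time extrapolation of equations (30) and (31), reusing the hypothetical div_rho_grad helper from the earlier modeling sketch. It steps backwards from t = T with the weighted data residuals injected at the receiver locations; for brevity it omits the depth derivative and the 1/rho factor that the exact adjoint source of (30) carries.

    import numpy as np

    def backpropagate_residual(c, rho, residual_w, rec_ij, dt, dx, nt):
        # Reverse-time extrapolation of the adjoint field q, as in (30)-(31).
        #   residual_w  weighted residuals w(t) * (D - modeled), shape (nr, nt)
        #   rec_ij      receiver grid indices, one (iz, iy) pair per trace
        # Starts from q = dq/dt = 0 at t = T and steps backwards in time;
        # the residual traces act as sources at the receiver locations.
        nz, ny = c.shape
        q_later = np.zeros((nz, ny))      # q at the later time level
        q = np.zeros((nz, ny))
        movie = np.zeros((nt, nz, ny))
        for it in range(nt - 1, -1, -1):
            rhs = div_rho_grad(q, rho, dx)
            for r, (iz, iy) in enumerate(rec_ij):
                rhs[iz, iy] += residual_w[r, it] / dx ** 2
            q_earlier = 2.0 * q - q_later + dt ** 2 * c ** 2 * rho * rhs
            q_earlier[0, :] = 0.0         # q also vanishes on the free surface
            q_later, q = q, q_earlier
            movie[it] = q
        return movie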

The first term of the perturbation (25) now equals the right side of (29). The boundary conditions on $q$ make the surface $\partial\Omega$ integral disappear. Thus

  $\displaystyle
\nabla_c J(\mathbf{x}) = \frac{\delta J}{\delta c(\mathbf{x})}
 = \int_0^T \frac{1}{c(\mathbf{x})^3\,\rho(\mathbf{x})}
 \frac{\partial^2 P_s(\mathbf{x},t)}{\partial t^2}\, q(\mathbf{x},t) \, dt .
$ (32)
  $\displaystyle
\nabla_\rho J(\mathbf{x}) = \frac{\delta J}{\delta \rho(\mathbf{x})}
 = \int_0^T \left[ \frac{1}{c(\mathbf{x})^2\,\rho(\mathbf{x})^2}
 \frac{\partial^2 P_s(\mathbf{x},t)}{\partial t^2}
 + \frac{1}{\rho(\mathbf{x})^2} \nabla P_s(\mathbf{x},t) \cdot \nabla \right]
 q(\mathbf{x},t) \, dt
$
  $\displaystyle
 \quad - \sum_{r,s} \int_0^T
 \frac{\delta(\mathbf{x}-\mathbf{x}_r)}{\rho(\mathbf{x})^2}
 \left[ D_{rs}(t) - \frac{1}{\rho(\mathbf{x})}
 \frac{\partial P_s(\mathbf{x},t)}{\partial z} \right]
 \frac{\partial P_s(\mathbf{x},t)}{\partial z}\, w(t) \, dt .
$ (33)
  $\displaystyle
\nabla_s J(t) = \frac{\delta J}{\delta f_s(t)}
 = \left. q(\mathbf{x},t) \right|_{\mathbf{x} = \mathbf{0}} .
$ (34)

To perturb $c$, we take the dot product of the extrapolated $q$ with a filtered version of the previous guess of the wavefield, $\frac{1}{c^3\rho}\frac{\partial^2 P}{\partial t^2}$. The perturbation of $\rho$ requires the application of a differential operator to $q$. The second term of $\nabla_\rho J$ (from equation (26)) applies only at the receiver points and acts as an amplification of the recorded trace. The perturbation of $f_s$ is simply equal to the adjoint wavefield $q$ evaluated at the source position.
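
Equation (32) is a zero-lag correlation in time of the adjoint field with a filtered copy of the modeled wavefield. A sketch, assuming both wavefields have been stored as arrays over time:

    import numpy as np

    def velocity_gradient(P_movie, q_movie, c, rho, dt):
        # Image the velocity perturbation direction of equation (32):
        # integrate (1/(c^3 rho)) * d2P/dt2 * q over the recorded time
        # at every grid point.  Both wavefields have shape (nt, nz, ny).
        d2P = (P_movie[2:] - 2.0 * P_movie[1:-1] + P_movie[:-2]) / dt ** 2
        return dt * np.sum(d2P * q_movie[1:-1], axis=0) / (c ** 3 * rho)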

To start the iterations, set $f_s(t) = \delta(t)$ and set the parameters $c$ and $\rho$ to reasonable guesses. The first perturbation will attempt to invert the primary reflections and is equivalent to a so-called reverse-time migration. The second iteration ostensibly would invert second-order scattering, but a poor choice of background velocities would make it difficult to predict the second-order form from the first-order scattering. This inversion, like all inverse-scattering methods, can only invert high-frequency parameter changes that are likely to create reflections. Smoother changes in $c$ and $\rho$ must be known beforehand.

Bibliography

[1] William S. Harlan. Separation of signal and noise applied to vertical seismic profiles. Geophysics, 53(7):932–946, 1988.

[2] Thomas J. R. Hughes. The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover Publications Inc., 2000.

[3] J.L. Lions. Optimal Control of Systems Governed by Partial Differential Equations. Springer-Verlag, 1971.