TensorFlow tutorial for Physics Informed Neural Networks

Some notes that explain how PINNs are implemented in TensorFlow.

NOTE: This blog post contains some useful explanations for people who are less familiar with TensorFlow; it walks through a few steps of a notebook by Jan Blechschmidt.

PINNs intro

Physics Informed Neural Networks (PINNs) aim to solve Partial Differential Equations (PDEs) using neural networks. The crucial concept is to put the PDE into the loss, which is why they are called physics informed; many authors have used similar terms such as physics-based, and in some cases the physics is incorporated through appropriate architecture adjustments rather than through a penalty in the loss.

The essential thing to understand is that PINNs were designed as a tool for two types of tasks: forward and inverse problems. The forward problem refers to solving a Partial Differential Equation (PDE) with a known functional form, while the inverse problem refers to identifying coefficients of the PDE from data. The latter application is arguably the more interesting one, since, as it turns out, for the forward problem standard numerical schemes and libraries are generally more efficient than PINNs.

The method constructs a neural network approximation

\[u_\theta(t,x) \approx u(t,x)\]

of the solution of a nonlinear PDE, where $u_\theta :[0,T] \times \mathcal{D} \to \mathbb{R}$ denotes a function realized by a neural network with parameters $\theta$.
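As a concrete illustration (this is only a sketch, not the exact architecture used later in the notebook), such a $u_\theta$ can be realized in TensorFlow as a small fully connected network taking the pair $(t,x)$ as input; depth, width and the tanh activation below are illustrative choices:

import tensorflow as tf

def init_u_theta(num_hidden_layers=4, num_neurons_per_layer=20):
    # A small fully connected network realizing u_theta(t, x);
    # depth, width and the tanh activation are illustrative choices.
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(2,)))   # input is the pair (t, x)
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(num_neurons_per_layer, activation='tanh'))
    model.add(tf.keras.layers.Dense(1))     # scalar output u_theta(t, x)
    return model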

The continuous time approach for the parabolic PDE as described in (Raissi et al., 2017 (Part I)) is based on the (strong) residual of a given neural network approximation $u_\theta \colon [0,T] \times \mathcal{D} \to \mathbb{R} $ of the solution $u$, i.e.,

\[\begin{align} r_\theta (t,x) := \partial_t u_\theta (t,x) + \mathcal{N}[u_\theta] (t,x). \end{align}\]

To incorporate this PDE residual $r_\theta$ into a loss function to be minimized, PINNs require a further differentiation to evaluate the differential operators $\partial_t u_\theta$ and $\mathcal{N}[u_\theta]$. Thus the PINN term $r_\theta$ shares the same parameters as the original network $u_\theta(t,x)$, but respects the underlying “physics” of the nonlinear PDE. Both types of derivatives can be easily determined through automatic differentiation with current state-of-the-art machine learning libraries, e.g., TensorFlow or PyTorch.
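As an example of how $r_\theta$ can be evaluated with automatic differentiation in TensorFlow, here is a minimal sketch. It assumes, purely for illustration, the heat-equation operator $\mathcal{N}[u] = -\nu\, \partial_{xx} u$ and a model like the one sketched above, mapping an $(N,2)$ tensor of $(t,x)$ points to $u_\theta(t,x)$:

import tensorflow as tf

nu = 0.05  # hypothetical diffusion coefficient, chosen only for illustration

def residual(model, t, x):
    # t and x are tensors of shape (N, 1)
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(t)
        tape.watch(x)
        u = model(tf.concat([t, x], axis=1))
        # u_x is computed inside the context so that it is recorded
        # on the tape and can be differentiated once more below
        u_x = tape.gradient(u, x)
    u_t = tape.gradient(u, t)
    u_xx = tape.gradient(u_x, x)
    del tape
    # r_theta = u_t + N[u], here with N[u] = -nu * u_xx
    return u_t - nu * u_xx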

The PINN approach for the solution of the initial and boundary value problem now proceeds by minimization of the loss functional

\[\begin{align} \phi_\theta(X) := \phi_\theta^r(X^r) + \phi_\theta^0(X^0) + \phi_\theta^b(X^b), \end{align}\]

where $X$ denotes the collection of training data and the loss function $\phi_\theta$ contains the following terms: the mean squared PDE residual

\[\begin{align} \phi_\theta^r(X^r) := \frac{1}{N_r}\sum_{i=1}^{N_r} \left| r_\theta(t^r_i, x^r_i)\right|^2 \end{align}\]

evaluated in a number of collocation points \(X^r:=\{(t^r_i,x^r_i)\}_{i=1}^{N_r} \subset (0,T] \times \mathcal{D}\), and the mean squared misfit with respect to the initial and boundary conditions,

\[\begin{align} \phi_\theta^0(X^0) := \frac{1}{N_0}\sum_{i=1}^{N_0} \left| u_\theta(t^0_i, x^0_i) - u_0(x^0_i)\right|^2 \quad \text{and} \quad \phi_\theta^b(X^b) := \frac{1}{N_b}\sum_{i=1}^{N_b} \left| u_\theta(t^b_i, x^b_i) - u_b(t^b_i, x^b_i)\right|^2, \end{align}\]

evaluated in a number of points \(X^0:=\{(t^0_i,x^0_i)\}_{i=1}^{N_0} \subset \{0\} \times \mathcal{D}\) and \(X^b:=\{(t^b_i,x^b_i)\}_{i=1}^{N_b} \subset (0,T] \times \partial \mathcal{D}\), where \(u_\theta\) is the neural network approximation of the solution \(u\colon[0,T] \times \mathcal{D} \to \mathbb{R}\) and \(u_0\), \(u_b\) denote the prescribed initial and boundary data.

Note that the training data $X$ consists entirely of time-space coordinates.
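For concreteness, here is a minimal sketch of how $\phi_\theta$ could be assembled in TensorFlow. It assumes the residual helper sketched above and that the prescribed initial and boundary values have already been evaluated into tensors u0_data and ub_data (hypothetical names):

def compute_pinn_loss(model, X_r, X_0, X_b, u0_data, ub_data):
    # X_r, X_0, X_b are (N, 2) tensors of (t, x) training coordinates;
    # u0_data and ub_data hold the prescribed values of u at X_0 and X_b.
    t_r, x_r = X_r[:, 0:1], X_r[:, 1:2]

    # mean squared PDE residual phi_r in the collocation points X^r
    phi_r = tf.reduce_mean(tf.square(residual(model, t_r, x_r)))

    # mean squared misfit phi_0 of the initial condition in X^0
    phi_0 = tf.reduce_mean(tf.square(model(X_0) - u0_data))

    # mean squared misfit phi_b of the boundary condition in X^b
    phi_b = tf.reduce_mean(tf.square(model(X_b) - ub_data))

    return phi_r + phi_0 + phi_b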


TensorFlow implementation

I tested the notes given in the notebook of Jan Blechschmidt on an M1 Apple MacBook and, despite its lower specifications, did not find a significant increase in computation time (on Google Colab this took noticeably longer). I am using tensorflow-macos and tensorflow-metal, which are supposedly tuned for the M1 processor. The task is relatively simple, so no hiccups were observed during training: no heating or fan noise, even while running on battery. Below I treat a very simple example that explains how the code works. We start by loading the required dependencies, the scientific computing library NumPy and the machine learning library TensorFlow, and we work with single precision.

  import tensorflow as tf
  import numpy as np
  
  # Set data type
  DTYPE='float32'
  tf.keras.backend.set_floatx(DTYPE)

Note: Below we give a clearer explanation of how the method works by creating a simple network with just one hidden layer and an exponential activation function, so that the gradients can easily be computed by hand.

Define a simple model and the residual of the PDE

def init_model_simple(num_hidden_layers=1, num_neurons_per_layer=2):
    # Initialize a feedforward neural network
    model = tf.keras.Sequential()

    # Input is one-dimensional (only the spatial coordinate x in this toy example)
    model.add(tf.keras.Input(shape=(1,)))

    # Append hidden layers
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(num_neurons_per_layer,
            activation=tf.keras.activations.exponential,
            kernel_initializer='glorot_normal'))

    # Output is one-dimensional
    model.add(tf.keras.layers.Dense(1))
    
    return model

model_simple = init_model_simple()

To show how the algorithm works I only evaluate $du/dx$ and extract the values of the relevant weights. Notice that with the Glorot normal initializer the kernel weights are drawn from a normal distribution, while the biases are initialized to zero by default.

def get_r_simple(model, x):
    # A tf.GradientTape is used to compute derivatives in TensorFlow
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)

        # Determine residual 
        u = model(x)

        # Compute the gradient u_x inside the GradientTape;
        # in the full PINN this matters because second derivatives
        # of u are taken from it later
        u_x = tape.gradient(u, x)

    del tape

    return u_x

w21 = model_simple.weights[2].numpy()[0,0] # output-layer kernel
w22 = model_simple.weights[2].numpy()[1,0]

w11 = model_simple.weights[0].numpy()[0,0] # hidden-layer kernel
w12 = model_simple.weights[0].numpy()[0,1]

x = tf.constant([[1.0]], dtype=tf.float32)
print(f'du/dx = {get_r_simple(model_simple, x).numpy()} = '
      f'{w21*w11*np.exp(w11*x) + w22*w12*np.exp(w12*x)}')

Indeed, $du/dx$ computed by hand agrees (within single precision) with the value returned by the network (run the code above to confirm).
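The reason $u_x$ was computed inside the with block of the persistent tape is that the full PINN residual also needs second derivatives. As a quick extra check (not part of the original notebook), the same tape pattern yields $d^2u/dx^2$, which by hand equals $w_{21}w_{11}^2\exp(w_{11}x) + w_{22}w_{12}^2\exp(w_{12}x)$ for our small network with zero biases:

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    u = model_simple(x)
    # recorded on the tape because it is computed inside the context
    u_x = tape.gradient(u, x)

# second derivative taken from the recorded first derivative
u_xx = tape.gradient(u_x, x)
del tape

print(f'd2u/dx2 = {u_xx.numpy()} = '
      f'{w21*w11**2*np.exp(w11*x) + w22*w12**2*np.exp(w12*x)}')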

Below we set up the loss $l := \partial_x u$ and compute its gradients with respect to the network parameters (for simplicity of the analytical manipulations we chose an exponential activation function and just one hidden layer with two neurons):

\[\begin{aligned} u = w_{21}\,\exp(w_{11}\, x + b_{11}) + w_{22}\,\exp(w_{12}\, x + b_{12}) + b_{2}\\ l := \partial_x u = w_{21}\,w_{11}\,\exp(w_{11} \, x + b_{11}) + w_{22}\,w_{12}\,\exp(w_{12}\, x + b_{12}) \end{aligned}\]
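Before moving to the loss, the expression for $u$ itself can be checked against the network output. Here is a quick sketch using the weights extracted above (the hidden-layer biases $b_{11}$, $b_{12}$ and the output bias $b_{2}$ are zero here because of the default zero bias initializer):

b11, b12 = model_simple.weights[1].numpy()   # hidden-layer biases (zero here)
b2 = model_simple.weights[3].numpy()[0]      # output-layer bias (zero here)

u_by_hand = w21*np.exp(w11*x + b11) + w22*np.exp(w12*x + b12) + b2
print(f'u = {model_simple(x).numpy()} = {u_by_hand}')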

We take a very simple loss, as stated above, which depends only on $du/dx$:

def compute_loss_simple(model, X_r):
    return get_r_simple(model, X_r)

def get_grad_simple(model, X_r):
    
    with tf.GradientTape(persistent=True) as tape:
        # This tape is for derivatives with
        # respect to trainable variables
        #tape.watch(model.trainable_variables)
        loss = compute_loss_simple(model, X_r)

    g = tape.gradient(loss, model.trainable_variables)
    del tape

    return loss, g

loss, g = get_grad_simple(model_simple, x)
for gi, varsi in zip(g, model_simple.variables):
    print(f'{varsi.name} has gradients {gi}')

This generates the output which I put in the following table (your results will depend on the initialization):

| Layer name | neuron 1 | neuron 2 |
| --- | --- | --- |
| dense_2/kernel:0 | 4.466523 | -0.13126281 |
| dense_2/bias:0 | 2.0364249 | -0.00236067 |
| dense_3/kernel:0 | 1.9372414 | 0.01865212 |
| dense_3/bias:0 | None | |

Given that $l = w_{21}\,w_{11}\,\exp(w_{11} \, x + b_{11}) + w_{22}\,w_{12}\,\exp(w_{12}\, x + b_{12})$, these gradients have the following analytical expressions, e.g.,

\[\begin{aligned} \frac{\partial \, l}{\partial {b_{11}}} = w_{21}\,w_{11}\,\exp(w_{11} \, x + b_{11}) \\ \frac{\partial \, l}{\partial {w_{12}}} = w_{22}\,(1+w_{12}\,x)\,\exp(w_{12}\, x + b_{12}) \end{aligned}\]

and others…

Note that $\partial_x u$ contains no derivative with respect to the bias of the last layer, $b_{2}$; this is why the last element of the gradient list is None (it corresponds to the gradient with respect to the last-layer bias). We now compare the values above to the ones evaluated by hand. You should be able to get consistent results with the lines below:

  print(f"The value we expect for dl/dw11 = {w21*(1+w11**2)*np.exp(w11*x)}, dl/dw12 = {w22*(1+w12*2)*np.exp(w12*x)}")
  print(f"The value we expect for dl/db11 = {w21*w11*np.exp(w11*x)}, dl/db12 = {w22*w12*np.exp(w12*x)}")
  print(f"The value we expect for dl/dw21 = {w11*np.exp(w11*x)}, dl/dw22 = {w12*np.exp(w12*x)}")