Semantic Morphable Model

In this chapter, we introduce our concept of a semantic Morphable Model. We start from the formulation you already know from the probabilistic fitting tutorial and present semantic Morphable Models as an extension of this framework. At its core, the extension is a change of the image likelihood. This change comes with a more challenging inference problem, which is explained in detail in the next chapter.

To start, let's revisit the image likelihood you already know from the probabilistic fitting tutorial:

\[ \ell \left( \theta; \tilde{I} \right) = \prod_{i \in \mathcal F} \ell_{\text{face}}( \theta ; \tilde{I}_i ) \prod_{j \in \mathcal B} b \left( \tilde{I}_j \right). \]

This likelihood distinguishes between pixels in the foreground \(\mathcal F\) and the background \(\mathcal B\). However, this distinction is made only by where the generative face model is defined and where it is not. We extended this likelihood to be more flexible and to include more than two models, or equivalently, more than two labels:

\[ \ell (\theta ; \tilde{I}, z ) = \prod_{i} \prod_{k} \ell_{k} ( \theta ; \tilde{I}_i )^{z_{ik}} \]

This likelihood supports different classes or objects \(k\). A pixel-wise segmentation label \(z_{ik}\) defines which pixel belongs to which class. For every class, we can use a separate likelihood model \(\ell_k\). We constrained our model so that it fits our probabilistic framework: the class labels of each pixel must sum up to one, \(\sum_{k} z_{ik} = 1\) \(\forall i\). Additionally, we want every pixel to be fully assigned to exactly one class: \(z_{ik} \in \{0,1\}\). This second constraint is a hard one, but in our experiments we could not get reasonable results without it: if an occluder is even partially included in the face model adaptation, it can already heavily mislead the fit. Compared to the previous formulation, \(\theta\) can be more than the parameters of the face model: \(\theta\) denotes the parameters of all parametric models involved.
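To make the role of the labels concrete, here is a minimal sketch in Python (not the tutorial's own implementation) that evaluates the semantic likelihood in log space, \(\log \ell = \sum_i \sum_k z_{ik} \log \ell_k(\theta; \tilde{I}_i)\), and enforces both label constraints. The function names and the layout of `log_lik` (one row per pixel, one column per class) are hypothetical choices for illustration; the per-class log-likelihood values are assumed to be computed elsewhere.

```python
from typing import Sequence


def check_labels(z: Sequence[Sequence[int]]) -> None:
    """Enforce the two label constraints: z_ik in {0, 1} and sum_k z_ik = 1."""
    for i, row in enumerate(z):
        if any(v not in (0, 1) for v in row):
            raise ValueError(f"pixel {i}: labels must be 0 or 1")
        if sum(row) != 1:
            raise ValueError(f"pixel {i}: exactly one class must be active")


def semantic_log_likelihood(
    log_lik: Sequence[Sequence[float]], z: Sequence[Sequence[int]]
) -> float:
    """log l(theta; I, z) = sum_i sum_k z_ik * log l_k(theta; I_i).

    log_lik[i][k] is log l_k(theta; I_i) for pixel i and class k (hypothetical layout).
    Working in log space avoids underflow from multiplying many small
    per-pixel likelihoods.
    """
    check_labels(z)
    total = 0.0
    for row_l, row_z in zip(log_lik, z):
        for log_lk, z_ik in zip(row_l, row_z):
            # z_ik is 0 or 1, so the exponent z_ik just selects one class per pixel;
            # skipping z_ik == 0 avoids 0 * (-inf) = nan for impossible classes.
            if z_ik == 1:
                total += log_lk
    return total


# Two classes per pixel: column 0 = face, column 1 = non-face.
log_lik = [[-1.0, -3.0], [-2.0, -0.5], [-4.0, -1.0]]
z = [[1, 0], [0, 1], [0, 1]]
print(semantic_log_likelihood(log_lik, z))  # -2.5
```

With two classes, \(z_{i,\text{face}} = 1\) exactly for \(i \in \mathcal F\), and the non-face likelihood playing the role of \(b\), this sum is the logarithm of the classical likelihood from the beginning of the chapter.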

This concept is quite open: basically, you can integrate any model that can be expressed as a likelihood function. Let's summarize the core benefits of this formulation:

In this tutorial, we focus on a simple implementation of a semantic Morphable Model with two classes: face and non-face. The core benefit of this specific implementation is that the face model does not have to explain every pixel in the face region, since every pixel in the image can be segmented as either face or non-face.

If you are interested in possible extensions or the integration of more models, have a look at our most recent work. We added a model for beards that was coupled to the face model; both models shared the shape and pose parameters. This is an example of how to include multiple classes and a discriminative technique based on hair detection (Egger 2017, PhD Thesis).

You could also include more specific models, for example for the eyes or the hair, and couple them with the face model. This allows multiple levels of detail while still keeping a globally consistent model of faces in images.

For our implementation, we use a face and a non-face model, so let's look at the two likelihoods. We start with the likelihood of the face model, which you have already seen in the probabilistic fitting tutorial:

\[ \ell_{\text{face}} ( \theta; \tilde{I}_i) = \frac{1}{N} \exp \left ( - \frac{1}{2 \sigma^2} \left \lVert \tilde{I}_i - I_i(\theta) \right \rVert ^2 \right ) \]

However, this likelihood is not complete for our semantic model. Its main limitation is that it is only defined in the face region \(\mathcal{F}\), the region covered by the rendered face. Pixels in the region \(\mathcal{B}\), where the classical face model is not defined, can also be labeled as face. In the classical model, the face likelihood was evaluated only in \(\mathcal{F}\), but now it is evaluated in the whole image. To get a likelihood for the whole image, we add a term that covers the region \(\mathcal{B}\) where the generative face model is not active:

\[ \ell_{\text{face}} (\theta; \tilde{I}_i) = \begin{cases} \frac{1}{N} \exp \left ( - \frac{1}{2 \sigma^2} \left \lVert \tilde{I}_i - I_i(\theta) \right \rVert ^2 \right ) & \text{if $i \in \mathcal{F}$}\\ \frac{1}{\delta} h_f(\tilde{I}_i, \theta) & \text{if $i \in \mathcal{B}$}. \end{cases} \]

For pixels in the region not covered by the face model, we use a likelihood based on a histogram model, where \(\delta\) is the bin volume. This is similar to how we modeled the background in the probabilistic fitting tutorial; the only difference is that the histogram \(h_f\) is built only from the pixels within the face region \(\mathcal F\) that are labeled as face.
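The piecewise face likelihood can be sketched as follows. This is an illustration under several assumptions not fixed by the text: RGB colors in \([0, 1]\), a hypothetical histogram with 8 bins per channel (so \(\delta = (1/8)^3\)), an isotropic Gaussian over three color channels (so \(N = (2\pi\sigma^2)^{3/2}\)), and a rendered color of `None` marking a pixel outside \(\mathcal F\). All names here are hypothetical and are not taken from a specific library.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Color = Tuple[float, float, float]  # RGB, channels assumed to lie in [0, 1]
Bin = Tuple[int, int, int]

BINS = 8                   # hypothetical: histogram bins per color channel
DELTA = (1.0 / BINS) ** 3  # bin volume delta in the unit RGB cube


def to_bin(c: Color) -> Bin:
    # Clamp so that a channel value of exactly 1.0 lands in the last bin.
    r, g, b = (min(int(v * BINS), BINS - 1) for v in c)
    return (r, g, b)


def face_histogram(
    observed: Sequence[Color],
    rendered: Sequence[Optional[Color]],
    z_face: Sequence[int],
) -> Dict[Bin, float]:
    """h_f: relative bin frequencies of pixels inside F that are labeled face.

    rendered[i] is None for pixels outside F (i in B).
    """
    counts = Counter(
        to_bin(obs)
        for obs, ren, zf in zip(observed, rendered, z_face)
        if ren is not None and zf == 1
    )
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}


def face_log_likelihood(
    observed: Color, rendered: Optional[Color], sigma: float, h_f: Dict[Bin, float]
) -> float:
    """Piecewise log l_face(theta; I_i) for a single pixel."""
    if rendered is not None:
        # i in F: isotropic Gaussian over 3 channels, so N = (2 pi sigma^2)^(3/2).
        sq = sum((o - r) ** 2 for o, r in zip(observed, rendered))
        return -1.5 * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)
    # i in B: histogram density h_f / delta; empty bins get zero likelihood.
    p = h_f.get(to_bin(observed), 0.0)
    return math.log(p / DELTA) if p > 0 else -math.inf


observed = [(0.5, 0.5, 0.5), (0.55, 0.5, 0.5), (0.9, 0.1, 0.1)]
rendered = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5), None]
h_f = face_histogram(observed, rendered, [1, 1, 0])
print(h_f)  # {(4, 4, 4): 1.0}
```

Only the first branch depends on \(\theta\) through the rendered color; the histogram branch depends on \(\theta\) only indirectly, because the region \(\mathcal F\) and the face labels decide which pixels enter \(h_f\).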

Our semantic Morphable Model framework relies on competing models. The second model in this competition covers the background and the occluding regions, i.e., the non-face regions. For the non-face class, we use the same likelihood we previously used for the background, a simple color histogram estimated from the whole image:

\[\ell_{\text{non-face}} (\theta ; \tilde{I}_i) = h_{\tilde{I}}(\tilde{I}_{i})\]

This likelihood is very different from the face model likelihood. The face likelihood compares the observed image with an image synthesized by a generative model built from 3D scans, which makes it quite complex. In comparison, the non-face likelihood is obtained by simply computing a color histogram and assigning each pixel the relative frequency of its bin.
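For comparison, the non-face likelihood \(h_{\tilde{I}}(\tilde{I}_i)\) can be sketched in a few lines. As before, this is a sketch assuming RGB colors in \([0, 1]\) and a hypothetical binning of 8 bins per channel; it follows the equation above literally and returns the relative frequency of each pixel's bin.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]  # RGB, channels assumed to lie in [0, 1]
BINS = 8  # hypothetical: histogram bins per color channel


def to_bin(c: Color) -> Tuple[int, int, int]:
    # Clamp so that a channel value of exactly 1.0 lands in the last bin.
    r, g, b = (min(int(v * BINS), BINS - 1) for v in c)
    return (r, g, b)


def nonface_likelihoods(image: Sequence[Color]) -> List[float]:
    """l_non-face(I_i) = h_I(I_i): relative frequency of pixel i's color bin,
    with the histogram estimated from all pixels of the image."""
    counts = Counter(to_bin(c) for c in image)
    n = len(image)
    return [counts[to_bin(c)] / n for c in image]


image = [(0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.9, 0.9, 0.9), (0.1, 0.11, 0.1)]
print(nonface_likelihoods(image))  # [0.75, 0.75, 0.25, 0.75]
```

Unlike the face likelihood, nothing here depends on \(\theta\), so the histogram can be computed once per image and reused throughout the fit.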

You could add more models with different kinds of likelihoods. For example, we also built a beard model that is coupled to the face model through its pose and position. The models can vary in complexity and kind: the framework is open to both discriminative and generative models, and we proposed a way to combine them within the Markov random field segmentation (Egger 2017).

The semantic Morphable Model combines the segmentation, which we formulate as \(P(z|\tilde{I}, \theta)\), with the estimation of the parameters of all models through \(\ell (\theta; \tilde{I}, z )\). Note that the segmentation labels \(z\) depend on the model parameters \(\theta\) and vice versa. In the next chapter, we discuss how to solve this inference problem.
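To illustrate the dependency of \(z\) on \(\theta\), consider a deliberately simplified case: for fixed \(\theta\), assume a uniform label prior and drop the spatial smoothness of the Markov random field. Then \(P(z|\tilde{I}, \theta)\) factorizes over pixels, and the most probable one-hot label of each pixel is the class with the highest likelihood. The sketch below shows only this simplification; it is not the inference method of the next chapter, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def label_posteriors(log_lik: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-pixel P(z_i = k | I, theta) under a uniform label prior and no
    MRF smoothness term (a simplification, not the full segmentation model)."""
    post = []
    for row in log_lik:
        m = max(row)
        if m == -math.inf:
            raise ValueError("every class has zero likelihood for this pixel")
        w = [math.exp(v - m) for v in row]  # subtract max for numerical stability
        s = sum(w)
        post.append([x / s for x in w])
    return post


def hard_labels(log_lik: Sequence[Sequence[float]]) -> List[List[int]]:
    """One-hot z (z_ik in {0, 1}, sum_k z_ik = 1): pick the most probable class."""
    z = []
    for row in label_posteriors(log_lik):
        best = max(range(len(row)), key=lambda k: row[k])
        z.append([1 if k == best else 0 for k in range(len(row))])
    return z


# log l_k(theta; I_i) for classes [face, non-face]; these values change with theta.
log_lik = [[math.log(3.0), 0.0], [-2.0, -0.5]]
print(hard_labels(log_lik))  # [[1, 0], [0, 1]]
```

Because the input log-likelihoods are computed for a particular \(\theta\), a different \(\theta\) can flip the labels, while estimating \(\theta\) in turn needs the labels.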

