Image segmentation is one of the core topics of computer vision. The basic idea of segmentation is to divide an image into different regions or segments. Most often we search for a meaningful segmentation where those regions represent an object or part of an object.
The main application of segmentation is the detection of objects and their boundaries. However, it is often only applied as a preprocessing step - it is usually much easier to work on a segmented image or its boundary information than on the full image.
The idea of semantic Morphable Models is strongly linked to segmentation. We would like to have various kinds of models to explain the pixels of an image. The assignment of the pixels to the models is exactly a segmentation task - for every pixel, we have to find a label that tells us which model explains this pixel. Classical segmentation techniques are helpful for this task - they usually use simpler models (top-down) but strong pixel dependencies or edge information (bottom-up). Explicit segmentation using bottom-up information is, therefore, a good companion for image analysis with generative models.
Since segmentation is widely used, there are various approaches - in this tutorial we focus on an approach based on a Markov random field, which is fully probabilistic, general and widely applied. If you would like a broader overview of segmentation techniques, we recommend the lecture slides by Philipp Kraehenbuehl (http://vision.stanford.edu/teaching/cs231b_spring1213/slides/segmentation.pdf), which also cover some more recent techniques based on deep learning.
The task of segmentation is a pixel labeling task - we have a hidden label \(z_i\) per observed pixel \(i\) which defines the class \(k\) of that pixel. Between the hidden labels, we assume neighborhood relations. The Markov random field is defined as:
\[ P(z \mid \tilde{I}, \theta) \propto P(z \mid \theta, c) \prod_i \left( \prod_k \ell_{k}( \theta ; \tilde{I}_i )^{z_{ik}} \prod_{j \in n(i)} P(z_{ik}, z_{jk}) \right) \]
The likelihood \(\ell_k\) is the data term - it reflects how well a pixel is explained by each class \(k\). In addition, we incorporate a prior \(P(z|\theta,c)\) which can encode preferences for the different labels. The pairwise term \(P(z_{ik}, z_{jk})\) is a smoothness assumption - we prefer neighboring pixels to share the same class label \(k\).
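As a concrete illustration of the smoothness term (our own sketch, not taken from the library), consider a two-label problem where two neighboring pixels carry the same label with probability 0.99 and different labels otherwise. The pairwise term can then be written as a small table:
\[ P(z_{ik}, z_{jk}) = \begin{pmatrix} 0.99 & 0.01 \\ 0.01 & 0.99 \end{pmatrix} \]
where rows index the label of pixel \(i\) and columns the label of pixel \(j\). With more labels, the off-diagonal mass is split among the remaining labels - presumably this is what the value 0.99 passed to binDistribution in the code below controls.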
A graphical model of the formulation looks like this:
For the experiments in this tutorial, we use a uniform prior \(P(c)\). However, in our most recent work we used a prior to incorporate beard segmentation (Egger 2018).
We use loopy belief propagation with a simple sum-product algorithm for inference. This segmentation technique is well documented - a deeper understanding of these basics is not necessary for this tutorial.
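For reference, a single sum-product message from pixel \(i\) to a neighboring pixel \(j\) has the standard form (written in our notation; the library's internal implementation may differ in details):
\[ m_{i \to j}(z_j) \propto \sum_{z_i} \ell_{z_i}( \theta ; \tilde{I}_i ) \, P(z_i, z_j) \prod_{u \in n(i) \setminus \{j\}} m_{u \to i}(z_i) \]
The belief of a pixel about its label is then proportional to its local data term multiplied by all incoming messages, and the messages are iterated until they (hopefully) converge.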
The important idea behind Markov random field based image segmentation is the split into two terms: a data term and a smoothness or edge term. This is a very common choice in segmentation algorithms - so let's have a look at their influence on segmentation results.
We start with a simple image from the Labeled Faces in the Wild database which we would like to segment into 4 classes.
Please execute the following code and we will have a look at the result:
import java.io.File
import scalismo.faces.color.RGB
import scalismo.faces.image.PixelImage
import scalismo.faces.io.PixelImageIO
import scalismo.faces.segmentation.LoopyBPSegmentation
import scalismo.faces.segmentation.LoopyBPSegmentation.Label
import scalismo.utils.Random

scalismo.initialize()
implicit val rnd = Random(1986)

// load the image to be segmented
val image = PixelImageIO.read[RGB](new File("data/2.png")).get

// number of classes and a random initial label per pixel
val numLabels = 4
val initMask: PixelImage[Option[Label]] = PixelImage(image.width, image.height, { (x, y) => Some(Label(rnd.scalaRandom.nextInt(numLabels))) })

// pairwise smoothness term: neighboring pixels prefer to share the same label
val smoothnessDistribution = LoopyBPSegmentation.binDistribution(0.99, numLabels, image.width, image.height)

// run the segmentation for 50 iterations and keep the label with the highest belief per pixel
val segImage = LoopyBPSegmentation.segmentImage(image, initMask, smoothnessDistribution, numLabels, 50, true).map{_.maxLabel}
If you have problems executing this code, make sure the data directory is in the same folder from which you are executing the jar file.
If you executed the above code, a window opens showing the iterations of the segmentation. In the middle, the current state of the segmentation is visualized. On the left, you can see random samples from the models estimated from the current segmentation. The models for each class are simple Gaussian color models built from all pixels currently labeled as belonging to the respective class. On the right, you can see the current belief of each pixel about its label - this is calculated from the likelihoods \(\ell_k\) of the pixel belonging to each model (an internal value of the MRF segmentation).
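To make these class models more tangible, here is a minimal sketch (our own illustration, not part of the library API) of how such per-class Gaussian color models could be estimated from the current segmentation. It assumes the image and segImage values from the code above and uses only the r, g and b channels of RGB:

// collect the observed colors for every label of the current segmentation
val samplesPerLabel: Map[Label, Seq[RGB]] = (for {
  x <- 0 until image.width
  y <- 0 until image.height
} yield (segImage(x, y), image(x, y)))
  .groupBy(_._1)
  .map { case (label, pairs) => label -> pairs.map(_._2) }

// estimate an independent Gaussian per color channel for every class: (mean, variance) for r, g and b
val colorModels = samplesPerLabel.map { case (label, colors) =>
  val n = colors.size.toDouble
  def meanAndVariance(channel: RGB => Double): (Double, Double) = {
    val values = colors.map(channel)
    val mean = values.sum / n
    val variance = values.map(v => (v - mean) * (v - mean)).sum / n
    (mean, variance)
  }
  label -> Seq(meanAndVariance(_.r), meanAndVariance(_.g), meanAndVariance(_.b))
}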
Let's have a closer look at what the segmentation takes as input:
The most interesting parameters to play with are the number of labels and the smoothness distribution.
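For orientation, here is the call from above again with its arguments annotated. The reading of the last flag as a visualization switch is our interpretation of the behaviour described above, not official documentation:

val segImage = LoopyBPSegmentation.segmentImage(
  image,                  // the observed image to segment
  initMask,               // initial label per pixel (here: random)
  smoothnessDistribution, // pairwise smoothness term between neighboring pixels
  numLabels,              // number of classes
  50,                     // number of message-passing iterations
  true                    // presumably: show the visualization window during inference
).map{_.maxLabel}         // keep the label with the highest belief per pixel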
We first have a look at the influence of the number of labels by reducing it to 2 classes:
val numLabels = 2
val initMask: PixelImage[Option[Label]] = PixelImage(image.width, image.height, { (x, y) => Some(Label(rnd.scalaRandom.nextInt(numLabels))) })
val smoothnessDistribution = LoopyBPSegmentation.binDistribution(0.99, numLabels, image.width, image.height)
val segImage = LoopyBPSegmentation.segmentImage(image, initMask, smoothnessDistribution, numLabels, 50, true).map{_.maxLabel}
And 3 classes:
val numLabels = 3
val initMask: PixelImage[Option[Label]] = PixelImage(image.width, image.height, { (x, y) => Some(Label(rnd.scalaRandom.nextInt(numLabels))) })
val smoothnessDistribution = LoopyBPSegmentation.binDistribution(0.99, numLabels, image.width, image.height)
val segImage = LoopyBPSegmentation.segmentImage(image, initMask, smoothnessDistribution, numLabels, 50, true).map{_.maxLabel}
Now we change the smoothness parameter, which makes it more likely that neighbouring pixels belong to the same class:
val numLabels = 3
val initMask: PixelImage[Option[Label]] = PixelImage(image.width, image.height, { (x, y) => Some(Label(rnd.scalaRandom.nextInt(numLabels))) })
val smoothnessDistribution = LoopyBPSegmentation.binDistribution(0.999, numLabels, image.width, image.height)
val segImage = LoopyBPSegmentation.segmentImage(image, initMask, smoothnessDistribution, numLabels, 200, true).map{_.maxLabel}
In this example, we also increased the number of iterations, since convergence slows down with the stronger smoothness assumption.
There are two major limitations of this simple segmentation strategy: the class likelihoods are only simple Gaussian color models, which are too weak to capture real objects, and the result strongly depends on the (here random) initialization, so the labels carry no semantic meaning on their own.
For our application, this simple strategy is still very useful, since we overcome those limitations by using the appearance prior of our face model as the likelihood in the face region and the position of the face as initialization.
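As a rough sketch of the second point - using the position of the face as initialization - one could build the initial mask from a face bounding box instead of random labels. The box coordinates below are hypothetical and chosen purely for illustration; the face model likelihood itself is the topic of the following tutorial steps and is not shown here:

// hypothetical face bounding box (values chosen for illustration only)
val (left, top, right, bottom) = (60, 40, 190, 210)
val faceLabel = Label(0)
val backgroundLabel = Label(1)

// initialize pixels inside the box as face, everything else as background
val faceInitMask: PixelImage[Option[Label]] = PixelImage(image.width, image.height, { (x, y) =>
  if (x >= left && x <= right && y >= top && y <= bottom) Some(faceLabel)
  else Some(backgroundLabel)
})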