Face Reconstruction with Self-Shadowing

In this chapter, we apply the previously discussed techniques to reconstruct face pose, shape, texture, and illumination from a single image. Increasing the realism of the model with self-shadowing leads to increased reconstruction quality. We call the process of finding the model parameters given the image "fitting".

We now use the global illumination model during fitting. In contrast to the local illumination model, we use the deluminated Morphable Model to compute realistic illumination including self-shadowing. Furthermore, we estimate environment maps differently, namely with the approach presented in the previous chapter. Apart from that, the fitting process stays the same.

The local illumination model cannot produce self-shadowing because it only considers the relation between the surface normal and the illumination. It therefore serves as the baseline approach.
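
To make the difference concrete, here is a minimal conceptual sketch (with hypothetical helper functions, not the renderer's actual API). Local Lambertian shading depends only on the surface normal and the light direction, whereas a PRT-style evaluation contracts a precomputed per-vertex transfer vector, which already encodes visibility and hence self-shadowing, with the spherical harmonics illumination coefficients.

import breeze.linalg.DenseVector

// Local (Lambertian) shading: depends only on normal and light direction,
// so occluding geometry can never darken a vertex -> no self-shadowing.
def localShading(normal: DenseVector[Double], lightDir: DenseVector[Double]): Double =
  math.max(0.0, normal dot lightDir)

// PRT-style shading: the transfer vector encodes the (cosine-weighted) visibility of the vertex,
// so light from occluded directions contributes nothing -> self-shadowing appears.
def prtShading(transfer: DenseVector[Double], shIllumination: DenseVector[Double]): Double =
  transfer dot shIllumination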

Overview

We compare the reconstruction performance with self-shadowing (MoMo-PRT) to the one with local illumination in two experiments. In the first, qualitative experiment, we reconstruct a face from a real image and visually compare the reconstructed shape, color, and illumination.

In the second experiment, we quantitatively measure the reconstruction performance using artificial target images. We create renderings of faces whose parameters we know exactly, which allows us to measure how close the reconstructed face is to the one used to create the target image. We use different measures for the reconstruction performance, such as shape, texture, illumination, and face position errors. The goal is to find out whether the reconstruction with MoMo-PRT is significantly better than the standard approach. We create multiple target images, reconstruct them with the standard approach and with MoMo-PRT, and test each error measure for statistical significance.

Figure 1: Qualitative reconstruction experiment. Visual comparison between PRT and local-only model.

Figure 2: Quantitative reconstruction experiment. Comparison of PRT and the local-only illumination model against ground truth.

Setup

We start by initializing scalismo and loading the Morphable Model as well as its shadow-corrected (deluminated) version. We also load the linear transfer model that we computed in chapter two and define a standard and a PRT-based MoMoRenderer.

Note that we are loading the l5 model, which is defined over low-resolution meshes. If you want higher-resolution meshes, you can exchange "l5" with "l6". However, the experiments then take much more time to run.

scalismo.initialize() // Don't forget to initialize scalismo before loading the model

val momoFn = new File("data/supplied/model2017-1_bfm_nomouth_l5.h5") // exchange l5 with l6 to run it on higher resolution meshes to reproduce the paper

val momoDeluminatedFn = new File("data/supplied/model2017-1_no_self_shadowing_bfm_nomouth_l5.h5") // exchange l5 with l6 to run it on higher resolution meshes to reproduce the paper

val tfModelFn = "data/supplied/transferModel_model2017-1_no_self_shadowing_bfm_nomouth_l5.h5" // exchange l5 with l6 to run it on higher resolution meshes to reproduce the paper

val momo = MoMoIO.read(momoFn).get.neutralModel.truncate(50, 30)

val momoDeluminated = MoMoIO.read(momoDeluminatedFn).get.neutralModel.truncate(50, 30)

val linearTransferModel = LinearTransferModelIO.readTransferModel(tfModelFn).get

import scalismo.utils.Random
implicit val rnd: Random = Random(194205L)

val technique = LambertTechnique
  .withDecorator(PersistentCaching(_)).withCachingPath("tmp/prt-cache/illumination")
  //.withDecorator(ProfiledTechnique(_))

val parameterModel = LambertParameterModel(technique)
  .withParameterModifier(_
    .withOcclusionRaycastSamples(900)
    .withLightBounces(0)
    .withShBands(3)
  )

val momoRendererPrt =
  RenderingModel(technique)
    .withParameterModel(parameterModel)
    .withTransferModel(linearTransferModel)
    .getParametricMoMoRenderer(momoDeluminated).withClearColor(RGBA.BlackTransparent).cached(5)

val momoRenderer = MoMoRenderer(momo, RGBA.BlackTransparent).cached(5)

Qualitative Reconstruction Experiment

We start with the qualitative experiment. We reconstruct face shape, color, pose, and illumination from an image of Roger Federer. We chose an image in which the face is strongly shadowed. Our aim is to explain the image better by using an illumination model capable of expressing self-shadowing. Because the modeling capability has increased, the shadowing should be explained by changes in shape and illumination instead of the texture.

Figure 3: Target image for the qualitative experiment. Copyright: KEYSTONE/EPA/Justin Lane

We start by loading the target image and target landmarks.

val name = "federer"
//val name = "face_65202" // try this for another example
val targetImage = PixelImageIO.read[RGBA](new File(s"data/fitting/${name}.png")).get
val targetLandmarks = TLMSLandmarksIO.read2D(new File(s"data/fitting/${name}.tlms")).get

To monitor how close the current reconstruction is to the target image, we measure the average image error between the target image and the rendered reconstruction, computed only over pixels covered by the face. We expect the image error to decrease as the iteration count increases.

//compares only the pixels on the face
def avgImageError(rendering: PixelImage[RGBA], targetImage: PixelImage[RGBA]) = {
  var nPixels = 0
  val pixelsToCompare = rendering.mapWithIndex { case (px, x, y) =>
    if (px.a == 1.0) {
      val diff = px - targetImage(x, y)
      nPixels += 1
      diff.toRGB.norm
    } else 0.0
  }
  pixelsToCompare.values.toIndexedSeq.sum/nPixels.toDouble
}
def createMetric(targetImage: PixelImage[RGBA]) = new FittingMetric {
  override def computeMetricData(momoRenderer: MoMoRenderer, param: RenderParameter): Map[String, Double] = {
    val rendering = momoRenderer.renderImage(param)
    Map("averageImageError" -> avgImageError(rendering, targetImage))
  }
}

The error value will be displayed and updated at every iteration of the reconstruction process.

val metricNoGT = createMetric(targetImage)

To visualize the reconstruction process, we create a window showing the current reconstruction and the target image, one for the PRT model and one for the local model.

val prtPanel = new InteractiveFittingPanel(momoRendererPrt, targetImage, metricNoGT)
val loggerPrt = prtPanel.subSampled(10)
val figurePRT = new InteractiveFigure("Fitting PRT", targetImage.width, targetImage.height + 90, RGBA.White, interactiveRenderPanel = prtPanel)

val stdPanel = new InteractiveFittingPanel(momoRenderer, targetImage, metricNoGT)
val loggerStd = stdPanel.subSampled(10)
val figureStd = new InteractiveFigure("Fitting Standard", targetImage.width, targetImage.height + 90, RGBA.White, interactiveRenderPanel = stdPanel)

We specify the number of iterations of the Metropolis-Hastings algorithm used to reconstruct the face. The algorithm requires an initial value for all parameters we reconstruct (init). You can change the number of principal components used by adjusting nShpPC and nTexPC.

import scala.concurrent.Future
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

/* If you want to get meaningful results, you need to increase the number of iterations to 10000.
 The full experiment then takes much longer than with 100 iterations.
 We suggest you leave the iteration number at 100 while following the tutorial.
 Afterwards, you can increase it to 10000 to reproduce the results from the paper.
 */
val iterations = 100

val init = RenderParameter.defaultSquare.withImageSize(ImageSize(128, 128))


val experiment1 = Future {
  PRTFitter(momoRendererPrt, iterations, withSHLOptimizer = true, withPoseInitialization = true, adaptDistance = true, withFocalLength = true, nShpPC = 50, nTexPC = 30, pixelValueSdev = 0.043, lmSdev = 2.0)
    .fit(targetImage, targetLandmarks, init, loggerPrt, "tmp/fitting/prt")
}

val experiment2 = Future {
  PRTFitter(momoRenderer, iterations, withSHLOptimizer = true, withPoseInitialization = true, adaptDistance = true, withFocalLength = true, nShpPC = 50, nTexPC = 30, pixelValueSdev = 0.043, lmSdev = 2.0)
    .fit(targetImage, targetLandmarks, init, loggerStd, "tmp/fitting/std")
}

This starts the two reconstruction processes and opens the interactive windows. Each window shows the target image and different visualizations of the current reconstruction, which you can activate and deactivate with key strokes.

Use the different visualisations to compare the two reconstruction methods. Compare the texture reconstruction performance by disabling the illumination and the shape in both windows. The resulting images show the mean face under ambient illumination with the reconstructed texture (a sketch for reproducing this comparison outside the interactive window follows below). Interesting regions are the shadowed ones, such as the eye region. There we see that with the local model, the texture tries to reproduce the shadows much more than with the global model. The global model instead explains the shadowing through shape and illumination. This indicates that using Morphable Model Precomputed Radiance Transfer could result in better reconstructions.
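
If you prefer to do this comparison outside the interactive window, the following sketch renders the reconstructed texture on the mean shape under ambient illumination. It is only a sketch: fitParamPrt and fitParamStd are hypothetical names for the RenderParameters obtained from the two fits above, and the field access on the model instance assumes the standard scalismo-faces parameter layout.

// Sketch: show only the recovered texture by removing shape deformation and illumination effects.
def textureOnMeanShape(fitParam: RenderParameter, renderer: MoMoRenderer): PixelImage[RGBA] = {
  val meanShapeInstance = fitParam.momo.copy(shape = fitParam.momo.shape.map(_ => 0.0)) // zero shape coefficients = mean shape
  val ambientParam = fitParam
    .withMoMo(meanShapeInstance)
    .withEnvironmentMap(SphericalHarmonicsLight.ambientWhite) // flat ambient illumination
  renderer.renderImage(ambientParam)
}

// Usage (hypothetical variable names for the fitting results):
// val texOnlyPrt = textureOnMeanShape(fitParamPrt, momoRendererPrt)
// val texOnlyStd = textureOnMeanShape(fitParamStd, momoRenderer)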

Quantitative Reconstruction Experiment

In the last experiment, we saw that using a global illumination model might help to reconstruct some aspects of the face. In this experiment, we compare the reconstruction performance of the local model with that of the global model. To obtain accurate ground truth for the comparison, we use renderings of artificial faces. Our goal is to find out whether there is a significant difference between the two methods.

Target Selection

To be able to measure a significant effect from a small number of target images, we create a specific sampling scheme. For many faces and illuminations the effect of self-shadowing is only marginally present. Therefore, we sample faces and illuminations where global illumination has a large effect.

To determine how much self-shadowing influences an image, we create two renderings of the face: once with self-shadowing and once with local illumination only. If self-shadowing is present, the two renderings differ.

The following code samples random faces and selects the one with the largest difference between the two renderings. Then, given the selected face, we sample random illuminations and select the one that results in the largest difference between a rendering with global illumination and one with local illumination only.

We will select faces and illumination according to the following criterion:

//compute difference between rendering the parameter under a local illumination and rendering it under a global one
def absoluteDifferenceToGlobalIllumination(targetParam: RenderParameter, targetInstanceMoMo: MoMoInstance, targetInstanceDelum: MoMoInstance) = {
  val targetRenderParameterDelum = targetParam.withMoMo(targetInstanceDelum)
  val targetImage = clamp(momoRendererPrt.renderImage(targetRenderParameterDelum))
  val targetRenderParameterMoMo = targetParam.withMoMo(targetInstanceMoMo)
  val targetImageLocal = clamp(momoRenderer.renderImage(targetRenderParameterMoMo))
  val nPixels = targetImage.values.filter( _.a == 1.0).length
  val diff = (targetImage.values.toIndexedSeq, targetImageLocal.values.toIndexedSeq).zipped.map { case (a, b) => (a - b).toRGB.norm }.sum
  diff / nPixels.toDouble
}

We now sample faces and illuminations that lead to strong self-shadowing.

/* sample a face and an illumination such that self-shadowing has an effect
i.e. find faces that have concavities and illuminations that produce shadowing. */
def sampleTarget(n: Int): (Double, Double, RenderParameter, MoMoCoefficients, MoMoCoefficients, VertexColorMesh3D) = {
  //select face shape that has strong influence on self-shadowing
  val ambient = SphericalHarmonicsLight.ambientWhite
  val faces = for(i <- 0 until n) yield {
    val targetMesh = momo.sample()
    val targetInstanceMoMo = momo.coefficients(targetMesh)
    val targetInstanceDeluminated = momoDeluminated.coefficients(targetMesh)
    val targetParam = init
      .withEnvironmentMap(ambient)
    val diff = absoluteDifferenceToGlobalIllumination(targetParam, targetInstanceMoMo, targetInstanceDeluminated)
    (diff, targetMesh, targetInstanceDeluminated, targetInstanceMoMo)
  }
  val (diffMesh, targetMesh, targetInstanceDeluminated, targetInstanceMoMo) = faces.maxBy(_._1)

  //select illumination that casts shadows
  val params = for(i <- 0 until n) yield {
    val randomIllumination = SphericalHarmonicsLight( (0 until 9).map{ a=>
      val v = rnd.scalaRandom.nextGaussian()
      Vector(v, v, v)
    } )
    val targetParam = init
      .withEnvironmentMap(randomIllumination)
    val diff = absoluteDifferenceToGlobalIllumination(targetParam, targetInstanceMoMo, targetInstanceDeluminated)
    (diff, targetParam, targetInstanceMoMo, targetInstanceDeluminated)
  }
  val (diffIllum, param, instMoMo, instDelum) = params.maxBy(_._1)
  (diffMesh, diffIllum, param, instMoMo, instDelum, targetMesh)
}

We render the selected illuminated faces to create the target images and landmarks for our quantitative experiment.

val numberOfExperiments = 4 // increase to 8 or more to get meaningful results

//generate the target images from sampled ground truth parameters
val targets = for (i <- 0 until numberOfExperiments) yield {
  val (diffMesh, diffIllum, targetParam, instMoMo, instDelum, targetMesh) = sampleTarget(200) // increase n to select more extreme shapes and illuminations
  val gtGlobal = targetParam.withMoMo(instDelum)
  val gtLocal = targetParam.withMoMo(instMoMo)
  val targetImage = clamp(momoRendererPrt.renderImage(gtGlobal))
  val targetLandmarks = momoRendererPrt.allLandmarkIds.flatMap {
    lm => momoRendererPrt.renderLandmark(lm, gtGlobal)
  }
  PixelImageIO.write(targetImage, new File(s"target_${i}.png")).get
  println(s"selected target. Difference due to shape selection: ${diffMesh}. Difference due to shape and illumination selection: ${diffIllum}")
  (targetImage, targetLandmarks, gtLocal, gtGlobal)
}

To make the selection more extreme, increase n.

Face Reconstruction

Having a set of target images, we turn to the reconstruction part. First, we define a metric to compare the reconstructions with the ground truth. In contrast to the qualitative experiment above, we can now measure how shape, texture, and illumination differ from the ground truth. We compute the average distance between corresponding vertices of the reconstructed shape and the ground truth shape. Similarly, we measure the texture error as the average magnitude of the per-vertex difference between the color values. We also measure the illumination error by comparing the spherical harmonics coefficients.

import breeze.linalg.DenseVector

def createMetricWithGroundTruth(targetParam: RenderParameter, targetImage: PixelImage[RGBA]) = new FittingMetric{
  override def computeMetricData(momoRenderer: MoMoRenderer,
                                 param: RenderParameter
                                ): Map[String, Double] = {
    val rendering = momoRenderer.renderImage(param)
    val instTarget = momoRenderer.instance(targetParam)
    val instCurrent = momoRenderer.instance(param)
    //deviation of current reconstruction to ground truth values
    val imageError = avgImageError(rendering, targetImage)
    val avgShapeError = (instTarget.shape.position.pointData, instCurrent.shape.position.pointData).zipped.map { case (a, b) => (a - b).norm }.sum / instTarget.shape.pointSet.numberOfPoints.toDouble
    val avgTextureError = (instTarget.color.pointData, instCurrent.color.pointData).zipped.map { case (a, b) => (a - b).toRGB.norm }.sum / instTarget.shape.pointSet.numberOfPoints.toDouble
    val m = math.max(targetParam.environmentMap.coefficients.length*3, param.environmentMap.coefficients.length*3)
    val illumError = breeze.linalg.norm(
      DenseVector(targetParam.environmentMap.coefficients.flatMap(_.toArray).toArray.padTo(m,0.0)) -
            DenseVector(param.environmentMap.coefficients.flatMap(_.toArray).toArray.padTo(m, 0.0))
    )
    val posError = (targetParam.pose.translation - param.pose.translation).norm
    Map(
      ("avgImageError" -> imageError),
      ("avgShapeError" -> avgShapeError),
      ("avgTextureError" -> avgTextureError),
      ("IlluminationError" -> illumError),
      ("PositionError" -> posError),
    )
  }
}

(As you can see, we also measure the position error, i.e. the deviation of the reconstructed face position from the ground truth, which includes its distance to the camera.)

We now have all the prerequisites to start the reconstruction process. With the following code, we set up the visualisation windows showing the iterations of every reconstruction and start the reconstructions themselves. Be prepared to wait an hour or more, depending on your hardware.


  //define fitting experiment
  val experiments = for ( ((targetImage, targetLandmarks, targetRenderParameterMoMo, targetRenderParameterDelum), i) <- targets.zipWithIndex) yield {
    //setup visualization
    val metricPRT = createMetricWithGroundTruth(targetRenderParameterDelum, targetImage)
    val prtPanel = new InteractiveFittingPanel(momoRendererPrt, targetImage, metricPRT)
    val loggerPrt = prtPanel.subSampled(100)
    val figprt = new InteractiveFigure(s"Fitting PRT ${i}", prtPanel.w, prtPanel.h, RGBA.White, interactiveRenderPanel = prtPanel)

    val metricStd = createMetricWithGroundTruth(targetRenderParameterMoMo, targetImage)
    val stdPanel = new InteractiveFittingPanel(momoRenderer, targetImage, metricStd)
    val loggerStd = stdPanel.subSampled(100)
    val figstd = new InteractiveFigure(s"Fitting Standard ${i}", prtPanel.w, prtPanel.h, RGBA.White, interactiveRenderPanel = stdPanel)

    //disable console logging
    import java.io.{OutputStream, PrintStream}
    val consoleLogging = new PrintStream(new OutputStream {
      override def write(i: Int): Unit = {}
    })
    //to enable logging to console: val consoleLogging = Console.out

    //reconstruction. We do not optimize for the focal length, and the distance from the face to the camera is fixed.
    val prt = Future {
      PRTFitter(momoRendererPrt, iterations, withSHLOptimizer = true, withPoseInitialization = false, adaptDistance = false, nShpPC = 50, nTexPC = 30, pixelValueSdev = 0.03, lmSdev = 1.0, printStream = consoleLogging)
        .fit(targetImage, targetLandmarks, init, loggerPrt, "tmp/fitting/prt")
    }
    val std = Future {
      PRTFitter(momoRenderer, iterations, withSHLOptimizer = true, withPoseInitialization = false, adaptDistance = false, nShpPC = 50, nTexPC = 30, pixelValueSdev = 0.03, lmSdev = 1.0, printStream = consoleLogging)
        .fit(targetImage, targetLandmarks, init, loggerStd, "tmp/fitting/std")
    }
    (prt, std, targetImage, targetRenderParameterDelum, targetRenderParameterMoMo)
  }

To evaluate the experiment, we need to wait for all reconstructions to finish. This can take a while, potentially hours, depending on your hardware and the number of reconstructions specified above.


  //wait for the fittings to finish
  import scala.concurrent.duration._
  import scala.concurrent.Await
  val results = experiments.map { exp =>
    val prt = Await.result(exp._1, Duration.Inf)
    val std = Await.result(exp._2, Duration.Inf)
    (prt, std, exp._3, exp._4, exp._5)
  }

Evaluation

We compare MoMo-PRT to the local-only illumination model with respect to the different metrics we defined above. To determine whether one of the approaches is significantly better on a specific metric, we use the Wilcoxon signed-rank test.

We start by collecting the measurements we made of each reconstruction experiment.

//compute error to ground truth for each fitting experiment
val metricData = results.map { r =>
  (createMetricWithGroundTruth(r._4, r._3).computeMetricData(momoRendererPrt, r._1),
    createMetricWithGroundTruth(r._5, r._3).computeMetricData(momoRenderer, r._2))
}

//collect data of each error measure
val valuesPerMeasure = (for (errorMeasureName <- metricData(0)._1.keys) yield {
  val values = metricData.map { data =>
    (data._1(errorMeasureName), data._2(errorMeasureName))
  }
  (errorMeasureName -> values)
}).toMap

For each measure, we compute and display the mean and variance for both methods.


  //compute mean and standard deviation of each error measure
  val muVar = for ((name, data) <- valuesPerMeasure) yield {
    val mvPrt = breeze.stats.meanAndVariance(data.map(_._1).toArray)
    val mvStd = breeze.stats.meanAndVariance(data.map(_._2).toArray)
    (name -> (mvPrt, mvStd))
  }
  //display mean and variance
  muVar.foreach { case (name, (mvPrt, mvStd)) =>
    println(f"PRT ${name} mu: ${mvPrt.mean}%2.4f var: ${mvPrt.variance}%2.4f")
    println(f"Std ${name} mu: ${mvStd.mean}%2.4f var: ${mvStd.variance}%2.4f")
  }

Running the code might give you something similar to the following results.

PRT IlluminationError mu: 2.6850 var: 2.0688
Std IlluminationError mu: 3.8720 var: 1.7919
PRT avgShapeError mu: 2.9166 var: 0.2352
Std avgShapeError mu: 2.9856 var: 0.5773
PRT avgTextureError mu: 0.1088 var: 0.0008
Std avgTextureError mu: 0.1797 var: 0.0033
PRT avgImageError mu: 0.1505 var: 0.0007
Std avgImageError mu: 0.1777 var: 0.0018
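
Beyond the raw means, it can be helpful to express the differences as a relative improvement per measure. The following is a small sketch based on the valuesPerMeasure map defined above; a positive value means MoMo-PRT has the smaller error.

//relative improvement of MoMo-PRT over the local-only baseline per error measure
val relativeImprovement = for ((name, values) <- valuesPerMeasure) yield {
  val prtMean = values.map(_._1).sum / values.length
  val stdMean = values.map(_._2).sum / values.length
  (name, (stdMean - prtMean) / stdMean)
}
relativeImprovement.foreach { case (name, imp) =>
  println(f"$name: ${imp * 100}%2.1f%% lower error with MoMo-PRT")
}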

These results suggest that MoMo-PRT could be superior to the local-only approach with respect to all measures. To see whether the differences are significant, we run the Wilcoxon signed-rank test.


  //test if mean of prt fitting differs from std fitting
  import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest
  println("metric / P value")
  for ((name, values) <- valuesPerMeasure) {
    val test = new WilcoxonSignedRankTest()
    println(s"$name ${test.wilcoxonSignedRankTest(values.map(_._1).toArray, values.map(_._2).toArray, false)}")
  }

For each measure, we get an approximation of the p-value, i.e. the probability of observing a difference at least this large if both methods actually performed equally well.

You may see results similar to the ones shown here.

IlluminationError 0.014
avgShapeError     0.833
avgTextureError   0.009
avgImageError     0.014
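
To summarize which differences are significant at the usual 5% level, one could compare the p-values against a threshold. This is a small sketch; the pValues map is hypothetical and filled with the example values printed above.

//flag measures whose difference is significant at the 5% level
val alpha = 0.05
val pValues = Map(
  "IlluminationError" -> 0.014,
  "avgShapeError" -> 0.833,
  "avgTextureError" -> 0.009,
  "avgImageError" -> 0.014
)
pValues.foreach { case (name, p) =>
  println(s"$name: ${if (p < alpha) "significant" else "not significant"} (p = $p)")
}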

Texture, illumination, and the overall image seem to be reconstructed significantly better with MoMo-PRT than with the local-only model, while the difference in shape error is not significant.

Conclusion

We showed that adding global illumination to the model increases the reconstruction performance. The model with only local illumination is well capable of reproducing the target image, but it does so at the expense of reconstruction quality. Instead of representing self-shadowing in the texture, the proposed model can express it in terms of shape and illumination. This leads to significant improvements in the reconstruction of texture and illumination. Shape reconstruction does not seem to benefit significantly from the global illumination model.

Tutorial Home | Previous Chapter: Illumination Estimation