Brain image segmentation with torch

When “what” is not enough

True, sometimes it’s vital to distinguish between different kinds of objects. Is that a car speeding towards me, in which case I’d better jump out of the way? Or is it a huge Doberman (in which case I’d probably do the same)? Often in real life though, instead of coarse-grained classification, what is needed is fine-grained segmentation.

Zooming in on images, we’re not looking for a single label; instead, we want to classify every pixel according to some criterion:

  • In medicine, we may want to distinguish between different cell types, or identify tumors.
  • In various earth sciences, satellite data are used to segment terrestrial surfaces.
  • To enable use of custom backgrounds, video-conferencing software has to be able to tell foreground from background.

Image segmentation is a form of supervised learning: some kind of ground truth is needed. Here, it comes in the form of a mask – an image, of spatial resolution identical to that of the input data, that designates the true class for every pixel. Accordingly, classification loss is calculated pixel-wise; the per-pixel losses are then summed to yield an aggregate to be used in optimization.
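The pixel-wise loss described above can be sketched in a few lines. This is a minimal illustration in PyTorch (the R torch package mirrors this API closely); the shapes and class count are made up for the example. `CrossEntropyLoss` applied to `(N, C, H, W)` logits and an `(N, H, W)` integer mask computes the classification loss per pixel, and `reduction="sum"` aggregates over all pixels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shapes: a batch of 2 images, 3 classes, 8x8 spatial resolution.
logits = torch.randn(2, 3, 8, 8)        # raw per-pixel class scores from a model
mask = torch.randint(0, 3, (2, 8, 8))   # ground-truth mask: true class per pixel

# Cross-entropy is evaluated at every pixel; the per-pixel losses
# are then summed into a single scalar used for optimization.
criterion = nn.CrossEntropyLoss(reduction="sum")
loss = criterion(logits, mask)
print(loss)  # a single scalar aggregating 2 * 8 * 8 pixel losses
```

In practice `reduction="mean"` (the default) is more common, as it keeps the loss magnitude independent of image size.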

The “canonical” architecture for image segmentation is U-Net (around since 2015).
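To make the idea concrete, here is a minimal U-Net-style sketch, not the full architecture from the 2015 paper. It is written in PyTorch (the R torch API is closely analogous), with made-up channel sizes and only two resolution levels: an encoder that downsamples, a decoder that upsamples, and skip connections that concatenate encoder features into the decoder at matching resolutions:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU; padding preserves spatial size.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc1 = conv_block(in_ch, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)   # 64 in: upsampled 32 + skip 32
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)   # 32 in: upsampled 16 + skip 16
        self.head = nn.Conv2d(16, n_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                   # full resolution
        e2 = self.enc2(self.pool(e1))       # 1/2 resolution
        b = self.bottleneck(self.pool(e2))  # 1/4 resolution
        # Decoder: upsample, then concatenate the skip connection.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

x = torch.randn(1, 1, 64, 64)
print(MiniUNet()(x).shape)  # → torch.Size([1, 2, 64, 64])
```

The output has the same spatial resolution as the input, with one score per class at every pixel – exactly the shape expected by the pixel-wise loss above.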


This is a companion discussion topic for the original entry at