When “what” is not enough
True, sometimes it’s vital to distinguish between different kinds of objects. Is that a car speeding towards me, in which case I’d better jump out of the way? Or is it a huge Doberman (in which case I’d probably do the same)? Often in real life, though, what is needed is not coarse-grained classification but fine-grained segmentation.
Zooming in on images, we’re not looking for a single label; instead, we want to classify every pixel according to some criterion:
- In medicine, we may want to distinguish between different cell types, or identify tumors.
- In various earth sciences, satellite data are used to segment terrestrial surfaces.
- To enable use of custom backgrounds, video-conferencing software has to be able to tell foreground from background.
Image segmentation is a form of supervised learning: some kind of ground truth is needed. Here, it comes in the form of a mask – an image, of spatial resolution identical to that of the input data, that designates the true class of every pixel. Accordingly, classification loss is calculated pixel-wise; the per-pixel losses are then aggregated into a single value to be used in optimization.
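To make the loss computation concrete, here is a minimal sketch using PyTorch for illustration (the shapes and class count are made up for the example; the underlying idea is the same regardless of framework):

```python
import torch
import torch.nn.functional as F

# Hypothetical example: a batch of 4 predictions over 3 classes for
# 64x64 images, plus a ground-truth mask of identical spatial size.
logits = torch.randn(4, 3, 64, 64)        # (batch, classes, height, width)
mask = torch.randint(0, 3, (4, 64, 64))   # one true class id per pixel

# cross_entropy computes the classification loss for every pixel and,
# by default, averages over all pixels to yield a single scalar.
loss = F.cross_entropy(logits, mask)
print(loss.item())
```

Note that the mask holds one integer class id per pixel, not a one-hot image; the loss function broadcasts the classification criterion over the spatial dimensions.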
The “canonical” architecture for image segmentation is U-Net, introduced by Ronneberger et al. in 2015.
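U-Net pairs a downsampling encoder with an upsampling decoder, passing encoder feature maps to the decoder via skip connections so that fine spatial detail survives. A deliberately tiny sketch in PyTorch, with one downsampling step and made-up channel sizes (real U-Nets stack several such levels):

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # two 3x3 convolutions with ReLU: the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    # a one-level toy U-Net for illustration, not the full architecture
    def __init__(self, n_classes=3):
        super().__init__()
        self.enc1 = block(1, 16)
        self.enc2 = block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)   # 32 = 16 upsampled + 16 from the skip
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                            # full resolution
        e2 = self.enc2(self.pool(e1))                # half resolution
        d1 = self.up(e2)                             # back to full resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))   # skip connection
        return self.head(d1)                         # per-pixel class scores

net = TinyUNet()
out = net(torch.randn(2, 1, 64, 64))
print(out.shape)  # torch.Size([2, 3, 64, 64])
```

The output has one score per class at every pixel, which is exactly the shape the pixel-wise classification loss described above expects.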
This is a companion discussion topic for the original entry at https://blogs.rstudio.com/tensorflow/posts/2020-11-30-torch-brain-segmentation