Importing multiple images, checking the variance and eliminating the ones with low variance

Alorecia · January 6, 2021, 4:13pm

Hello everyone,

I am new to R and have been trying to eliminate blurred/snow-covered images from camera trap data. I have decided to import all these images and check the pixel variance of each one. I then want to eliminate all the images with very low pixel variance and keep the rest.

My code is a bit of a mess right now, and I can't seem to figure out how to move forward with it.
Any kind of help would be appreciated.

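install.packages("jpeg")
install.packages("ReadImages")  # may no longer be available on CRAN
install.packages("readbitmap")  # package names are case-sensitive
install.packages("imager")
install.packages("magick")
# EBImage is distributed through Bioconductor rather than CRAN
install.packages("BiocManager")
BiocManager::install("EBImage")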

library(magick)
library(purrr)
library("jpeg")
library(EBImage)
library(dplyr)

setwd(".....")

Folder <- "....."
images <- list.files(path = Folder, pattern = "\\.JPG$", full.names = TRUE)  # pattern is a regular expression, not a glob
images
image_variance <- function(x) {
  x %>%
    as.raster() %>%
    as.vector() %>%
    map(col2rgb) %>%

mattwarkentin · January 6, 2021, 5:16pm

Hi @Alorecia,

So there are still some questions that need to be answered to get to a solution. Are your pictures colour or grayscale? If coloured, how should that be handled? Is the variance calculated channel-wise, or should the channels be averaged together to get a single pixel-wise intensity?

If the latter, which seems to make most sense, should each channel have equal weight, or should the channels be averaged according to how each channel is perceived by the human eye? I believe something like 0.3 * red + 0.6 * green + 0.1 * blue is approximately correct, but I need to double check those numbers.
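
For reference, the commonly quoted Rec. 601 luma weights are 0.299, 0.587 and 0.114, which is close to the rounded 0.3 / 0.6 / 0.1 above. A minimal sketch of that weighting in base R, assuming the image is already available as a height x width x 3 array with values between 0 and 1 (the helper name `to_intensity` is made up for this illustration):

```r
# Rec. 601 luma weights (red, green, blue)
luma_weights <- c(r = 0.299, g = 0.587, b = 0.114)

# Hypothetical helper: collapse an RGB array (height x width x 3) into one
# intensity matrix (height x width) using a weighted sum of the channels
to_intensity <- function(rgb_array, weights = luma_weights) {
  stopifnot(length(dim(rgb_array)) == 3, dim(rgb_array)[3] >= 3)
  weights[["r"]] * rgb_array[, , 1] +
    weights[["g"]] * rgb_array[, , 2] +
    weights[["b"]] * rgb_array[, , 3]
}

# Tiny synthetic 2 x 2 image: one red, one green, one blue and one white pixel
img <- array(0, dim = c(2, 2, 3))
img[1, 1, 1] <- 1  # red pixel
img[1, 2, 2] <- 1  # green pixel
img[2, 1, 3] <- 1  # blue pixel
img[2, 2, ] <- 1   # white pixel

intensity <- to_intensity(img)
intensity
```

Each pure-colour pixel ends up with an intensity equal to its channel weight, and the white pixel gets the sum of all three weights.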

With these questions answered, we can think about how we would compute the variance for a single image, and then it should be straightforward enough to extend this to multiple images.
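
If the channel-wise option turns out to be the better fit, a rough sketch for a single image could look like the following. It assumes the image is a height x width x channels array; `channel_variance` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: one variance per colour channel
channel_variance <- function(rgb_array) {
  stopifnot(length(dim(rgb_array)) == 3)
  # apply() over the third dimension hands each channel to var() as a matrix
  v <- apply(rgb_array, 3, function(ch) var(as.vector(ch)))
  names(v) <- c("red", "green", "blue", "alpha")[seq_along(v)]
  v
}

# Synthetic 4 x 4 image: red varies from 0 to 1, green and blue are constant
img <- array(0.5, dim = c(4, 4, 3))
img[, , 1] <- seq(0, 1, length.out = 16)

v <- channel_variance(img)
v
```

Here only the red channel has a non-zero variance. Comparing these three numbers against the single averaged variance on a handful of known good and bad pictures is one way to decide which approach separates them better.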

Here is a quick attempt using two images I grabbed from Unsplash...

library(magick)
library(tidyverse)

imgs <- c("~/Desktop/winter.jpg", "~/Desktop/summer.jpg")
imgs <- map(imgs, image_read)

image_variance <- function(x) {
  x %>% 
    as.raster() %>% # convert to a raster (matrix of hex colour strings)
    as.vector() %>% # flatten to a character vector
    map(col2rgb) %>%  # hex to RGB, one pixel at a time
    map_dbl(function(rgb) (rgb[[1]]*0.3) + (rgb[[2]]*0.6) + (rgb[[3]]*0.1)) %>% # weighted average of the colour channels
    var() # compute variance
}

# Variance for the two images
map_dbl(imgs, image_variance)
#> [1] 2917.129 2976.778
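
To get from per-image variances to actually dropping the low-variance pictures, one possible sketch is shown below. It swaps in `jpeg::readJPEG()` for reading, which returns pixel values between 0 and 1, so the variances are on a different scale from the 0-255 numbers above. The helper names, the synthetic test images and the threshold of 0.001 are all assumptions for illustration; a real cut-off would need to be tuned on actual camera trap images.

```r
library(jpeg)  # readJPEG() returns values in [0, 1]

# Hypothetical helper: variance of the weighted pixel intensity of one JPEG file
file_variance <- function(path, weights = c(0.3, 0.6, 0.1)) {
  px <- readJPEG(path)
  if (length(dim(px)) == 3) {
    # colour image: weighted sum of the first three channels
    px <- weights[1] * px[, , 1] + weights[2] * px[, , 2] + weights[3] * px[, , 3]
  }
  var(as.vector(px))
}

# Hypothetical helper: split file paths into kept and dropped by a threshold
filter_by_variance <- function(paths, threshold) {
  v <- vapply(paths, file_variance, numeric(1))
  list(variance = v, keep = paths[v >= threshold], drop = paths[v < threshold])
}

# Demo on two synthetic images written to a temporary folder
set.seed(1)
tmp <- tempfile("imgs")
dir.create(tmp)
flat  <- array(0.9, dim = c(20, 20, 3))                 # nearly uniform, "snow-like"
noisy <- array(runif(20 * 20 * 3), dim = c(20, 20, 3))  # lots of variation
writeJPEG(flat,  file.path(tmp, "flat.JPG"))
writeJPEG(noisy, file.path(tmp, "noisy.JPG"))

paths <- list.files(tmp, pattern = "\\.JPG$", full.names = TRUE)
res <- filter_by_variance(paths, threshold = 0.001)  # arbitrary placeholder threshold
basename(res$keep)
basename(res$drop)
```

Working on the whole pixel array at once also avoids calling `col2rgb()` once per pixel, which can become slow on full-resolution photos. The files in `res$drop` could then be moved or deleted with `file.rename()` or `file.remove()` once the threshold has been checked by eye.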

Alorecia · January 8, 2021, 10:44am

Hi @mattwarkentin

Thank you for the example.

My pictures are coloured and are in JPEG format. Honestly, I am not sure which way of calculating the variance would be best. I tried the one that you mentioned and it works well so far.
However, I think I would also like to try how the variance can be calculated channel-wise and compare which method works the best for me.

Could you maybe explain what the use of %>% is in this particular function? I tried looking it up on other websites, but it seems like it does something completely different here.

mattwarkentin · January 8, 2021, 6:40pm

Could you maybe explain what the use of %>% is in this particular function?

%>% is known as the pipe operator. Pipe operators have been used extensively in other languages (e.g. | in the shell). Pipes work by taking the left-hand side of the pipe and "piping" it into the function on the right-hand side.

Because of this "pass-it-on" property of pipes, you can chain together long sequences of pipes in order to pass some data along a pipeline for it to come out the other side transformed in some way you want. You can learn more about it here: https://magrittr.tidyverse.org/reference/pipe.html.

Pipes have gained so much favour in the R community thanks to the magrittr package, that they are being added into base R (look for |> as an alternative in the not-so-distant future).

In other words, if you have some variable x and functions f() and g(), you could write the same computation in two equivalent ways in R:

# Pipe
x %>% f() %>% g()

# Nested
g(f(x))

The first one is executed left-to-right, while the latter is executed inside-out. Typically the first is easier to read, though both will produce identical results.
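
A small concrete check of that equivalence, using `sqrt()` and `sum()` as stand-ins for f() and g() (the values are arbitrary). The last line uses the native `|>` pipe, which needs R 4.1.0 or later:

```r
library(magrittr)

x <- c(1, 4, 9)

piped  <- x %>% sqrt() %>% sum()  # left-to-right
nested <- sum(sqrt(x))            # inside-out

identical(piped, nested)

# Same pipeline with the base R pipe (R >= 4.1.0)
native <- x |> sqrt() |> sum()
identical(native, nested)
```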

In my code, x is an image that is being piped along, undergoing various transformations:

image_variance <- function(x) {
  x %>% 
    as.raster() %>% # convert to a raster (matrix of hex colour strings)
    as.vector() %>% # flatten to a character vector
    map(col2rgb) %>%  # hex to RGB, one pixel at a time
    map_dbl(function(rgb) (rgb[[1]]*0.3) + (rgb[[2]]*0.6) + (rgb[[3]]*0.1)) %>% # weighted average of the colour channels
    var() # compute variance
}
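
To make the individual steps easier to inspect, the same pipeline can be written without pipes, with one intermediate variable per step. This is only a sketch: `image_variance_steps` is a hypothetical name, and it calls `col2rgb()` once on the whole vector instead of once per pixel. It is demonstrated on a tiny synthetic array so that it runs without any image files; with magick loaded, a magick image should go through the same `as.raster()` call.

```r
image_variance_steps <- function(x) {
  r   <- as.raster(x)     # step 1: matrix of hex colour strings
  hex <- as.vector(r)     # step 2: flatten to a character vector
  rgb <- col2rgb(hex)     # step 3: 3 x n matrix with rows red, green, blue
  # step 4: weight each row (channel) and add them up per pixel
  intensity <- colSums(rgb * c(0.3, 0.6, 0.1))
  var(intensity)          # step 5: variance across all pixels
}

# Tiny synthetic 2 x 2 image: red, green, blue and white pixels (values 0-1)
img <- array(0, dim = c(2, 2, 3))
img[1, 1, 1] <- 1
img[1, 2, 2] <- 1
img[2, 1, 3] <- 1
img[2, 2, ] <- 1

image_variance_steps(img)
```

Printing `hex` or `rgb` inside the function is a quick way to see what each stage of the pipe hands to the next.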

Alorecia · January 11, 2021, 11:44am

Thank you for the explanation.
