Improve my code :-)

francisbarton · July 5, 2020, 10:22pm

I got stuck the other day on a project where I was trying to build a list step by step, passing the last item on the list to a function which would then return the next item to be added to the list. And so on until a break point.

It seemed like it should be simple, and I felt a bit dim for not immediately knowing how to do it. I built myself a simple bit of toy code to try to improve my understanding of how to make this work.

Here's a slightly fuller description of what I want to achieve:

starting with a "seed" x (could be a number, string, data frame), apply a function f(x) to return y.
Now add y to a list such that you get list(x, y)
Now run f() on y to return z and add that to the list - so the list is growing incrementally through the repeated application of f()
Stop growing the list at some limit (list length, value of x, whatever) and complete, returning the whole list.

If I knew the length of the list in advance, or could map along a vector of inputs, that would be easy for me. But in my example, you don't know what the next item on the list is going to be until you've run f().

Here's the code I wrote. It works, and gives me the exact output I want - but I still feel like I am missing something obvious or a much better way of doing this perhaps? I felt like purrr::accumulate should be the right tool for this, but it only works for functions that take two arguments. My function only takes one argument (the last item on the current list).

Your code critiques are welcome!

My toy code

library(dplyr, warn.conflicts = FALSE)
library(purrr)

# yes I want it to return a list not a vector (for my real project)
square_it <- function(x) {
  y <- dplyr::last(x)^2
  list(x, y) %>% 
    purrr::flatten()
}

build_squares <- function(x) {
  while (dplyr::last(x) < 1e4) {
    x <- square_it(x)
  }
  return(x)
}

squares <- build_squares(2)
squares
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 4
#> 
#> [[3]]
#> [1] 16
#> 
#> [[4]]
#> [1] 256
#> 
#> [[5]]
#> [1] 65536

^{Created on 2020-07-05 by the reprex package (v0.3.0)}

technocrat · July 6, 2020, 6:16am

That's pretty much it, aside from taking advantage of R functions as first class objects

library(dplyr, warn.conflicts = FALSE)
library(purrr)

# yes I want it to return a list not a vector (for my real project)
square_it <- function(x) {
  y <- dplyr::last(x)^2
  list(x, y) %>% 
    purrr::flatten()
}

build_squares <- function(x) {
  while (dplyr::last(x) < 1e4) {
    x <- square_it(x)
  }
  return(x)
}

# alternative

build_squares(square_it(2))
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 4
#> 
#> [[3]]
#> [1] 16
#> 
#> [[4]]
#> [1] 256
#> 
#> [[5]]
#> [1] 65536

^{Created on 2020-07-05 by the reprex package (v0.3.0)}

nirgrahamuk · July 6, 2020, 9:05am

here is another approach

library(purrr)

recursive_square <- function(o, n) {
  if (n > 1e4) return(o)
  
  n2 <- n^2
  return(recursive_square(
    o = flatten(list(o, n2)),
    n = n2
  ))
}

recursive_square(list(2), 2)

francisbarton · July 8, 2020, 9:32am

Thanks - that's a neat way of combining it all into a single function.
I'm just glad to discover that as far as I can tell I didn't miss any completely obvious simpler ways of building this kind of routine.

nirgrahamuk · July 8, 2020, 9:34am

do you have a particular use case in mind ?

francisbarton · July 8, 2020, 10:11am

I did have a particular use case initially, yes. Just a weird little side project I was doing. But then I worked out a neater way of doing it and refactored the code I'd written.

I'll see if I can summarise it and post it here.

The project starts with an sf data frame called msoas_data which is all the MSOAs in England, with their geometries.
After choosing one MSOA as the initial 'seed', I then loop a function (landgrab ) using that seed, which adds on a new layer of all the MSOAs that are touching the seed, then returns the new shape. It's about building a contiguous shape, area by area.
Then keep on doing this until you reach a certain limit (I was using % of population)

Eventually I decided I didn't need or want all the intermediate shapes on the way to the limit after all - I'm happy with it just returning the final shape. So I don't need the list that I was asking for in this thread after all.

# get the total population figure for all areas (all of England)
total_population <- sum(msoas_data$population)

# function to get the population of intermediate area blocks
pop_report <- function(x) {
  x %>%
    sf::st_drop_geometry() %>%
    summarise(sum(population)) %>%
    pull()
}

# use st_touches to pull adjacent areas out ---------------------------

landgrab <- function(seed) {

  remainder <- msoas_data %>%
    filter(!msoa11cd %in% seed$msoa11cd)

  new_index <- sf::st_touches(seed, remainder) %>%
    unlist() %>%
    unique() %>%
    sample() # randomise so that initial shape irregularities are not amplified

  if (length(new_index) == 0) {
    break
  }

  else {
    new_layer <- remainder %>%
      slice(new_index)

    sf:::rbind.sf(seed, new_layer)
  }
}

# a function to build and return the block ---------------------------------------

build_block <- function(seed, fraction) {

  pop <- pop_report(seed)

  while (pop < (total_population*fraction)) {

    seed <- landgrab(seed)
    pop <- pop_report(seed)

  }
  seed
}

# Olympic Park & Mill Meads, Newham
pop_seed <- msoas_data %>%
  arrange(desc(population)) %>%
  slice(1)

twenty_percent <- build_block(pop_seed, 0.2)

nirgrahamuk · July 8, 2020, 10:29am

that's pretty neat !

francisbarton · July 8, 2020, 10:38am

Thanks! Just interested in producing some maps about how unevenly distributed population is, and also interested in what difference it makes which seed area you start with.

Someone had done something similar recently for the US states and I wondered how easy it would be to do for England.

system · July 15, 2020, 10:38am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.