Using dplyr to create a list of suburb names based on unique postcodes

Hi there,

I am trying to convert a (long) list with postcodes and suburb names into a (wide) list where only each postcode appears once and where the postcode covers more than one suburb, to list the suburb names in another column.

data.frame(
  stringsAsFactors = FALSE,
          Postcode = c(4161L,4155L,4154L,4153L,
                       4159L,4157L,4158L,4160L,4160L,4163L,4160L,4165L,
                       4165L,4157L,4164L,4165L,4178L,4179L,4179L,4172L),
          SA2.name = c("Alexandra Hills",
                       "Belmont - Gumdale","Belmont - Gumdale","Belmont - Gumdale",
                       "Birkdale","Capalaba","Thorneside","Wellington Point",
                       "Cleveland","Cleveland","Ormiston","Redland Bay",
                       "Sheldon - Mount Cotton","Sheldon - Mount Cotton","Thornlands",
                       "Victoria Point","Brisbane Port - Lytton",
                       "Manly - Lota","Manly West","Murarrie")
)

In the above dataset, the postcode 4161 appears twice as it covers both Wellington Point and Cleveland.

I have tried the following code:

postcode_unique_suburb_names <- pivot_wider(postcode_pop_sub, names_from = SA2.name, values_from = Postcode)
#> Error in pivot_wider(postcode_pop_sub, names_from = SA2.name, values_from = Postcode): could not find function "pivot_wider"

Created on 2022-11-15 with reprex v2.0.2

This seems to be on the right track:

data.frame(
                             check.names = FALSE,
                       `Alexandra Hills` = c(4161),
                     `Belmont - Gumdale` = c(4155:4153),
                                Birkdale = c(4159),
                                Capalaba = c(4157),
                              Thorneside = c(4158),
                      `Wellington Point` = c(4160),
                               Cleveland = c(c(4160, 4163)),
                                Ormiston = c(4160),
                           `Redland Bay` = c(4165),
                `Sheldon - Mount Cotton` = c(c(4165, 4157)),
                              Thornlands = c(4164),
                        `Victoria Point` = c(4165),
                `Brisbane Port - Lytton` = c(4178),
                          `Manly - Lota` = c(4179),
                            `Manly West` = c(4179),
                                Murarrie = c(c(4172, 4174)),
                                Tingalpa = c(4173),
                                Wakerley = c(4154),
                                  Wynnum = c(4178),
                 `Wynnum West - Hemmant` = c(c(4178, 4174)),
                            `Bald Hills` = c(4036),
                       `Bridgeman Downs` = c(4035),
                              Carseldine = c(4034),
                          `Everton Park` = c(4053),
                                McDowall = c(4053),
                                  Aspley = c(4034),
                               Chermside = c(4032),
                        `Chermside West` = c(4032),
                                 Geebung = c(4034),
                  `Kedron - Gordon Park` = c(c(4031, 4055)),
                                Stafford = c(4053),
                      `Stafford Heights` = c(4053),
                        `Wavell Heights` = c(4012),
                                Boondall = c(4034),
                      `Brisbane Airport` = c(4008),
                 `Eagle Farm - Pinkenba` = c(c(4008, 4007, 4009)),
                  `Northgate - Virginia` = c(c(9464, 4013, 4014)),
                        `Nudgee - Banyo` = c(4014),
                                  Nundah = c(4012),
                         `Bracken Ridge` = c(4017),
                        `Brighton (Qld)` = c(4017),
                                  Deagon = c(4017),
                `Sandgate - Shorncliffe` = c(4017),
                   `Taigum - Fitzgibbon` = c(4018),
                                Zillmere = c(4034),
                             `Camp Hill` = c(4152),
                           `Cannon Hill` = c(4170),
                                  Carina = c(4152),
                        `Carina Heights` = c(4152),
                               Carindale = c(4152),
                                Annerley = c(c(4103, 4105)),
                               Coorparoo = c(4151),
               `Fairfield - Dutton Park` = c(4103:4102),
                             Greenslopes = c(c(4103, 4120)),
                          `Holland Park` = c(4121),
                     `Holland Park West` = c(4121),
                           Woolloongabba = c(4102),
                                 Yeronga = c(4105:4104),
                     `Eight Mile Plains` = c(4113),
                       `Macgregor (Qld)` = c(4109),
                       `Mansfield (Qld)` = c(4122),
                         `Mount Gravatt` = c(4122:4121),
                   `Rochedale - Burbank` = c(c(4156, 4123)),
                   `Upper Mount Gravatt` = c(4122),
                                 Wishart = c(4122),
                `Rocklea - Acacia Ridge` = c(c(4108, 4110, 4106)),
                        `Coopers Plains` = c(4108),
                                Moorooka = c(4105),
                               Robertson = c(4109),
                    `Salisbury - Nathan` = c(c(4111, 4860, 4107)),
                              Tarragindi = c(4121),
                                Algester = c(4115),
                  `Calamvale - Stretton` = c(4116:4117),
                   `Pallara - Willawong` = c(4110),
                  `Parkinson - Drewvale` = c(4116:4115),
                                  Kuraby = c(4112),
                                 Runcorn = c(4113),
                               Sunnybank = c(4109),
                       `Sunnybank Hills` = c(4109),
              `Jindalee - Mount Ommaney` = c(4074),
        `Middle Park - Jamboree Heights` = c(4074),
                              Riverhills = c(4074),
  `Seventeen Mile Rocks - Sinnamon Park` = c(4073),
                                Westlake = c(4074),
                  `Bellbowrie - Moggill` = c(4070),
            `Brookfield - Kenmore Hills` = c(4069),
                           `Chapel Hill` = c(4069),
                       `Fig Tree Pocket` = c(4069),
                                 Kenmore = c(4069),
           `Pinjarra Hills - Pullenvale` = c(4069:4070),
                  `Chelmer - Graceville` = c(c(4075, 4068)),
                                 Corinda = c(4075),
                           Indooroopilly = c(4068),
                                Sherwood = c(4075),
                              `St Lucia` = c(c(4067, 4072)),
                                 Taringa = c(4068),
                                Enoggera = c(4051),
                    `Enoggera Reservoir` = c(4520),
                                 Keperra = c(4054),
                              Mitchelton = c(4053),
                        `Mount Coot-tha` = c(4066),
                               `The Gap` = c(4061),
            `Upper Kedron - Ferny Grove` = c(4055),
                                                                      `Brisbane City` = c(c(4001, 4000, 4002, 4003, 9000, 9001, 9002, 9005, 9007, 9009, 9010, 9013, 9015),
                                             NA),
                      `Fortitude Valley` = c(4006),
                         `Highgate Hill` = c(4101),
                              `New Farm` = c(4005),
                        `South Brisbane` = c(4101),
                           `Spring Hill` = c(c(4000, 4004)),
                              `West End` = c(4101),
                                Balmoral = c(4171),
                                 Bulimba = c(4171),
                         `East Brisbane` = c(4169),
                               Hawthorne = c(4171),
             `Morningside - Seven Hills` = c(4170),
                           `Norman Park` = c(4170),
                                  Albion = c(4010),
                                Alderley = c(4051),
                                   Ascot = c(4007),
                               Clayfield = c(4011),
                                  Grange = c(4051),
                        `Hamilton (Qld)` = c(4007),
                                  Hendra = c(4011),
                `Kelvin Grove - Herston` = c(c(4059, 4006, 4029)),
                               Newmarket = c(4051),
                `Newstead - Bowen Hills` = c(4006:4005),
                                 Wilston = c(4051),
                                 Windsor = c(4030),
                  `Wooloowin - Lutwyche` = c(4030),
                                Ashgrove = c(4060),
                            Auchenflower = c(4066),
                                  Bardon = c(4065:4066),
                   `Paddington - Milton` = c(4064),
                        `Red Hill (Qld)` = c(4059),
                                 Toowong = c(4066),
                               Brinsmead = c(4870),
         `Clifton Beach - Kewarra Beach` = c(c(4879, 4871)),
                `Freshwater - Stratford` = c(4870),
                                Redlynch = c(c(4870, 4878)),
            `Trinity Beach - Smithfield` = c(4879:4878),
          `Yorkeys Knob - Machans Beach` = c(4878),
                          `Bentley Park` = c(4869),
                           `Cairns City` = c(c(4870, 4872)),
           `Earlville - Bayview Heights` = c(c(4870, 4868)),
                                Edmonton = c(4869),
                  `Gordonvale - Trinity` = c(c(4865, 4871, 4869)),
                 `Kanimbla - Mooroobool` = c(4870),
                            `Lamb Range` = c(4870),
                                 Manoora = c(4870),
                                 Manunda = c(4870),
                        `Mount Sheridan` = c(4868),
                  `Westcourt - Bungalow` = c(4870),
                            `White Rock` = c(4868),
                 `Whitfield - Edge Hill` = c(4870),
                                   Woree = c(4868),
                                 Babinda = c(c(4860, 4871, 4861)),
                               Innisfail = c(4860),
                                                                            Johnstone = c(c(4859, 4852, 4858, 4871, 4860, 4855, 4856, 4857),
                                             NA),
                                   Tully = c(c(4854, 4849, 4852, 4816)),
                            Wooroonooran = c(4860),
                                Yarrabah = c(4871),
                                Daintree = c(c(4873, 4871)),
                          `Port Douglas` = c(c(4877, 4871, 4873)),
                                Atherton = c(c(4883, 4872, 4882)),
                               Herberton = c(c(4888, 4872, 4887)),
                                 Kuranda = c(c(4872, 4881, 4871)),
                                                               `Malanda - Yungaburra` = c(c(4860, 4886, 4885, 4884, 4872, 4883),
                                             NA),
                                 Mareeba = c(4880),
                                 Balonne = c(c(4487, 4486, 4488, 4497)),
                              Chinchilla = c(4413:4412),
                             Goondiwindi = c(4390),
                                                               `Inglewood - Waggamba` = c(c(4357, 4387, 4496, 4497, 4390, 4498, 4494, 4388, 4352, 4384, 4385),
                                             NA),
                                                                    `Miles - Wandoan` = c(c(4418, 4424, 4425, 4415, 4419, 4427, 4416),
                                             NA),
                                    Roma = c(4455),
                                                                     `Roma Surrounds` = c(c(4455, 4461, 4426, 4427, 4428, 4417, 4423, 4465, 4487, 4462, 4467, 4454),
                                             NA),
                                                                                 Tara = c(c(4421, 4413, 4405, 4406, 4422, 4423),
                                             NA),
                                                               `Crows Nest - Rosalie` = c(c(4614, 4355, 4352, 4306, 4314, 4353, 4402, 4354, 4401, 4403),
                                             NA),
                                                                            Jondaryan = c(c(4401, 4350, 4352, 4356, 4403, 4404, 4405, 4407),
                                             NA),
                              Millmerran = c(c(4357, 4352, 4407)),
                              Pittsworth = c(c(4356, 4363, 4350, 4352, 4364)),
                                   Wambo = c(c(4405, 4411, 4406, 4408, 4410)),
                  `Clifton - Greenmount` = c(c(4359, 4361, 4360, 4352, 4358)),
                 `Southern Downs - East` = c(c(4370, 4371, 4373, 4372)),
                                                              `Southern Downs - West` = c(c(4362, 4361, 4370, 4374, 4352, 4365),
                                             NA),
                              Stanthorpe = c(4380),
                                                               `Stanthorpe Surrounds` = c(c(4352, 4380, 4381, 4375, 4382, 4383, 4376, 4377, 4378),
                                             NA),
                                 Warwick = c(4370),
              `Central Highlands - East` = c(c(4712, 4702, 4718, 4713, 4717)),
                                                           `Central Highlands - West` = c(c(4702, 4722, 4721, 4723, 4720, 4709),
                                             NA),
                                 Emerald = c(4720),
             `Agnes Water - Miriam Vale` = c(c(4678, 4677, 4674, 4676)),
                                                                               Banana = c(c(4702, 4420, 4419, 4719, 4718, 4715, 4716),
                                             NA),
                                 Biloela = c(4715),
           `Boyne Island - Tannum Sands` = c(4680),

When I attempt to join this data frame with another (that has total population per postcode) using this code:

joined_postcode_sub_pop <- inner_join(postcode_unique_suburb_names, postcode_pop, by = "Postcode")
#> Error in inner_join(postcode_unique_suburb_names, postcode_pop, by = "Postcode"): could not find function "inner_join"

Created on 2022-11-15 with reprex v2.0.2

I get the following error message:

Error in inner_join():
! Join columns must be present in data.
:heavy_multiplication_x: Problem with Postcode.
Run rlang::last_error() to see where the error occurred.
Boyne Island - Tannum Sands = c(4680),
Error: unexpected ',' in " Boyne Island - Tannum Sands = c(4680),"

I've checked both data sets and Boyne Island and 4680 exist in both. The rlang::last_error() states:

<error/rlang_error>
Error in `inner_join()`:
! Join columns must be present in data.
✖ Problem with `Postcode`.
---
Backtrace:
 1. dplyr::inner_join(...)
 2. dplyr:::inner_join.data.frame(postcode_unique_suburb_names, postcode_pop, by = "Postcode")
Run `rlang::last_trace()` to see the full context.

The postcode_pop dataframe is:

        check.names = FALSE,
           Postcode = c(4000L,4001L,4002L,4003L,
                        4004L,4005L,4006L,4007L,4008L,4009L,4010L,4011L,
                        4012L,4013L,4014L,4017L,4018L,4019L,4020L,4021L),
  `sum(Population)` = c(93931L,43756L,21878L,
                        10939L,6419L,25019L,82679L,31752L,1288L,1704L,11896L,
                        26789L,46214L,13738L,50730L,81535L,25238L,81964L,
                        21014L,17717L)
)

Thank you, in advance, for any help or advice you can provide.

Below is one approach that collapses all suburb names into one column separated by a comma. The Population column is NA for all rows because the postcodes in postcode_names do not exist in postcode_pop. I'm assuming these are both just subsets of larger datasets.

library(tidyverse)

postcode_names = data.frame(
  stringsAsFactors = FALSE,
  Postcode = c(4161L,4155L,4154L,4153L,
               4159L,4157L,4158L,4160L,4160L,4163L,4160L,4165L,
               4165L,4157L,4164L,4165L,4178L,4179L,4179L,4172L),
  SA2.name = c("Alexandra Hills",
               "Belmont - Gumdale","Belmont - Gumdale","Belmont - Gumdale",
               "Birkdale","Capalaba","Thorneside","Wellington Point",
               "Cleveland","Cleveland","Ormiston","Redland Bay",
               "Sheldon - Mount Cotton","Sheldon - Mount Cotton","Thornlands",
               "Victoria Point","Brisbane Port - Lytton",
               "Manly - Lota","Manly West","Murarrie")
  )

postcode_pop = data.frame(check.names = FALSE,
                          Postcode = c(4000L,4001L,4002L,4003L,
                                       4004L,4005L,4006L,4007L,4008L,4009L,4010L,4011L,
                                       4012L,4013L,4014L,4017L,4018L,4019L,4020L,4021L),
                          Population = c(93931L,43756L,21878L,
                                         10939L,6419L,25019L,82679L,31752L,1288L,1704L,11896L,
                                         26789L,46214L,13738L,50730L,81535L,25238L,81964L,
                                         21014L,17717L)
                          )

out = postcode_names %>%
  arrange(Postcode, SA2.name) %>%
  group_by(Postcode) %>%
  mutate(SA2.name = paste(SA2.name, collapse = ', ')) %>%
  ungroup() %>%
  distinct() %>%
  left_join(postcode_pop) %>%
  select(Postcode, Population, SA2.name)
#> Joining, by = "Postcode"

out
#> # A tibble: 14 × 3
#>    Postcode Population SA2.name                                           
#>       <int>      <int> <chr>                                              
#>  1     4153         NA Belmont - Gumdale                                  
#>  2     4154         NA Belmont - Gumdale                                  
#>  3     4155         NA Belmont - Gumdale                                  
#>  4     4157         NA Capalaba, Sheldon - Mount Cotton                   
#>  5     4158         NA Thorneside                                         
#>  6     4159         NA Birkdale                                           
#>  7     4160         NA Cleveland, Ormiston, Wellington Point              
#>  8     4161         NA Alexandra Hills                                    
#>  9     4163         NA Cleveland                                          
#> 10     4164         NA Thornlands                                         
#> 11     4165         NA Redland Bay, Sheldon - Mount Cotton, Victoria Point
#> 12     4172         NA Murarrie                                           
#> 13     4178         NA Brisbane Port - Lytton                             
#> 14     4179         NA Manly - Lota, Manly West

Created on 2022-11-15 with reprex v2.0.2.9000

3 Likes

Thank you very much @scottyd22 - that worked perfectly. I learnt a lot from that bit of code and have been adapting it to other uses.

Thanks again!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.