UK choropleth map challenge - postcodes in 3-4 digit format

Hi,
I have following challenge.
I have a list of shops which scores should be shown on the UK map (high in dark colour and low in a light one):

postcodeinfo <- data.frame(
  stringsAsFactors = FALSE,
          PostCode = c("BB7","BB1","PR5","CA3",
                       "DG12","CA15","CW1","ST7","CW9","PR4","PR2","PR7",
                       "ST17","ST18","ST16","ST3","ST4","TF9"),
             Score = c(0.717647058823529,
                       0.761194029850746,0.4375,0.777777777777778,0.764705882352941,
                       0.727272727272727,0.807017543859649,0.830769230769231,
                       0.868421052631579,0.819672131147541,0.732673267326733,
                       0.741379310344828,0.811764705882353,0.761194029850746,
                       0.844827586206897,0.68,0.855263157894737,0.857142857142857),
              Shop = c("AAA","AAA","AAA","BBB",
                       "BBB","BBB","CCC","CCC","CCC","DDD","DDD","DDD",
                       "EEE","EEE","EEE","FFF","FFF","FFF"),
              Cor1 = c(53.874,53.756,53.731,54.907,
                       54.99,54.712,53.103,53.088,53.259,53.754,53.778,
                       53.645,52.789,52.814,52.813,52.981,52.995,52.897),
              Cor2 = c(-2.386,-2.462,-2.656,-2.939,
                       -3.251,-3.481,-2.434,-2.265,-2.501,-2.833,-2.708,
                       -2.652,-2.099,-2.081,-2.118,-2.122,-2.183,-2.469),
   PostCode2digits = c("BB","BB","PR","CA","DG",
                       "CA","CW","ST","CW","PR","PR","PR","ST","ST","ST",
                       "ST","ST","TF"),
            Region = c("North West","North West",
                       "North West","North West","Scotland","North West",
                       "North West","West Midlands","North West","North West",
                       "North West","North West","West Midlands","West Midlands",
                       "West Midlands","West Midlands","West Midlands",
                       "West Midlands"),
              City = c("Blackburn","Blackburn",
                       "Preston","Carlisle","Dumfries","Carlisle","Crewe",
                       "Stoke on Trent","Crewe","Preston","Preston","Preston",
                       "Stoke on Trent","Stoke on Trent","Stoke on Trent",
                       "Stoke on Trent","Stoke on Trent","Telford")
)

As you can see, all geo info is already there. Now I need to colour all postcodes on the map.

Additionally, I would like to add a pin to these specific postcodes (best shops):
BB1 3HT (AAA group)
BB1 2DY (AAA group)
CA3 0HA (BBB group)

Is is doable?
My previous work cannot be applied as I used more general, two digit post codes :frowning:

library(ggplot2)
library(rgdal)
library(maptools)
if (!require(gpclib)) install.packages("gpclib", type="source");library(gpclib)
gpclibPermit()  # Gives maptool permisssion to use gpclib

#Download UK postcode polygon Shapefile
 download.file(
 "http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
 )
unzip("postal_shapefile")

# Read the downloaded Shapefile from disk
postal <- readOGR(dsn="C:/Users/.../GeoInfo/UK-postcode-boundaries-Jan-2015/Distribution", layer="Areas")

# Assign each "region" an unique id
postal.count <- nrow(postal@data)
postal@data$id <- 1:postal.count

# Transform SpatialPolygonsDataFrame to regular data.frame in ggplot format
postal.fort <- ggplot2::fortify(postal, region='id')
 
# Import data for each postal area
df <- read_excel("C:/Users/.../Scores by Post Code.xlsx", sheet = "Sheet1")

# Add "region" id to frequency data
df <- merge(df, postal@data, by.x="postal_area_code", by.y="name")

# Merge frequency data onto geogrphical postal polygons
postal.fort <- merge(postal.fort,  df, by="id", all.x=T, all.y=F)
postal.fort <- postal.fort[order(postal.fort$order),] # Reordering since ggplot expect data.fram in same order as "order" column

head(postal.fort)

ggplot(postal.fort) + 
  geom_polygon(aes(x = long, y = lat, group = group, fill=Score), colour='white') +
  scale_fill_gradient(low = "green", high = "red") +
  labs(title = "Total Score by region",
       fill = "Value (£)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 20),
        axis.title = element_blank(),
        plot.background = element_blank(),
        legend.position = c(0.95, 0.70),
        plot.margin = margin(5, 10, 5, 10))

Not answering your query, but it would help you to use the proper terms for the UK postcode hierachy (areas, districts, sectors, units) because they are not fixed length:

Thank you. In this case I need Outward code such as BB7, ST18)

Do you have a specific technical challenge, or is this more how do I construct a map from scratch ?
I say this because you provided some example data (good) but then shared code which doesn't appear related (no mention of your chosen example data name) , and which itself is not runnable by your forum peers.

I would like to create it from scratch as my previous experience was related to different post code level ( Postcode area).
So it was example what I used but it is not relevant for 2-4 digit Outward codes

I am trying to use UK districts (Local_Authority_Districts_(May_2021)_UK_BGC) but I am not sure about it.
I must have following Cities on the source map:

Blackburn
Blackpool
Carlisle
Crewe
Dumfries
Lancaster
Liverpool
Preston
Sheffield
Stoke on Trent
Telford
Walsall
Warrington
Wigan

I have done this so far but used random numbers:

library(tidyverse)
library(rgdal)
library(broom)

#Load the shapefile - make sure you change the filepath to where you saved the shapefiles
shapefile <- readOGR(dsn="C:/Users/.../GeoInfo/Local_Authority_Districts_(May_2021)_UK_BGC", layer="Local_Authority_Districts_(May_2021)_UK_BGC")

#Reshape for ggplot2 using the Broom package
mapdata <- tidy(shapefile) #This might take a few minutes

head(mapdata)

ggplot(data=mapdata, aes(x=long, y=lat, group=group)) +
  geom_polygon(fill="grey", colour="#000000", size=0.25) +
  coord_equal() +
  theme_void()


mydata <- data.frame(id=unique(mapdata$id),
                     value=sample(c(20:100), length(unique(mapdata$id)), replace=TRUE)) %>% filter(id %in% c("1","2","10","33","77"))
mydata

df <- left_join(mapdata, mydata, by="id")

ggplot(data=df, aes(x=long, y=lat, group=group,fill=value)) +
  geom_polygon(color="#000000",size=0.25) +
   scale_fill_gradient2(low="red", high="green", na.value="white") +
  coord_fixed(1) +
  labs(title="Random values for now...", fill="Random", caption="A plot shows random values as full post codes cannot be linked...") +
  theme_void() +
  theme(legend.position="right")

I need to link it with my postcodeinfo data...

I doubt we can prepare a choropleth map using Outward post codes...
Perhaps using the smallest granularity of the UK (taken from here: Open Geography Portal : Wards (May 2021) UK BGC : Wards (May 2021) UK BGC (statistics.gov.uk) would help?

#Load the shapefile - make sure you change the filepath to where you saved the shapefiles
shapefile <- readOGR(dsn="C:/Users/.../GeoInfo/Wards_(May_2021)_UK_BGC", layer="Wards_(May_2021)_UK_BGC")

#Reshape for ggplot2 using the Broom package
mapdata <- tidy(shapefile) #This might take a few minutes

I am slowly loosing faith in R :frowning:

R as such is more than capable of performing the tasks needed to make the choropleth map you are after.

I think the reason you are not getting a solution is because you are asking people here to do all the work for you and that is unlikely to happen, we are more inclined towards helping you with specific coding problems rather than doing your work for you.

I understand that. I am only looking for a map source (UK) which would allow me to link my data with 3-4 digit post codes (called outward post codes).
This is the most important step.
Then we just need to link this map source with my data. I can do everything else...

Well, this part is not R related, maybe you will have better luck asking for map sources in a GIS specific forum.

I'm going to add the spatial tag to your topic to attract GIS experienced people that might be able to help.

1 Like

I could possibly use this German template as there are different levels of the map info (NAME_1, NAME_2, NAME_3):

library(tidyverse)
library(rnaturalearth)

scores <- data.frame(stringsAsFactors = FALSE,
                     NAME_3 = c("Berlin", "Hannover", "Dresden", "Frankfurt am Main", "Stuttgart"),
                     score = c(80, 70, 90, 100, 20))

germany <- ne_states(country = "germany", returnclass = "sf") %>%
  select(name)

# Download this file from the GDAM site
url_regions <- "https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_DEU_3_sf.rds"
scored_regions <- readRDS(url(url_regions)) %>% 
  filter(NAME_3 %in% scores$NAME_3) %>% 
  full_join(scores)

ggplot() +
  geom_sf(data = germany, alpha = 0.3) +
  geom_sf(data = scored_regions, aes(fill = score)) +
  geom_sf_text(data = scored_regions, aes(label = NAME_3), nudge_y = 0.3, color = "blue") +
  scale_fill_gradient(low = "red", high = "green") +
  labs(title = "Germany Scores",
       fill = "Scores") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 20),
        axis.title = element_blank(),
        plot.background = element_blank(),
        legend.position = c(0.95, 0.20),
        plot.margin = margin(5, 0, 5, 0))

and replace Germany in this code by United Kingdom:

germany <- ne_states(country = "germany", returnclass = "sf") %>%
  select(name)

but the problem I have is this part of the code:

url_regions <- "https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_DEU_3_sf.rds"
scored_regions <- readRDS(url(url_regions)) %>% 

as I would need the UK equivalent...

The answer is in the template, go to the GDAM site and look for the UK map

https://gadm.org/maps.html

Thank you :slight_smile:
I will try.
Final question for now:
How can I replace Germany by United Kingdom here:

germany <- ne_states(country = "germany", returnclass = "sf") %>%
  select(name)

I need to check whether this is "UK", United Kingdom", "Great Britain" etc. How can I do that?

https://cran.r-project.org/web/packages/rnaturalearth/vignettes/what-is-a-country.html

Thank you for all your help but even the third, most detailed level (NAME_3) is not too detailed co find my postcodes. Have a look please:



library(tidyverse)
library(rnaturalearth)

region.scores <- data.frame(stringsAsFactors = FALSE,
              NAME_3 = c("BB7","BB1","PR5","CA3",
                       "DG12","CA15","CW1","ST7","CW9","PR4","PR2","PR7",
                       "ST17","ST18","ST16","ST3","ST4","TF9"),
             Score = c(0.717647058823529,
                       0.761194029850746,0.4375,0.777777777777778,0.764705882352941,
                       0.727272727272727,0.807017543859649,0.830769230769231,
                       0.868421052631579,0.819672131147541,0.732673267326733,
                       0.741379310344828,0.811764705882353,0.761194029850746,
                       0.844827586206897,0.68,0.855263157894737,0.857142857142857),
              Shop = c("AAA","AAA","AAA","BBB",
                       "BBB","BBB","CCC","CCC","CCC","DDD","DDD","DDD",
                       "EEE","EEE","EEE","FFF","FFF","FFF"),
              Cor1 = c(53.874,53.756,53.731,54.907,
                       54.99,54.712,53.103,53.088,53.259,53.754,53.778,
                       53.645,52.789,52.814,52.813,52.981,52.995,52.897),
              Cor2 = c(-2.386,-2.462,-2.656,-2.939,
                       -3.251,-3.481,-2.434,-2.265,-2.501,-2.833,-2.708,
                       -2.652,-2.099,-2.081,-2.118,-2.122,-2.183,-2.469))
region.scores

library(rgeos)
UK <- ne_states(country = "united kingdom", returnclass = "sf") %>%
  select(name)

url_regions <- readRDS("C:/Users/sdanilowicz/OneDrive - MARQUE GROUP HOLDINGS PTY LTD/GeoInfo/UK/gadm36_GBR_3_sf.rds")


scored_regions <- url_regions %>% 
  filter(NAME_3 %in% region.scores$NAME_3) %>% 
  full_join(region.scores)

head(scored_regions)

ggplot() +
  geom_sf(data = UK, alpha = 0.3) +
  geom_sf(data = scored_regions, aes(fill = RetentionA)) +
  scale_fill_gradient(low = "red", high = "green") +
  labs(title = "Retention by region",
       fill = "Scores") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 20),
        axis.title = element_blank(),
        plot.background = element_blank(),
        legend.position = c(0.10, 0.20),
        plot.margin = margin(5, 0, 5, 0))

Maybe I should not look for postcodes but coordinates I managed to add to the file (Cor1, Cor2)?
Still hoping to get a solution :frowning:

You want to colour a map based on the "first bit" (3 or 4 characters, Outward code) of the UK postcode? And you have a map (http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip) with the full postcodes.

You could create a new identifier column in the map object (personally I'd be using sf to read and store the shapefile rather than rgdal etc) with Outward code (removing the second half of the full postcode in a mutate statement), then fill on Outward code, not worrying about the fact you actually have shaes for all full postcodes. Or having created a column of Outward codes you could then merge (union) within these. Isn't this what you're after? It seems all the info should be in the map you have.

Or have I missed something in my rather quick skimming of the thread?

Ron.

Ignore that. It isn't even that hard. With the caveat that I still might have misread the thread.

It looks like the Postcode shapefile zip contains 3 shapefiles with the Distributions directory: Areas, Districts and Sectors. Looks like Districts is what you want.

So something like this might get you to the point where you can start plotting using ggplot and geom_sf:

library(tidyverse)
library(sf)

#Download UK postcode polygon Shapefile
 download.file(
 "http://www.opendoorlogistics.com/wp-content/uploads/Data/UK-postcode-boundaries-Jan-2015.zip",
"postal_shapefile"
 )
unzip("postal_shapefile")

uk <- read_sf('Distribution/Districts.shp')

region.scores <- data.frame(stringsAsFactors = FALSE,
              NAME_3 = c("BB7","BB1","PR5","CA3",
                       "DG12","CA15","CW1","ST7","CW9","PR4","PR2","PR7",
                       "ST17","ST18","ST16","ST3","ST4","TF9"),
             Score = c(0.717647058823529,
                       0.761194029850746,0.4375,0.777777777777778,0.764705882352941,
                       0.727272727272727,0.807017543859649,0.830769230769231,
                       0.868421052631579,0.819672131147541,0.732673267326733,
                       0.741379310344828,0.811764705882353,0.761194029850746,
                       0.844827586206897,0.68,0.855263157894737,0.857142857142857),
              Shop = c("AAA","AAA","AAA","BBB",
                       "BBB","BBB","CCC","CCC","CCC","DDD","DDD","DDD",
                       "EEE","EEE","EEE","FFF","FFF","FFF"),
              Cor1 = c(53.874,53.756,53.731,54.907,
                       54.99,54.712,53.103,53.088,53.259,53.754,53.778,
                       53.645,52.789,52.814,52.813,52.981,52.995,52.897),
              Cor2 = c(-2.386,-2.462,-2.656,-2.939,
                       -3.251,-3.481,-2.434,-2.265,-2.501,-2.833,-2.708,
                       -2.652,-2.099,-2.081,-2.118,-2.122,-2.183,-2.469))

uk <- left_join(uk, region.scores, by = c('name'='NAME_3'))

Ron.

Thank you but my map is still blank (unless I am missing something). :frowning:
How can I see the content of the Districts.shp? Can I export that to csv?

You ran all the code I provided? and there is nothing in the uk object (which is the simple features stored district level map) after this?
Did you get any error messages?

Ron.

Once you've run the code, just type uk so the top of it is printed. uk contains the Districts shapefile and after joining to the data in region.scores should also contain Score, Shop, Cor1, Cor2, ready to plot (eg ggplot(uk)+geom_sf(aes(fill=Score),colour=NA)).

Ron.