Building an Open Address Database from UK Local Government Data: A Practical Guide

Overview

Open data enthusiasts and developers have long sought a freely reusable address database for the United Kingdom. The Ordnance Survey (OS) maintains the comprehensive but proprietary AddressBase product, while many UK local authorities release address datasets under the Open Government Licence (OGL). This tutorial explores how to collate those local datasets into an open national address database, inspired by the work of Owen Boswarva, and how to navigate the legal and technical challenges that arise, including potential conflicts with the OS. By the end, you'll understand the process, the pitfalls, and the broader implications for open data in the UK.

Source: hackaday.com

Prerequisites

Before diving in, ensure you have:

Step-by-Step Instructions

Step 1: Identify and Gather Local Authority Address Datasets

Start by locating the address data published by county councils and unitary authorities in England, Wales, and Scotland. Many maintain a Local Land and Property Gazetteer (LLPG) or a similar register. These datasets often include:

Check platforms like data.gov.uk or individual council websites. For example, Buckinghamshire Council publishes a Street Gazetteer under OGL. Use web scraping scripts to download CSV files – but respect robots.txt and rate limits.
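The "respect robots.txt and rate limits" advice can be sketched with the standard library alone. This is a minimal illustration, not a complete scraper: the user agent string, the example.org URLs, and the one-second default delay are all assumptions, and the robots.txt text is passed in as a string so the rule check can be tried without network access.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical user agent; identify yourself honestly in real use.
USER_AGENT = "open-address-collator/0.1 (contact: you@example.org)"


def parse_robots(robots_txt):
    """Build a robots.txt rule checker from the file's text."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_download(urls, robots_txt, default_delay=1.0):
    """Yield (url, bytes) for each permitted URL, pausing between requests."""
    parser = parse_robots(robots_txt)
    # Honour Crawl-delay if the site declares one, else use our own default.
    delay = parser.crawl_delay(USER_AGENT) or default_delay
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # the site asks crawlers to stay away from this path
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response:
            yield url, response.read()
        time.sleep(delay)
```

In practice you would first fetch the site's own /robots.txt and pass its text to polite_download along with the list of CSV URLs.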

Step 2: Parse and Normalize the Data

Each authority may use slightly different column names and formats. Write a Python script using pandas to read CSV files, standardize headers (e.g., street_name, postcode, easting, northing), and handle missing values. Example snippet:

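import pandas as pd

# Read as text so postcodes and identifiers keep their formatting
df = pd.read_csv('buckinghamshire_addresses.csv', dtype=str)
# Rename by name, not position; the left-hand headers are placeholders
df = df.rename(columns={'Street': 'street_name', 'Postcode': 'postcode',
                        'Easting': 'easting', 'Northing': 'northing'})
# Drop rows that have no postcode
df = df.dropna(subset=['postcode'])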
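When many authorities are involved, a shared alias table is one way to cope with differing headers. The sketch below is dependency-free (standard library csv rather than pandas) and every entry in HEADER_ALIASES is a hypothetical example; real council files will need their own mappings.

```python
import csv
import io

# Hypothetical alias table mapping source headers to the standard names.
HEADER_ALIASES = {
    "street": "street_name", "street_name": "street_name",
    "postcode": "postcode", "post_code": "postcode",
    "easting": "easting", "x": "easting",
    "northing": "northing", "y": "northing",
}


def normalise_rows(csv_text):
    """Return rows with standard headers, dropping rows that lack a postcode."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for header, value in raw.items():
            if header is None:
                continue  # surplus cells beyond the header row
            key = HEADER_ALIASES.get(header.strip().lower().replace(" ", "_"))
            if key:
                row[key] = (value or "").strip()
        if row.get("postcode"):
            # Upper-case and collapse whitespace so postcodes compare equal.
            row["postcode"] = " ".join(row["postcode"].upper().split())
            rows.append(row)
    return rows
```

Columns with no alias are silently dropped here; logging them instead would make it easier to spot headers the table does not yet cover.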

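Convert coordinates to a common coordinate reference system, such as WGS84 (latitude/longitude), for compatibility with other open data like OpenStreetMap. Easting/northing pairs in UK council data are normally on the British National Grid (EPSG:27700).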
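One possible way to do the conversion is with the third-party pyproj library, which the original does not name; treat it as an assumption. The sketch expects rows with easting and northing keys and imports pyproj lazily, so a different transform function can be supplied instead.

```python
def add_lat_lon(rows, transform=None):
    """Return copies of rows with 'longitude' and 'latitude' added.

    transform is any callable (easting, northing) -> (lon, lat).
    If omitted, one is built with pyproj (pip install pyproj).
    """
    if transform is None:
        from pyproj import Transformer  # third-party, imported lazily
        # EPSG:27700 is the British National Grid, EPSG:4326 is WGS84.
        # always_xy=True fixes the axis order to (easting, northing) in
        # and (longitude, latitude) out.
        transform = Transformer.from_crs(
            "EPSG:27700", "EPSG:4326", always_xy=True
        ).transform
    converted = []
    for row in rows:
        lon, lat = transform(float(row["easting"]), float(row["northing"]))
        converted.append({**row, "longitude": lon, "latitude": lat})
    return converted
```

Keeping the original easting and northing alongside the converted values lets downstream users check the transformation for themselves.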

Step 3: Collate and Deduplicate Records

Merge data from multiple authorities into a single dataset. The main challenge is deduplication: the same property may appear in multiple lists (e.g., county and district records). Use the Unique Property Reference Number (UPRN) where it is available, since it is the standard identifier for addressable locations in Great Britain. Otherwise, group by street name, postcode, and property number. A script to identify duplicates:

# Skip rows with no UPRN: pandas treats missing values as equal to each other
duplicates = df[df['uprn'].notna() & df.duplicated(subset=['uprn'], keep=False)]

For records without a UPRN, apply fuzzy matching on address strings (e.g., using the fuzzywuzzy library, since renamed thefuzz). Keep the most complete record.
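The fallback can be sketched as follows. Note the substitution: this uses difflib from the standard library rather than fuzzywuzzy, and the field names (uprn, number, street, postcode), the abbreviation table, and the 0.9 threshold are all assumptions. Property numbers must match exactly, because fuzzy scoring alone would rate "12 Acacia Avenue" and "14 Acacia Avenue" as near-identical.

```python
from difflib import SequenceMatcher

# Small illustrative cross-reference; "st" is left out because it can
# also mean "Saint".
ABBREVIATIONS = {"ave": "avenue", "rd": "road", "ln": "lane"}


def street_key(record):
    """Lower-case the street name and expand known abbreviations."""
    words = (record.get("street") or "").lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def is_same_property(a, b, threshold):
    if a.get("uprn") and b.get("uprn"):
        return a["uprn"] == b["uprn"]  # UPRNs settle the question
    if a.get("postcode") != b.get("postcode") or a.get("number") != b.get("number"):
        return False
    score = SequenceMatcher(None, street_key(a), street_key(b)).ratio()
    return score >= threshold


def completeness(record):
    return sum(1 for value in record.values() if value not in (None, ""))


def deduplicate(records, threshold=0.9):
    """Keep one record per property, preferring the most complete."""
    kept = []
    for record in records:
        for i, existing in enumerate(kept):
            if is_same_property(record, existing, threshold):
                if completeness(record) > completeness(existing):
                    kept[i] = record
                break
        else:
            kept.append(record)
    return kept
```

This compares every record against every kept record, which is fine for a demonstration but too slow for a national dataset; grouping by postcode first would cut the number of comparisons sharply.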

Step 4: Validate Data Quality

Check for geographic consistency: for example, ensure coordinates fall within the correct council boundaries. Use boundary shapefiles from the ONS Open Geography Portal to spatially join and verify. Also check address line integrity: some datasets may list a street as 'Acacia Avenue' while others use 'Acacia Ave'. Standardize or keep a cross-reference.
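The boundary check boils down to a point-in-polygon test. The sketch below uses plain ray casting on a single ring of (easting, northing) vertices, which is a simplification: real council boundaries are often multipolygons with holes, and a geospatial library such as shapely or geopandas would be the more robust choice for production work.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be
        # crossed by a ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def outside_boundary(rows, polygon):
    """Return the rows whose coordinates fall outside the boundary."""
    return [
        row for row in rows
        if not point_in_polygon(float(row["easting"]), float(row["northing"]), polygon)
    ]
```

Rows returned by outside_boundary are candidates for manual review rather than automatic deletion, since a boundary file can be out of date too.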

Step 5: License Compliance and Attribution

Confirm that each dataset you use is licensed under OGL v3. This allows reuse with attribution. You must include a notice such as: Contains public sector information licensed under the Open Government Licence v3.0. Also note the source authorities. If any data is derived from Ordnance Survey's licensed products (e.g., OS MasterMap or AddressBase), do not include it: those products are not under OGL, even though OS's separate OpenData range is. This is the core conflict: councils sometimes incorporate OS intellectual property, and the OS claims that re-publication infringes their rights. Proceed only with data that is explicitly released under OGL and that you have good reason to believe contains no OS-derived content.
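Generating the attribution from the list of sources helps keep it in step with the data. A minimal sketch, in which the function name and the output layout are assumptions and the authority names in the usage note are placeholders:

```python
OGL_NOTICE = (
    "Contains public sector information licensed under the "
    "Open Government Licence v3.0."
)


def build_attribution(authorities):
    """Return the OGL notice followed by a de-duplicated, sorted source list."""
    names = sorted(set(authorities))
    return OGL_NOTICE + "\nSource authorities: " + ", ".join(names) + "."
```

Calling build_attribution(["Buckinghamshire Council", "Example Borough Council"]) yields a two-line notice that can be written to a LICENCE or README file alongside the published data.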

Step 6: Publish Your Open Address Database

After cleaning and validating, you can publish the collated dataset as a GeoJSON file, a CSV with latitude/longitude, or host it on a platform like GitHub under an open license (e.g., OGL or Creative Commons). Provide documentation explaining the data provenance, known limitations, and the legal basis. The case of Owen Boswarva shows that even careful collation can lead to legal threats – so consider consulting a lawyer or joining the Open Knowledge Foundation for community support.
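As an illustration of the GeoJSON option, the sketch below serializes rows that already carry longitude and latitude values. The top-level attribution member is an assumption of this example; it is not part of the GeoJSON format itself, which nonetheless permits extra members.

```python
import json


def to_geojson(rows, attribution):
    """Serialize address rows as a GeoJSON FeatureCollection string."""
    features = []
    for row in rows:
        properties = {
            key: value for key, value in row.items()
            if key not in ("longitude", "latitude")
        }
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            "properties": properties,
        })
    collection = {
        "type": "FeatureCollection",
        "attribution": attribution,  # non-standard extra member
        "features": features,
    }
    return json.dumps(collection, indent=2)
```

Writing the returned string to a .geojson file gives something most web mapping tools can open directly.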

Common Mistakes

Assuming All Local Authority Data Is Fully Open

Even under OGL, some datasets include third-party intellectual property, such as OS background mapping. Always read the licence header of each file. If it says “Contains OS data © Crown copyright”, the OGL does not cover that element. Owen Boswarva faced exactly this issue: councils released address data, but OS claimed the structure and derived coordinates were theirs.
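Reading every licence header by hand does not scale, so a first-pass scan can flag files for closer inspection. The sketch below is a heuristic only: the pattern and the 20-line header window are assumptions, and the absence of a notice does not show that a file is free of OS-derived content.

```python
import re

# Matches "Ordnance Survey" or the "Contains OS data" wording, any case.
OS_NOTICE = re.compile(
    r"\bordnance\s+survey\b|\bcontains\s+os\s+data\b", re.IGNORECASE
)


def needs_manual_review(file_text, header_lines=20):
    """Return True if the first lines of a file mention Ordnance Survey."""
    header = "\n".join(file_text.splitlines()[:header_lines])
    return bool(OS_NOTICE.search(header))
```

Files that trigger the check should be set aside until their licensing has been read in full.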

Ignoring the Streisand Effect

The original article references the Streisand effect – when a legal threat backfires by drawing more attention. Attempting to keep data secret often amplifies the problem. Be transparent about your sources and intentions.

Lack of UPRN Usage

The Unique Property Reference Number (UPRN) is the gold standard for deduplication. Many councils omit it from public datasets. If you can obtain it from other open sources, use it: the OS Open UPRN dataset, for example, lists UPRNs with coordinates but no address text, so records can only be matched to it by location. Without UPRNs, your database will have messy overlaps.

Poor Coordinate Projection Handling

British datasets often use the British National Grid (eastings/northings on the OSGB36 datum). If you publish those values without transformation, tools that expect latitude and longitude (e.g., OpenStreetMap editors) will misplace or reject the points. Always convert to WGS84 for web use.

Summary

Building an open address database from UK local government data is technically feasible but legally fraught. By gathering publicly licensed council data, normalizing it, and carefully avoiding OS‑tainted elements, you can create a valuable resource. However, the Ordnance Survey may challenge your work, as seen in the Owen Boswarva case. Proceed with robust licensing checks, community support, and a willingness to defend open data. This tutorial serves as both a practical guide and a cautionary tale – the future of open data in the UK depends on how we navigate such collisions.
