There will never be a good postal address schema


November 27, 2022

Recently, a package I ordered was returned to sender. I was rather annoyed (the sender was in a different country), so the sender and I joined forces to debug the problem. We discovered that the automated fulfilment system had removed the flat number and the building identification from the address I originally entered into the form. There was not enough information left for the postman to disambiguate the address and deliver the package.

Someone must have made some bad assumptions about address structure when writing the system. Well, we've all heard about Falsehoods programmers believe about addresses. The schema.org PostalAddress type gives up on any attempt to parse the street address, but it does run into some painful assumptions around postal codes instead.
Will we ever settle on a perfect address schema?

Since some address elements repeat often - city and street names, for example - it is tempting to attempt to enumerate all the properties that might show up in an address, and make an extremely broad and accommodating type. However, I don't think this will ever work well.

We will never be able to define an address schema because an address is a sentence of a specialised language that humans in a certain region use to identify a particular location. Just as with normal languages, the address follows a certain grammar, which is mostly easy to identify. It is therefore tempting to codify this grammar. However, languages evolve to match the needs of the community. Eventually, someone writes "would of" and crashes the parser.

We would be better off treating addresses as human-readable labels and creating a unique location identification system. Such a system would, of course, require a DNS server equivalent, translating human-readable addresses to unique storage locations (flats, PO Boxes, etc) and vice versa. What, then, are our options?

  • UK postcodes come close; they identify a neighbourhood and often correspond to a range of physical addresses
  • lat, lon with enough precision can pinpoint a location, but they do not account for multi-storey buildings, so you still need to add a flat number
  • what3words has the same problem as lat, lon
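
For what it's worth, the "DNS server equivalent" could look something like the sketch below. Everything in it is hypothetical: the `LocationRegistry` class, its method names, the random identifiers and the coordinates are my own placeholders, not an existing service. The point is only that the identifier is opaque, the address stays a verbatim label, and coordinates are a hint rather than a key.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    location_id: str  # opaque identifier, carries no address structure
    label: str  # human-readable address, stored verbatim
    lat_lon: tuple[float, float] | None = None  # optional hint, not unique


class LocationRegistry:
    """Hypothetical two-way lookup between location IDs and address labels."""

    def __init__(self) -> None:
        self._by_id: dict[str, Location] = {}
        self._id_by_label: dict[str, str] = {}

    def register(self, label: str, lat_lon: tuple[float, float] | None = None) -> Location:
        # Registering the same label twice returns the existing location.
        existing_id = self._id_by_label.get(label)
        if existing_id is not None:
            return self._by_id[existing_id]
        location = Location(uuid.uuid4().hex, label, lat_lon)
        self._by_id[location.location_id] = location
        self._id_by_label[label] = location.location_id
        return location

    def resolve(self, location_id: str) -> str:
        """ID to label, like a reverse DNS lookup."""
        return self._by_id[location_id].label

    def lookup(self, label: str) -> str | None:
        """Label to ID, or None if the label is unknown."""
        return self._id_by_label.get(label)


registry = LocationRegistry()
building = (51.5007, -0.1246)  # made-up coordinates shared by both flats
flat_1 = registry.register("Flat 1, 10 Example Road, London", building)
flat_2 = registry.register("Flat 2, 10 Example Road, London", building)

# Same coordinates, different identifiers.
print(flat_1.lat_lon == flat_2.lat_lon, flat_1.location_id != flat_2.location_id)
```

The exact-string match on labels is deliberately naive. A real lookup would have to cope with the many ways people write the same address, which is the hard part this sketch skips.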

Side note: why not just use the addressee's name to disambiguate the flats? There are multiple snags:

  • the person might be renting, so they won't be on the flat owners' list
  • as above (renting), and on top of that the country may not require residents to register their home address, so there may be no record of who lives where
  • the person might not live at that address at the moment or at all; their friend is acting as a dropbox for them as a favour
  • the person might have changed their name in between the package being sent and delivered
  • the person might not want to be listed for personal safety reasons

So, what do? I don't know. Maybe we need a CosaNostra Pizza to solve the problem once and for all.

In the meantime, this would accommodate all address schema issues:

  1. allow address entry through a "typical use case (e.g. local) address schema" form with non-blocking warnings and postcode lookup if such a thing exists in your country, to gently incentivise people into including all the standard elements in the right order (sadly, that doesn't guarantee they will enter the right thing in the right box),
  2. include an optional textbox for addresses that don't follow the standard schema (similar to a textfield "other" under a list of options)
  3. store the address as a text blob, even if it was entered through the form
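
The three steps above could be sketched roughly like this. It is a minimal illustration, not a finished design: the field names, the list of "expected" fields, the warning wording and the rule that the free-text box wins over the form are all my own assumptions, and postcode lookup is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical local form layout, in the order the lines are written out.
LOCAL_FIELDS = ["recipient", "flat", "building", "street", "city", "postcode"]
# Fields whose absence triggers a warning (assumed, adjust per country).
EXPECTED_FIELDS = {"street", "city", "postcode"}


@dataclass
class AddressSubmission:
    text: str  # the only thing that gets stored
    warnings: list[str] = field(default_factory=list)  # shown to the user, never blocking


def submit_address(form: dict[str, str] | None = None, free_text: str = "") -> AddressSubmission:
    """Accept an address from the structured form or from the free-text box."""
    free_text = free_text.strip()
    if free_text:
        # Step 2: a non-standard address is taken as written, no validation.
        return AddressSubmission(text=free_text)

    # Step 1: structured entry; missing fields produce warnings, not errors.
    form = form or {}
    values = {name: form.get(name, "").strip() for name in LOCAL_FIELDS}
    warnings = [
        f"No '{name}' given; is that intentional?"
        for name in LOCAL_FIELDS
        if name in EXPECTED_FIELDS and not values[name]
    ]
    # Step 3: flatten to a text blob; the individual fields are not kept.
    lines = [values[name] for name in LOCAL_FIELDS if values[name]]
    return AddressSubmission(text="\n".join(lines), warnings=warnings)


submission = submit_address(
    {"recipient": "A. Person", "flat": "Flat 3", "street": "10 Example Road", "city": "London"}
)
print(submission.text)
print(submission.warnings)
```

The submission always succeeds. The warnings are there for the person typing, and the stored blob is whatever they decided to send.
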
Tags: schema grok