SQL and simple polymorphism

SQL is about tables and sets. Object-oriented programing is about types, subtypes, and polymorphism. There are nasty, nasty frameworks available to try to move between the two worlds (say, by serializing and deserializing Java objects) — this posting, however, is not about those. Instead, it’s about how to make SQL support the basic benefits of polymorphism in what we might call a SQL-y way, independent of any programming language.

Similar entities

Imagine that you want to deal with a set of geographical entities, similar to what I handle in OurAirports:

  • cities
  • airports
  • navigation transmitters
  • points of interest (e.g. buildings, etc.)

The normal object-oriented approach would be to create a base class such as “Location” with all the properties these have in common (unique id, latitude, longitude, elevation, name, description, etc.) and then create subclasses adding additional information for each one (such as frequencies for the navigation transmitters, or identifiers for the airport). The normal naive SQL approach, on the other hand, is to repeat the information in four separate tables.

The base table

SQL can, however, handle polymorphism in a fairly elegant way — I think database people call these “disjoint subtypes”, but feel free to ignore that term. Instead of a base class, we create a base table (I’m using the MySQL dialect, but most SQLs should have similar capabilities):

create table Locations (
  id bigint primary key auto_increment,
  type enum ('airport', 'city', 'poi', 'navaid') not null,
  latitude_deg double not null,
  longitude_deg double not null,
  elevation_m int not null,
  title varchar(128) not null,
  description text not null,
  # add indices as needed
);

(A purer implementation would use a separate LocationTypes table for flexibility, rather than the enum value — I’ll probably do that in OurAirports, but it would make these examples a bit more verbose.)

The subtype tables

Next, create a table for each of your subtypes, where the primary key is also a foreign key pointing to the Locations table:

create table Airports (
  id bigint primary key,
  iati_code char(3),
  icao_code char(4),
  local_code varchar(4),
  # etc.
  foreign key id references Locations(id)
);

Using the same id for the base and derived table ensures that each entity has an id that is unique across all subtables, and simplifies SQL statements for accessing and modifying the tables.

Polymorphism

Now, when you want to deal with locations in general, regardless of subtype, simply query the Locations table:

select L.* from Locations L where latitude_deg > 60;

You can update base information without even knowing the subtype:

update Locations set latitude_deg=45.5 where id=111;

When you want to deal with a specific kind of location, do a simple join:

select L.*, A.* from Locations L 
join Airports A on L.id=A.id
where A.latitude_deg > 60;

Views

To make this even easier, define a view containing the join:

drop view if exists AirportsView;
create view AirportsView as
select * from Locations L 
join Airports A on L.id=A.id;

Now you can simply query AirportsView as if it were a single table:

select * from AirportsView where latitude_deg>60;

Inserting and deleting

Inserting is slightly more complex, because it involves two steps: first, create the base entry, then link to it from the derived entry:

insert into Locations 
 (latitude_deg, longitude_deg, elevation_m, name) 
values
 (45.5, -75.5, 100, 'Sample Airport');

insert into Airports (id, iata_code) values
(last_insert_id(), 'AAA');

Stored procedures and/or triggers can automate this fairly nicely, if you don’t want to have to embed too much database logic in application code. Deletion similarly requires two steps:

delete from Airports where id=111;

delete from Locations where id=111;

A simple post-deletion trigger for Airports, though, could automate this and avoid the risk of mistakes in source code.

Eating the dogfood

Unfortunately, I didn’t design OurAirports this way from the start, so I have a bit of refactoring to do. When I’m finished, however, I expect mapping to be much simpler, since I will no longer have to execute 5 or 6 separate queries to find all the icons to display on a map. If anyone has any better ideas, please let me know; otherwise, stay tuned.

This entry was posted in General and tagged , , . Bookmark the permalink.

8 Responses to SQL and simple polymorphism

  1. Francis Avila says:

    This technique is also called “joined table inheritance” and I’ve used it to good effect in fairly large projects. Unfortunately it’s very awkward to use with most ORM systems since it breaks the active record/active table pattern of one class per table and one row class per table row.

    SQLAlchemy (python library) has native support for this pattern (and two other sql “inheritance” patterns) and makes working with them a dream.

    I constantly long for something of SQLAlchemy’s expressiveness and power for php (which is 90% of my day job), but all the php ORMs (such as they are) appear to be active record patterned as well, vs the data mapper pattern SQLAlchemy uses where “objects” in your app are decoupled from tables, and you can establish flexible mappings between the two.

  2. stand says:

    Another trick a friend clued me into for inserts. Insert into the base table last. This gives you an effective transaction across all your base and derived tables inserts because if you insert into a derived table and then fail to insert to the base table, you just have an orphaned record in the derived table which won’t show up in your normal queries. You can be assured that once the base table insert completes, you have a full record. Also, from a concurrency standpoint, your record doesn’t show up to other reading threads until the base record is inserted and the full object can be populated.

    For deletes, you remove the base table record first and then clean up the orphaned derived records.

    • stand: interesting idea, but if you want a unique id across all the subtypes, don’t you have to assign it in the base table first? Of course, if you already have your primary keys (e.g. employee numbers), then it’s not an issue.

      • stand says:

        Yes, I am assuming that the id is an input parameter to the insert operations. You’d have to use a GUID or some other id generating strategy.

  3. John Cowan says:

    If your database unique IDs are just incrementing numbers, then yes. If they are truly unique (e.g. UUIDs), then you can safely create the derived table entry before the base one.

  4. The only problem with this is that a clueless new employee might add a place with no subtype, or with more than one subtype.

  5. Ribaldos says:

    This is no polymorphism. It would be when “Locations” had a (virtual) column which is dispatched to different implementations (e.g. retrieving columns from the child table).

Comments are closed.