Tim Bray updated an old piece on binary search this morning — I missed it the first time around, so I was glad that it popped up in my blog reader. Tim’s taking some flak about data abstraction from people who don’t have his experience in high-performance environments, but what got my attention most was his mention of using gaps in a long array to provide efficient updates.
It turns out that this technique, called a gap buffer [wikipedia], is one of the cornerstones of text editors like Gnu Emacs. I’ve been using Emacs for 20 years and have contributed to the main distribution (see derived.el
), but never bothered to look at the C code long enough to discover this particular technique. There’s surprisingly little information online — if anyone’s ever bothered to do testing for the optimum gap size, etc., it’s not showing up in Google — but it’s still nice to experience the joy and excitement of a new (to me), simple algorithm that solves a common problem well.
Does anyone have pointers to more detailed research on gap buffers? It seems to me that they’d have applications far beyond text editing, including (perhaps) storing compiled tree data (aka binary xml) on disk.
You might enjoy Simon Tatham’s “An Efficient Data Structure For A Hex Editor”:
http://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.html