Why did we take reddit down for 71 minutes?
Thursday, January 7th, 2010
http://blog.reddit.com/2010/01/why-did-we-take-reddit-down-for-71.html
Part of our setup uses what we call a “permacache”, which is built on Memcachedb. Memcachedb is Memcached with a built-in permanent storage system using BDB. One of the “features” of this system is that it saves up its disk writes and then flushes them to the disk in bursts. Unfortunately, the single EBS volumes that the Memcachedb instances were on could not keep up with these write bursts. Memcachedb also has another “feature”: it blocks all reads while it writes to the disk. Together, these two behaviors had lately been taking the site down for about 30 seconds every hour or so.
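To make the interaction concrete, here is a minimal toy sketch in Python of a store that buffers writes and holds a lock for the whole flush. It is not Memcachedb's actual code: the `ToyPermacache` class, the `measure_read_stall` helper, and the 0.2 second flush time are all hypothetical, and `time.sleep` merely stands in for a slow burst of writes to disk. It only shows why a read that arrives during a flush has to wait for the flush to finish.

```python
import threading
import time


class ToyPermacache:
    """Toy model only (hypothetical): buffers writes, then flushes them in one burst.

    The flush holds the same lock that guards reads, so every read
    issued during a flush waits until the flush is done.
    """

    def __init__(self, flush_seconds):
        self._lock = threading.Lock()
        self._data = {}
        self._pending = []                   # keys written but not yet "on disk"
        self._flush_seconds = flush_seconds  # assumed duration of one write burst

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._pending.append(key)

    def get(self, key):
        with self._lock:                     # blocks while flush() holds the lock
            return self._data.get(key)

    def flush(self, started=None):
        with self._lock:
            if started is not None:
                started.set()                # signal that the flush now holds the lock
            time.sleep(self._flush_seconds)  # stand-in for a slow burst of disk writes
            flushed = len(self._pending)
            self._pending.clear()
            return flushed


def measure_read_stall(flush_seconds=0.2):
    """Return (value, seconds the read waited) for a read issued mid-flush."""
    cache = ToyPermacache(flush_seconds)
    cache.set("k", "v")

    started = threading.Event()
    flusher = threading.Thread(target=cache.flush, args=(started,))
    flusher.start()
    started.wait()                           # make sure the flush is in progress

    t0 = time.monotonic()
    value = cache.get("k")                   # stalls until the flush releases the lock
    stall = time.monotonic() - t0

    flusher.join()
    return value, stall


if __name__ == "__main__":
    value, stall = measure_read_stall()
    print(f"read returned {value!r} after waiting {stall:.2f}s")
```

In this toy model the read stall is roughly as long as the flush itself, so the slower the disk is at absorbing a burst, the longer every reader waits.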
