The past few days have been a little rough. To begin, all KernelTrap configuration files were moved into a configuration management system, but in the process many of them were inadvertently modified. These issues were fixed as they were noticed.
Then, on Monday, I rolled out the new Quotes feature, and the sudden surge of traffic in the mail archives was more than our server could handle. Because web and database traffic currently share the same server, MySQL is forced to run with minimal RAM, and the extra processing required by the mail archives stressed it to the breaking point.
We experimented with some MySQL tunings, aiming to allocate MySQL as much RAM as possible without swapping. This helped, but with limited RAM it can only do so much.
I then quickly wrote a caching layer for the mail archives -- many of the queries are very expensive, but shouldn't have to be performed repeatedly (especially for older threads that aren't changing). I deployed a first draft of the caching code this morning, which noticeably reduced the load. Of course, it's a little rough around the edges, and in particular I need to work on cache expiration. One step at a time.
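For illustration only, a caching layer along these lines might look like the following minimal Python sketch. This is not the code running on KernelTrap; the names QueryCache, get_or_compute, and ttl_for_thread, along with the 30-day cutoff and the one-day and five-minute lifetimes, are all assumptions made up for the example. It shows the basic idea: run an expensive query once, keep the result until it expires, and let older threads stay cached longer than active ones.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class QueryCache:
    """Tiny in-memory cache with per-entry expiration (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Maps a cache key to (expiry time, cached value).
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(
        self, key: Hashable, ttl_seconds: float, compute: Callable[[], Any]
    ) -> Any:
        """Return the cached value for key, running compute() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # Still fresh: skip the expensive query.
        value = compute()
        self._entries[key] = (now + ttl_seconds, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry early, e.g. when a new message joins a thread."""
        self._entries.pop(key, None)


def ttl_for_thread(last_post_age_days: float) -> float:
    """Hypothetical policy: older threads rarely change, so cache them longer."""
    if last_post_age_days > 30:
        return 86400.0  # one day
    return 300.0  # five minutes
```

A caller would wrap each expensive query, for example cache.get_or_compute(("thread", 42), ttl_for_thread(90), run_query), where run_query stands in for whatever function actually hits the database. Expiration is the part that needs the most care: a lifetime that is too long serves stale threads, and one that is too short sends the load straight back to MySQL.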
Finally, I temporarily disabled the most expensive pieces of the mail archives: searching by subject and by from address. I will re-enable these once my caching layer is updated to cache those queries too.
Unfortunately, all of this still hasn't been enough, and we're still seeing some big hiccups. I'm continuing to dig in, trying to isolate what has changed and what is causing these continued failures. Oregon State University's Open Source Lab hosts the KernelTrap server, and they have also been quite helpful in this effort.
Sorry for the continued problems, but expect things to return to normal soon.