Posted On: 2010-11-01 18:17:13
Ooo... Shiny Thing
ShmooCon ticket sales... For those who have watched our 0wn the Con talks in years past, you know that the ticket sales process is something we've struggled to get right. The team spends a lot of time learning from what went right and what went wrong, and trying to do it better the next time.
After several years of extending our initial system (the same basic system from ShmooCon 2 with lots of upgrades), we decided to do a ground-up rebuild. One of the primary issues the team focused on was the queuing, to make purchasing more fair. It's actually pretty complicated to fairly sell a limited number of tickets when people can buy 1 OR 2 tickets and there are more buyers than tickets. At the end of it all, the system we have now is highly customized to our selling and redemption process and should suit our needs very well.
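To give a feel for why the 1-or-2 ticket rule makes this tricky, here is a minimal sketch of one possible approach: a lottery-style ordering of queued buyers. This is not our actual queuing code. The function name `allocate_tickets`, the shuffle-based ordering, and the choice to offer a single ticket to a buyer who wanted two when only one is left are all assumptions made purely for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple


def allocate_tickets(
    requests: List[Tuple[str, int]],
    total_tickets: int,
    seed: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical lottery allocation: shuffle buyers, then grant in order.

    requests: (buyer_id, quantity) pairs, where quantity is 1 or 2.
    Returns a mapping of buyer_id -> tickets granted (only buyers who got any).
    """
    for buyer, qty in requests:
        if qty not in (1, 2):
            raise ValueError(f"{buyer} asked for {qty}; only 1 or 2 allowed")

    rng = random.Random(seed)
    order = list(requests)
    rng.shuffle(order)  # random order so arrival time within the window doesn't matter

    allocation: Dict[str, int] = {}
    remaining = total_tickets
    for buyer, qty in order:
        if remaining == 0:
            break
        # Assumption: a buyer who wanted 2 is offered the last single ticket.
        granted = min(qty, remaining)
        allocation[buyer] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    demo = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 2)]
    print(allocate_tickets(demo, total_tickets=4, seed=1))
```

Even this toy version shows the design questions: whether the last odd ticket goes to someone who wanted two, and whether "fair" means fair per buyer or fair per ticket.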
After the code was written, we did some load testing based on last year's numbers from the Dec 1 sales run (typically our most aggressive). We had some decent stats and were able to do what we thought was a good approximation of last year's demand.
Unfortunately, two things happened. First, last year our main website was a static site that we updated from a code repository. This year, it's PHP. So the load on the server before we even turned on sales increased dramatically. But the real issue was the combination of PHP with our Apache configuration. Apache has some tuning parameters to try to prevent itself from freaking out when it gets loaded. We basically pushed our Apache config so hard that rather than queuing requests, it just dropped them instantly to save itself. Oops. :( The bottom line: rather than failing gracefully under load, Apache basically pulled its head inside its shell and pretended it wasn't there.
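The difference between queuing and dropping can be shown with a toy model. This is only a rough sketch, not a description of Apache internals or of our real configuration: `admit`, `workers`, and `backlog` are hypothetical names, and the numbers in the example are made up.

```python
from typing import Tuple


def admit(burst_size: int, workers: int, backlog: int) -> Tuple[int, int, int]:
    """Toy model of a server hit by a burst of simultaneous requests.

    workers: requests that can be handled at once (hypothetical limit).
    backlog: requests allowed to wait for a free worker (hypothetical limit).
    Returns (served_now, queued, dropped).
    """
    served_now = min(burst_size, workers)
    queued = min(burst_size - served_now, backlog)
    dropped = burst_size - served_now - queued
    return served_now, queued, dropped


if __name__ == "__main__":
    # Made-up numbers: same burst, small vs. large waiting room.
    print(admit(1000, workers=150, backlog=50))
    print(admit(1000, workers=150, backlog=2000))
```

With a small waiting room most of the burst is turned away immediately; with a large one the same burst waits its turn instead, at the cost of slower responses.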
We tuned Apache quite a bit this afternoon (really, we started tuning Apache a few minutes after noon, when we realized what was going on). Honestly, the number of people who hit the site at 1pm for an update was greater than the number hitting the site 2 minutes before sales were supposed to start at noon. And the webserver was up, responsive, and using much less memory and CPU. We've made a few minor changes since then based on further analysis and are continuing to look at the data.
So the lesson learned here is that we spent a lot of time and effort on the hard CS problem at the core of selling tickets in the manner we do. Unfortunately, we took our eye off the ball on the sysadmin / operations side of the process. Now we have a fighting chance of getting this done right. On the off chance we still need more capacity, our new architecture allows us to scale across multiple boxes. However, that requires more infrastructure, more tweaks, and more places for failure. We'll take our chances on the current hardware :)
Also, everyone always asks why we do this ourselves. There are a variety of reasons. The big one is that when you pay someone else to handle ticket sales, they take a cut of the revenue for themselves. For example, EventBrite would cost us $9.50/ticket on a $150 ticket. Losing more than 6% off the top of our budget before we can even use it is a bummer. Plus, frankly, it's a heck of a learning experience.
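For anyone who wants to check the math, here is a quick sketch using only the two numbers quoted above. The helper names `fee_share` and `net_per_ticket` are just for illustration.

```python
TICKET_PRICE = 150.00   # ticket price quoted above, in dollars
FEE_PER_TICKET = 9.50   # per-ticket fee quoted above, in dollars


def fee_share(price: float, fee: float) -> float:
    """Fraction of the ticket price that goes to the ticketing service."""
    return fee / price


def net_per_ticket(price: float, fee: float) -> float:
    """What is left of each ticket after the fee."""
    return price - fee


if __name__ == "__main__":
    share = fee_share(TICKET_PRICE, FEE_PER_TICKET)
    print(f"Fee share: {share:.1%}")  # prints "Fee share: 6.3%"
    print(f"Net per ticket: ${net_per_ticket(TICKET_PRICE, FEE_PER_TICKET):.2f}")  # prints "Net per ticket: $140.50"
```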
Anyhoo, we're in the process of stress testing our changes. We're going to really push the box this time using the information we gathered today. We'll post another update tomorrow morning. We hope to have a new time for the first round of ticket sales. Again, you will have at least 24 hours between the time we post and the time sales start. That should give you enough time to tape down F5 again.