Posted On: 2010-11-04 18:24:24

First off, we want to thank everyone for the response we've received today.  There's been a lot of positive feedback and we really appreciate everyone being understanding of our situation.  We're working our butts off trying to find a solution that gets tickets sold and everyone back to their normal lives.

As far as what we've found technically... we think we bumped up against socket limits on FreeBSD.  We were running the machine with maxfiles at ~12800 and tcp.msl at 30000 (i.e., 30,000 ms, which works out to 60 seconds, since a closed socket lingers for twice the MSL).  maxfiles is one of the variables that controls the number of open sockets on the box.  Both today and Monday, we saw that when we hit ~225 new connections a second, the box would freak out.  Some quick math:

225 new connections per second * 60 seconds (the OS timeout for TCP sessions, which applies even after they close) = 13,500 sockets.

That's more attempts to open sockets than we had sockets available (13,500 vs. ~12,800).  This explains why our load actually _decreased_ when we hit the wall.  It also explains why Apache was still servicing some, but not many, connections (it was only picking up sockets as they expired).
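For anyone who wants to play with the numbers, here's a rough sketch of the same back-of-the-envelope math in Python.  It follows our simplified model (every new connection ties up one socket for the full 2 * MSL, and maxfiles is the only cap), which is an assumption and not a full picture of how FreeBSD accounts for sockets.  The function and variable names are ours, made up for illustration.

```python
def sockets_held(new_conns_per_sec, msl_ms):
    """Sockets tied up at steady state: arrival rate * hold time (2 * MSL)."""
    hold_seconds = 2 * msl_ms / 1000
    return new_conns_per_sec * hold_seconds


def max_sustainable_rate(socket_limit, msl_ms):
    """Highest new-connection rate that stays under the socket limit."""
    hold_seconds = 2 * msl_ms / 1000
    return socket_limit / hold_seconds


# Values from this post; maxfiles was only approximately 12800.
MAXFILES = 12800
MSL_MS = 30000
RATE = 225  # new connections per second when the box fell over

held = sockets_held(RATE, MSL_MS)
ceiling = max_sustainable_rate(MAXFILES, MSL_MS)

print(f"sockets held at {RATE}/s: {held:.0f} (limit {MAXFILES})")
print(f"max sustainable rate under this model: {ceiling:.1f}/s")
```

Under this model the ceiling comes out a little over 213 new connections a second, which lines up with the box falling over at ~225.  Raising the limit or lowering the MSL both push that ceiling up.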

We cranked up the number of available sockets and cranked down the timer, so I think we've got that problem solved.

That said, we now have a ton of information about how many people were trying to buy tickets and the rates they were hitting us with.  We're estimating about 1,300 people were hitting the webserver during the attempted sales.  We assume most of them were trying to buy tickets and not just window shopping ;) The numbers we got today are going to be useful as we continue load testing.  We're going to take some time, regroup a bit, do some more testing based on the new numbers, and get back to you.

