Greedy Goblin

Thursday, May 17, 2012

Dealing with game server overloads

The Diablo 3 start wasn't good for us at all. Blizzard chose a "hands off, God will separate them" approach to handle server overload: players spammed the login screen, some were lucky, some were not. I was lucky and got in within 10 minutes. My girlfriend, who logged in the same minute from the same router, was not, and couldn't log in on Tuesday at all. So I was walking around in Tristram waiting for her while reading blogs and collecting Cynosural Theory skillbooks in EVE (not bad profit for Alt-Tab once a minute and pressing one button). In the meantime I was thinking about how one could handle the server overload more properly.

The obvious idea, "buy bigger servers", is obviously stupid. As a comparison, EVE has 350K subscriptions and I've never seen the Tranquility population above 45K, and EVE is typically a hardcore game, unlike Diablo 3, which someone can play with much less time without missing out on events or stories. So I'd say that after the first 1-2 months the ratio of concurrent logins during peak time to total games sold will be below 10%. Of course on launch week people want to play much more, so it's expected that they rent some extra servers to support the extra population for the first two months, but renting enough servers for everyone would be a serious waste of money. We are talking about tens of millions of $.
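To put numbers on that ratio, here is a minimal sketch using the two EVE figures quoted above. The `copies_sold` value is purely hypothetical and only there to show the scale of the gap between "a slot for 10% of buyers" and "a slot for everyone"; the function names are made up for this illustration.

```python
# Figures quoted in the post for EVE Online.
EVE_SUBSCRIPTIONS = 350_000
EVE_PEAK_CONCURRENT = 45_000


def peak_ratio(peak_concurrent: int, total_accounts: int) -> float:
    """Fraction of all accounts that are online at the busiest moment."""
    return peak_concurrent / total_accounts


def slots_needed(copies_sold: int, ratio: float) -> int:
    """Server slots needed to seat everyone who shows up at a given ratio."""
    return round(copies_sold * ratio)


eve_ratio = peak_ratio(EVE_PEAK_CONCURRENT, EVE_SUBSCRIPTIONS)
print(f"EVE peak ratio: {eve_ratio:.1%}")  # about 12.9%

# HYPOTHETICAL sales figure, not taken from the post or from Blizzard.
copies_sold = 1_000_000
print(slots_needed(copies_sold, 0.10))  # steady state at the post's 10% guess
print(slots_needed(copies_sold, 1.00))  # a slot for every single buyer
```

Even the hardcore game peaks at roughly one account in eight, which is why sizing the servers for every buyer would leave most of them idle after launch.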

The second obvious idea is waiting in a queue instead of randomly getting in or not. It's even worse, as it motivates players to create more server load. The best way to get in is to log in earlier and to not log out unless you really must, since you might not be able to get back in. The result is even more server load, on top of being very unfriendly to casual gamers who don't know such tricks.

Then I figured out the solution and I hope that it somehow reaches the ears of future game developers. I know that some Blizzard guys must read me as they responded very fast to the WGClean idea. First it needs a login queue, which alone would mean 6+ hour waits, and I've just written that queues are bad. The idea is that whenever the server is above 90% capacity, a timer starts to roll for every player logged in. This timer measures how much time you spent playing during "peak hours". On launch day probably every hour is a peak hour. When the "approximated wait time in queue" reaches 10 minutes, the server starts to make room by sending a message to the players who have the longest timers: "You played [timer] time while others couldn't log in, it's fair to let them play. You'll be removed from the game in 5 minutes, place your hero out of harm's way." A countdown starts and if the player doesn't log out in 5 minutes, he is kicked, making room for someone who played less. Regardless of the timer, the game shall always provide one hour of uninterrupted play, so once you have logged in, you can get this message 55 minutes later at the earliest. The timer keeps counting during that hour, of course.
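The warning rule can be sketched in a few lines. This is only an illustration under assumptions: the `Session` record, the `pick_players_to_warn` function and the `slots_wanted` parameter (how many seats the server wants to free right now) are invented names, and ties between equal timers are broken by list order, which the idea itself does not specify.

```python
from dataclasses import dataclass

GUARANTEED_MINUTES = 60   # uninterrupted play promised after each login
WARNING_MINUTES = 5       # notice given before the kick
QUEUE_WAIT_TRIGGER = 10   # estimated queue wait (minutes) that starts the kicking


@dataclass
class Session:
    name: str
    peak_timer: int      # minutes played while the server was above 90% load
    logged_in_for: int   # minutes since this login


def pick_players_to_warn(sessions, estimated_wait, slots_wanted):
    """Return the sessions that should get the 5-minute warning right now."""
    if estimated_wait < QUEUE_WAIT_TRIGGER:
        return []  # the queue is short enough, nobody gets kicked
    earliest_warning = GUARANTEED_MINUTES - WARNING_MINUTES
    eligible = [s for s in sessions if s.logged_in_for >= earliest_warning]
    # Longest peak timer first; the sort is stable, so ties keep list order.
    eligible.sort(key=lambda s: s.peak_timer, reverse=True)
    return eligible[:slots_wanted]


sessions = [
    Session("P1", peak_timer=200, logged_in_for=120),
    Session("P2", peak_timer=300, logged_in_for=30),   # still inside the free hour
    Session("P3", peak_timer=90, logged_in_for=55),
]
warned = pick_players_to_warn(sessions, estimated_wait=15, slots_wanted=2)
print([s.name for s in warned])  # ['P1', 'P3']
```

P2 has the longest timer but logged in only 30 minutes ago, so the guaranteed hour protects him and the warning goes to P1 and P3 instead.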

The timer never resets, but it's irrelevant when there is enough server capacity. If someone with a timer tries to log in during peak hours (which he can do instantly after being kicked to make room), he is placed in the queue, but the queue is sorted according to the timer, with one trick: time spent in the queue decreases the timer.
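A minimal sketch of such a queue follows. The class and method names are hypothetical, and two details are assumptions the idea leaves open: the timer is floored at zero while waiting, and players with equal timers are admitted in order of arrival.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    name: str
    timer_on_entry: int   # peak-time minutes held when joining the queue
    joined_at: int        # clock time in minutes when joining the queue

    def timer_now(self, now: int) -> int:
        """Timer after subtracting time waited; floored at 0 (an assumption)."""
        return max(0, self.timer_on_entry - (now - self.joined_at))


class PeakQueue:
    def __init__(self):
        self.entries = []

    def join(self, name, timer, now):
        self.entries.append(QueueEntry(name, timer, now))

    def admit_next(self, now):
        """Remove and return (name, remaining timer) of the next player in."""
        if not self.entries:
            return None
        # Lowest current timer first; earlier arrival breaks ties (an assumption).
        best = min(self.entries, key=lambda e: (e.timer_now(now), e.joined_at))
        self.entries.remove(best)
        return best.name, best.timer_now(now)


queue = PeakQueue()
queue.join("veteran", timer=60, now=0)    # just kicked after an hour of play
queue.join("newcomer", timer=0, now=29)   # has not played at peak time yet
print(queue.admit_next(now=29))  # ('newcomer', 0)
print(queue.admit_next(now=35))  # ('veteran', 25)
```

The newcomer jumps ahead because the veteran still carries 31 minutes at that moment, but the veteran's wait keeps paying his timer down, so he cannot be starved forever.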

Example: the game starts at midnight with 10 slots, and 19 players want to play. They are called P1, P2, ... P19.
00:00 The game starts. P1-10 get in, P11-19 wait in the queue. Since the server is above 90% load, P1-10 all gather timer.
00:55 P1-9 get the "you'll be kicked in 5 minutes" message.
01:00 P1-9 are removed, P11-19 get in.
01:01 P1, P2, P3, P4 and P5 requeue. Since there is a queue, P10 gets the message.
01:06 P10 is removed, P1 is back.
01:30 P20-29, new players, enter the queue. Since they have no timer while P2-5 still have 31 minutes left, they get to the top of the queue.
01:55 P11-19 get the message.
02:00 P11-19 are removed, P20-28 enter.
02:01 P1 gets the message.
02:06 P1 is out, P29 is in.

Why is this system optimal? Not only because it distributes the limited resource fairly, but because it motivates people to play in off-hours when they gather no timer. It also motivates them to log off during overload when they don't really want to play.

As a bonus, this system would allow the game company to sell a one-time timer nullifier service and premium accounts that don't gather timer.


PS: the EVE developers welcomed the terrible start of Diablo 3 with the following joke:


Diablo 3 business report: none. I see no point doing business yet, time best spent growing up. Buy gear in the AH, it's cheap like crap. From the gold a zone provides, you can buy better gear to every slot than the pieces you can find in the same zone. Sell everything at the vendors except really-really good items. Essences are cheap too.

EVE Business report: Thursday morning 22.4B. (0 PLEX behind for second account, 0.9B spent on triage carrier alt) Don't forget to join the goblinworks channel to discuss trading and industrial ideas and laugh at the morons of the day (50-80 people on peak hours).

22 comments:

Samus said...

I'm afraid this is one of those times where I don't think you can think like a social, Gevlon.

To the social, long login queues are "not Blizzard's fault," too many players logged in. Imagine Blizzard as a raid leader that won't do anything about the bad DPS after wiping due to enrage timer.

Being kicked by the game definitely IS Blizzard's fault. Now imagine Blizzard as the goblinish raid leader who boots the lowest DPS after a wipe.

Maybe the second raid leader is more fair, but the socials don't see it that way. All the socials tell that first raid leader "it was just bad luck lol," but they all hate that second raid leader for being a jerk.

It might be annoying, but Blizzard is smart to do nothing about login queues.

Anonymous said...

If you actually think the idea is worth it, you can hire a programmer to test it; it should be easy to mock a login queue.

chewy said...

It's not a bad idea but I don't think it considers all of the issues.

You've effectively matched a person in the queue with a slot in the game (albeit currently occupied). This works until the number of people in the queue becomes larger than the number of slots in the game, and then you can't hand out "hour slot allocations". Consider your example for the 10 slots but with P1 - P30 playing or queuing.

Once you've reached this critical mass and assuming people ejected simply put themselves back in the queue it will continue to get worse and the queue will get longer than an hour.

When might the queue get longer than the available slots ? At the launch of a new game for example.

Kreeegor said...

Or just buy enough temporary capacity from Rackspace or Amazon EC2 and release it in two weeks. There are problems solved better with existing technology. Oh, and here comes the fun part: if Diablo had a proper offline mode, this would never have been a problem, right?

Gevlon said...

@Chewy: the idea cannot make the queue shorter. No idea can (besides buy more slots).

The point is that instead of a few lucky playing all day and everyone else sit in the queue all day, everyone plays a little.

Foo said...

A slightly different option.

Collector edition players get queue priority.

Also I see nothing at all wrong with a 'during peak periods - time logged in will be rationed'.

However, I have a different problem. If I want a multiplayer game, I accept that I must be logged onto a net somewhere.

I am content that if I have no net, I can't connect to others who are on the net.

However, if I am on a single player game I see no benefit in being on the net. Others at work experiencing lag on a single player game? Really? Just so we can have an AH. No thanks.

But then I have not deliberately purchased any 'must be on to play single player game'; and only have diablo 3 as part of the annual pass.

chewy said...

@Gevlon - Yes, I agree, the only solution is more slots which is why Kreeegor has the correct solution. It's called "cloud bursting" and is designed exactly for peak work loads to provide more slots.

Your idea increases throughput by clearing the slots but has a diminishing return on the wait time/play time ratio.
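The wait/play ratio chewy and Gevlon are arguing about can be worked out with simple arithmetic. The sketch below assumes the worst case chewy describes: every kicked player requeues immediately and everyone has equal priority, so the queue drains one full rotation at a time. The function name `hourly_rotation` is made up for this illustration.

```python
def hourly_rotation(slots, players, session_minutes=60):
    """Steady-state (play, wait) minutes when everyone requeues after each session."""
    if players <= slots:
        return session_minutes, 0   # everyone fits, so nobody has to wait
    waiting = players - slots
    # Each rotation frees `slots` seats, so a requeued player waits
    # waiting / slots sessions before it is his turn again.
    wait_minutes = session_minutes * waiting / slots
    return session_minutes, wait_minutes


print(hourly_rotation(slots=10, players=19))   # the 19-player example
print(hourly_rotation(slots=10, players=30))   # chewy's P1 - P30 case
```

With 30 players on 10 slots each hour of play costs two hours of waiting, so the share of time spent playing is slots/players. The rotation does not shorten the queue, which is Gevlon's point; it only spreads the same playing time evenly.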

Anonymous said...

While there was a small issue with servers being full, it seemed as though the hard part was getting through the Battlenet login bottle-neck. Diablo had Error 37 and let people through at random. WoW took a long time to connect to the server but people got in because it was queued.

5 minute kick warnings would be a pain for a game like Diablo where your progress is only saved when you cross a check-point. I'd modify your suggestion so the warning wasn't 5 minutes, but until they next crossed a save point (doesn't stop farmers though).

I would be much more devious in my kickings.

Give no notice and kick, then blame a "server bug." Then announce it as fixed once the rush has passed.
Alternatively, I would release it as a "parental feature" that limits play sessions. After 1 hour, a health warning appears in the corner of the screen, warning that the user will be logged off at the next save. When the rush is over, announce an improvement to the parental features allowing you to turn off the session play-limits feature.

Dàchéng said...

Diablo 3 is a single-player game with a few online features. Solve the problem by allowing players to play their single-player game offline when the login servers are busy.

shamus said...

Bear in mind that there is a difference between login servers and game servers. There's a good chance the latter had capacity, and people just couldn't get to it due to the bottleneck of the login servers.

I mentioned this to a network/security friend of mine. He pointed out that when login servers get overloaded like this there can be a feedback effect as people (and the software) retry. The load can oscillate wildly in a nice sine wave pattern and sometimes take hours to die down again even once the load condition has gone!

Anonymous said...

People who can't get into game have already paid Blizzard their money. There is no subscription fee, so Blizzard won't be worse off if someone who already bought Diablo quits the game. Why bother with "fairness"?

Gevlon said...

@Shamus: if Blizzard is letting that happen, they are really dumb. The proper way to handle that is to have a pre-login server. The client first turns to that server with a simple message: "client from X.Y.Z.W wants to connect". No confidential data, no encryption is needed. The server either responds "OK you can go to the login server now" or "Login server is busy, try later".

@Anonymous: a player who quits in anger won't pay in the RMAH, won't recommend the game to his friends and is less likely to buy anything from that publisher ever again.
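The pre-login server described in this comment boils down to a cheap counter in front of the expensive login tier. Here is a minimal sketch of that logic without any networking; the class name `PreLoginGate`, the `max_in_flight` limit and the `login_finished` callback are all hypothetical names, not anything Blizzard is known to run.

```python
class PreLoginGate:
    """Cheap admission check placed in front of the real login servers."""

    def __init__(self, max_in_flight):
        # How many logins the login tier can process at once (assumed known).
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def request(self, client_addr):
        """Answer a 'client from X.Y.Z.W wants to connect' message."""
        # client_addr is not used for the decision; no credentials are involved.
        if self.in_flight < self.max_in_flight:
            self.in_flight += 1
            return "OK"      # client may go to the login server now
        return "BUSY"        # login server is busy, try later

    def login_finished(self):
        """Called when a login attempt completes, successfully or not."""
        self.in_flight = max(0, self.in_flight - 1)


gate = PreLoginGate(max_in_flight=2)
print([gate.request(f"10.0.0.{i}") for i in range(3)])  # ['OK', 'OK', 'BUSY']
gate.login_finished()
print(gate.request("10.0.0.9"))  # 'OK'
```

The gate only compares two integers per request, which is why it can absorb far more retries than a server that has to decrypt credentials and check passwords.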

Anonymous said...

If Blizzard were to have pre-login servers, wouldn't it be a better idea to use these servers to increase the capacity of the actual login servers?

Also, interested why you think that WG Clean was broken by Blizzard due to them reading this blog and not in game reports.

Bristal said...

Please. Release day logjams are a badge of honor to hardcore gamers. Getting in is like winning the lottery. You don't go to a lottery winner after he's spent a few bucks of his winnings and say, "sorry, time for someone else to spend some!"

Waiting in a queue at 1am is bothersome, but getting in and then getting kicked by the game so someone else can play? I sense marketing and player relations aren't your forte, Gevlon.

george said...

What Chewy said, buy/rent room on something like Amazon's EC2 cloud service. Put together a virtual server image for hosting a block of Diablo 3 games and then just propagate more virtual servers on the cloud as you need them. It'll be more expensive for those first few days or maybe weeks, but that service has been proven to withstand massive amounts of traffic. Hell, Anonymous wasn't able to make it even break a sweat when they tried to DDoS it.

Anonymous said...

@Anonymous,

The reason for the pre-login servers is that they can handle a much larger number of users than a login server could. You're not computing password hashes, decrypting the transmitted credentials, or doing any of the other tasks associated with logins, which are a bit more CPU intensive than just checking the current load on a server.

You would be able to handle a significantly larger number of potential players trying to log in, without them mashing retry against the servers and causing a DDoS.

Anonymous said...

A better idea may be staggered launch dates. East coast week one, West coast week two. If I buy a game I want to play it, not get an hour slot and then wait like I have nothing else to do the rest of the day. In any event, the cost of the rented servers is factored into the cost of the game based upon how many games are anticipated to be sold. It is a cost of doing business. To launch a game with huge user demand and not prepare for it, by underinvesting in capacity, is poor customer service. Blaming or penalizing the consumer is not the answer.

Caramael said...

I'm not sure what CCP is trying to say here. Their servers can handle 300k players. Great.
Now please let us know how your servers handle millions of players logging in at the same time. Oh wait, that's never going to happen, especially with an attitude like that, nevermind.

Anonymous said...

Just take it to the nth degree. When you log in, you get one hour to play, then get dumped to the back of the queue. Really no different from going on a ride at Disney. You wait in line, see the content, then get dumped back to the end of the line. Good idea though.

Anonymous said...

Without commenting on your idea, because it's not about any specific game: Diablo 3 would have had a very easy solution. Have no always-online DRM. Probably the majority of players would have played single player, so that's half of the server load already taken away. It boggles my mind why Blizzard had to employ such a consumer-unfriendly tactic without even making 100% sure that there wouldn't be problems.

Stumblebeard said...

I work in big IT - there are many companies that sell temporary services that would resolve these issues easily - from extra server capacity to extra bandwidth.

The company I work for does a very large amount of its business around certain dates and we very frequently add capacity like this. It is not very expensive either. Think tax day, end of month or quarter.

It would be easy to pay for too. Just come out with staggered pricing. Like 1 dollar more for the game the first two weeks it is out. That simple.

Soge said...

Who can say it wasn't some kind of DDOS attack either? I wouldn't put it past some other companies/hacker groups to do so, and given the already immense server load it would be extremely hard to identify the source. Also, when you consider just how large this launch was, and that it is Blizzard, it would generate enough negative news/tears/lulz to justify the act.

Anyway I didn't have any issues after the first hour - I even logged out some times during that day and got back in fine.