Wednesday, March 27, 2013

Handling server overload

TiDi (time dilation) is a bad answer to a real problem in EVE. Players are usually distributed among the various star system groups (servers), so the load on each is low. These servers cannot cope when thousands of players happen to jump into one system. Such events are rare but crucial moments that shape EVE. TiDi tries to handle them by slowing down time locally, which decreases the number of actions the players can perform and therefore the server load.

TiDi is bad because it allows players who are not in the besieged system to converge there. Let's say a titan is tackled by a fleet that can destroy it in 5 minutes. Without TiDi, only those within 5 minutes of travel time can arrive and add to the burden on the server. However, 10% TiDi stretches those 5 minutes to 50, allowing everyone within 50 minutes of travel to arrive. This ensures the system fills up with observers, killmail whores and such, making the battle unplayable for those who are actually involved. Besides increasing the server load, TiDi also changes the outcome: it favors the side with larger reinforcements, practically removing large-scale surprise attacks from the game.

I won't even discuss the "Captain Obvious" solution, stronger servers, as they are obviously expensive. Keeping every constellation 24/7 on a server that can support an Asakai-sized battle is a huge waste of money. Refusing this solution, however, means accepting that the server can't serve everyone in the crucial moments, which leads to TiDi and the problems above.

My solution would be selective service: when TiDi reaches 75%, the system goes into "yellow mode". In yellow mode, frigate-sized ships, shuttles, noobships, destroyers and pods can't enter the system, nor can they undock from a station inside it. Also, those that are not already in combat (PvP flagged, targeting or targeted by someone PvP flagged) are logged off. When a player tries to relog or undock in such a ship, he is offered the same window we see in Jita, which offers to magically transport him to a nearby system. Fleets would be granted 1 protected slot for every 10 ships that are not "yellow-banned" themselves, so a full fleet can have 24 tacklers, warp-ins and cynos that can operate under yellow mode. Technically, this would be a protection priority list that the FC sets, with the first N ships on it getting protection.

Yellow mode would also force drone grouping: drones of the same type from the same ship would fly to the same spot, forming a "drone group", which is a single drone for the server with a corresponding damage increase (so you'd control 1 fighter-bomber that does 20x damage instead of 20 FBs). These grouped drones would have two HP bars, one for single-target damage and one for smartbomb damage. If the single-target HP reaches zero, the group becomes one drone smaller with full HP, and any target lock on the group has a 1/N chance to break, representing the chance that the drone you targeted was the one that died (if you had a 20x FB group and a ship killed one, you have a 19x FB group and he has a 1/20 chance to be forced to retarget). If the smartbomb HP reaches zero, the whole group dies.
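The drone group rules above can be sketched as a small data structure. This is only an illustration under assumptions the post does not state: the class and method names are invented, overflow damage past zero is discarded, and the smartbomb bar is assumed to equal the HP of one drone (since a smartbomb hits every member at once).

```python
import random
from dataclasses import dataclass, field


@dataclass
class DroneGroup:
    """One server-side entity standing in for several identical drones."""
    count: int            # drones still alive in the group
    drone_hp: float       # hit points of a single drone
    drone_damage: float   # damage dealt by a single drone
    single_hp: float = field(init=False)     # bar for single-target damage
    smartbomb_hp: float = field(init=False)  # bar for smartbomb damage

    def __post_init__(self):
        # Both bars start at one drone's HP (assumption, see lead-in).
        self.single_hp = self.drone_hp
        self.smartbomb_hp = self.drone_hp

    @property
    def damage(self) -> float:
        # The group hits as hard as all of its members together.
        return self.count * self.drone_damage

    def take_single_target_hit(self, amount: float, rng=random) -> bool:
        """Apply targeted damage; return True if the attacker must retarget."""
        if self.count == 0:
            return False
        self.single_hp -= amount
        if self.single_hp > 0:
            return False
        size_before = self.count
        self.count -= 1                  # one member dies
        self.single_hp = self.drone_hp   # the rest continue at full HP
        # 1/N chance that the destroyed drone was the one being targeted.
        return rng.random() < 1.0 / size_before

    def take_smartbomb_hit(self, amount: float) -> None:
        """Apply area damage; every member takes it, so they die together."""
        if self.count == 0:
            return
        self.smartbomb_hp -= amount
        if self.smartbomb_hp <= 0:
            self.count = 0
```

With this shape the server tracks one object and two HP values per group instead of up to 20 separate drones, which is the saving the proposal is after.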

The point is to remove the ships that are likely to have little to no effect on the battle, letting those who actually affect it operate with minimal TiDi. If TiDi disappears for 5 minutes, yellow mode is cancelled.

If yellow mode is not enough and TiDi is still below 75% for 5 minutes, "orange mode" is invoked. In this mode "yellow-banned" ships are disconnected and immediately disappear from space even if they were in combat, and T1 cruisers can't enter, undock or remain unless already in combat. The protected spots for small vessels are both recalculated (T1 cruisers no longer earn spots) and decreased to 1/15 (16 small ships for a 255-man fleet). In orange mode subcapitals can only control 1 drone group, capitals 2, supercarriers 4. So if a subcap loses 2 drones (his drone group is down to 3 members), he can't send out a second group; he must recall his group and resend it at full strength.

If even orange mode fails and TiDi is still below 75% for more than 5 minutes, "red mode" is invoked. In red mode everything except capitals, battleships, strategic cruisers, logistics, dictors and command ships is instantly logged off and disappears from space. Protected positions are recalculated and decreased to 1/20. All ships are limited to 1 drone group per ship, and only heavy drones, sentries, fighters and FBs can be used.
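The escalation from yellow to orange to red, together with the shrinking protected-slot ratio, could be sketched roughly as follows. Several details are assumptions rather than part of the proposal: all names are invented, `seconds_held` is taken to be how long TiDi has stayed continuously at its current level, every mode (not only yellow) drops back to normal after 5 minutes without TiDi, and slot counts are rounded up, which happens to reproduce the 24 and 16 quoted for a 255-man fleet (231 + 24 and 239 + 16).

```python
# Modes in escalation order.
MODES = ["normal", "yellow", "orange", "red"]

# Non-banned ships needed per protected small-ship slot in each mode.
SLOT_RATIO = {"yellow": 10, "orange": 15, "red": 20}

TIDI_THRESHOLD = 75     # percent; at or below this the system is overloaded
HOLD_SECONDS = 5 * 60   # how long a condition must persist before a change


def protected_slots(non_banned_ships: int, mode: str) -> int:
    """Protected slots a fleet earns, rounded up (assumption)."""
    if mode == "normal":
        return 0
    ratio = SLOT_RATIO[mode]
    return (non_banned_ships + ratio - 1) // ratio


def next_mode(mode: str, tidi_percent: int, seconds_held: int) -> str:
    """Return the mode the system should be in after this check."""
    if mode == "normal":
        # Yellow mode starts as soon as TiDi reaches the threshold.
        return "yellow" if tidi_percent <= TIDI_THRESHOLD else "normal"
    if seconds_held < HOLD_SECONDS:
        return mode
    if tidi_percent >= 100:
        # No TiDi for 5 minutes: cancel the restrictions.
        return "normal"
    if tidi_percent <= TIDI_THRESHOLD and mode != "red":
        # Still overloaded after 5 minutes: escalate one step.
        return MODES[MODES.index(mode) + 1]
    return mode
```

Keeping the thresholds and ratios in plain tables would make it easy to swap in the priority list that better PvP pilots might draw up.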

This way the server could continue to operate without heavy TiDi and with little change to the event. A single black-screened titan, or the fact that a dread fleet arrived from the other end of the galaxy, has a larger effect on the outcome than removing 500 frigates and T1 cruisers, so it's just logical to keep the battlefield clear of litter.

Please spare me the "everyone has the right to be there" comments. TiDi is there because the server cannot serve everyone. We can only decide who should not be served.

PS: spare me the comments pointing out that this or that specialty ship (like bombers) is important. Yes, I'm bad with PvP ships, so my example list is bad. But other people are good. The point of the post is to have these people make a priority list of ships and, since the server can't handle the load, kick out the low-priority ships.

29 comments:

chequers said...

You're pointing out an accurate distinction here: 10% of fleet members have 90% of the impact on battle outcome. But you misunderstand CCP's goal. It's not to minimise the impact on the battle outcome, it's to keep their customers happy.

And what would be better for customer happiness? A couple of dead titans, or 500 players who get kicked out of the most exciting battle in months?

Anonymous said...

As a player, why should he/she accept that current servers can't handle everyone and be willing to sacrifice his/her right "to be there"? Why can't (or shouldn't) players demand better servers from CCP?

Gevlon said...

@chequers: it sounds nice, but that 10% controls not only the outcome but the battle itself. For example, Goons declared that they won't send capitals to unreinforced systems, practically making it impossible for a capital battle to happen without some timer.

What is better? 1000 customers having fun and good press at the cost of another 1000 kicked out of the system, or no battle at all?

Anonymous said...

The servers used for large scale fleet fights (reinforced nodes) are up-to-date high end machines.
And even those get forced into TiDi.

One of the problems is that CCP cannot dynamically relocate solar systems to these machines. The system has to be shut down and started on one of those reinforced nodes.
So if large scale battles occur unexpectedly things get worse.

A solution to the reinforcement problem could be that jumping into a dilated system would incur a prolonged session change.

E.g. at 10% TiDi, 5 real minutes amount to 30 seconds of actual gameplay. Anyone trying to jump into such a system would need to wait 4:30 minutes (ship not in space) so as not to gain an advantage.

Jumping from TiDi into heavier TiDi will have to account for the difference between those systems in a similar fashion. Going from 80% TiDi to a system that has been at 50% TiDi for 5 minutes would mean waiting for:
time gone by at 50%: 2:30 minutes
time pilot actually spent at 80%: 1 minute
time "lost" at 80%: 12 seconds
session change when jumping 80%->50%: 5min-2:30min-12sec = 2:18min
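The arithmetic in this comment can be written as a small helper. The function names are invented for illustration, times are whole seconds, and TiDi is given as an integer percentage; the only rule taken from the comment is that the wait equals the game time "lost" in the target system minus the game time the pilot already lost in the system he is leaving.

```python
def lost_seconds(real_seconds: int, tidi_percent: int) -> int:
    """Game time that did NOT pass during real_seconds at the given TiDi."""
    return real_seconds * (100 - tidi_percent) // 100


def session_change_wait(battle_real_seconds: int, target_tidi: int,
                        source_real_seconds: int = 0,
                        source_tidi: int = 100) -> int:
    """Extra session-change delay, in seconds, for a pilot jumping in."""
    owed = lost_seconds(battle_real_seconds, target_tidi)
    already_paid = lost_seconds(source_real_seconds, source_tidi)
    return max(0, owed - already_paid)


def as_clock(seconds: int) -> str:
    """Format seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


# Jumping from a normal system into one that has been at 10% for 5 minutes.
print(as_clock(session_change_wait(300, 10)))          # 4:30
# Jumping after 1 minute at 80% into a system at 50% for 5 minutes.
print(as_clock(session_change_wait(300, 50, 60, 80)))  # 2:18
```

Both results match the figures given in the comment.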

Anonymous said...

for starters, to say TiDi is bad is objectively wrong. The solution itself is an ingenious way of dealing with a rather tricky problem. The biggest issue isn't that servers are expensive (in the scheme of things they aren't). The issue is that the server is single threaded. So the server may have 8 CPU cores, but per system only 1 is being used max. There are cases where multiple systems will be supported by a thread but at best only 1/8th of the available processing capability on a given server can be used for spaceships.

TiDi is an attempt (and a very good one at that) to make the system "fairer". Prior to TiDi (and this may actually predate your time in eve), a big fleet battle resulted in random disconnects, black loading screens on jumps, and unbearably long times between actions occurring, or actions never occurring at all (imagine clicking your DCU, seeing it start to spin and nothing actually happening). TiDi is a strategy to make sure everyone on the node gets a fair chance at having their actions attended to: even if those actions take a very long amount of real time to get processed, they will still be run in sequence.

of course this is far from perfect, but neither is your solution. Disconnecting people because they are in small ships hurts new players more than anyone else, and provides a massive advantage to alliances who have the skill advantage but not the numbers advantage. All they need to do is blob the server hard enough until their enemies' frigates and cruisers log off...

the "real" solution to TiDi allowing escalation from afar is several fold, but a good start would be to use traffic control to prevent people entering systems which are under heavy TiDi (say, at 50% TiDi). Traffic control already exists for overloaded systems. Expanding it to prevent people entering any system whose node is under stress would mean that long range support exploiting the downsides of TiDi would be minimized.

Gevlon said...

Preventing people from entering a loaded system means that the defenders have lost before the battle has started: the attackers can fill up the system and finish their objective without defenders present (except those who happen to be in the targeted system by chance).

If you can blob the system with battleships and above, it makes no difference whether the enemy cruisers and frigs are present or not (see Fazor cruiser "help" to Solar). Being able to fly frigs and cruisers in a large battle is fun for the pilot but has no effect on the battle, and the lag/TiDi ruins the fun of others. It's easier to just say "this party is over your head boy".

Anti said...

i read something a while ago about CCP's response to a battle on an unreinforced node. i'm not sure but it might have been the battle of asakai.

they talked of removing other systems off the node. from what I understand a normal node holds several (perhaps many) systems, which is fine with normal traffic loads. it is also possible to remove near empty systems off the normal node when one particular system becomes overloaded.

let's say a normal node can hold 20 systems, or a single battle with 1000 ships, and a reinforced node can hold a battle with 4000 ships. why not put all normal systems on reinforced nodes? assuming linear scalability a reinforced node would hold 80 unloaded systems. when a large battle starts to form in a particular system the other 79 systems could be removed, leaving the battle on a reinforced node.

Pheredhel said...

so, here are a lot of points that show some interesting misconceptions about what causes load.

There was an interesting video by CCP when TiDi was introduced. About 100 ships formed a circle or sphere at 100%, all slim clients. Then the developer ordered them to attack... with missiles. That alone dropped the test server to 10%. Yes, most likely the test server was not a full-strength server, but it shows how little effect the ships themselves had.
So grouping drones and the like won't do too much. There are technologies that would allow the servers to offer more capacity, but they would mean a huge development overhead (distributed computing for servers). But a rather obvious solution would be to create a gradient of TiDi.
Example (numbers up for discussion):
A system goes to 10%, then all surrounding systems go to 20%, all around those to 40% and so on. That would hit more players, but would make travel in there a lot smoother. Additionally, jumps/jump bridges would have to take an appropriate time penalty or have to be from a nearby system.
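The gradient idea amounts to a breadth-first walk outward from the battle system. The sketch below assumes the doubling implied by the 10%, 20%, 40% example; the gate map, the function name and its parameters are invented for illustration.

```python
from collections import deque


def tidi_gradient(neighbours, battle_system, battle_tidi=10, step=2):
    """Map each affected system to its TiDi percentage.

    neighbours: dict of system name -> list of adjacent system names.
    Systems missing from the result run at normal speed (100%).
    """
    result = {battle_system: battle_tidi}
    queue = deque([battle_system])
    while queue:
        system = queue.popleft()
        next_tidi = min(100, result[system] * step)
        if next_tidi >= 100:
            continue  # the gradient has faded out at this distance
        for other in neighbours.get(system, ()):
            if other not in result:  # first visit is the shortest route
                result[other] = next_tidi
                queue.append(other)
    return result


# A hypothetical chain of five systems with the battle in "A".
gates = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D"]}
print(tidi_gradient(gates, "A"))
# {'A': 10, 'B': 20, 'C': 40, 'D': 80}
```

With doubling, the slowdown is gone four jumps out, so only the immediate neighbourhood of the battle would be affected.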

Another kind of obvious solution would be to simply remove rockets... but I doubt that will happen.

Gevlon said...

@Anti: your idea is good, and someone has probably thought of it and has a reason why it doesn't work that way. For example, a battle and a normal region-managing server have different load profiles. The battle is clearly CPU limited, while managing Jita is probably limited by storage access times. So a server that runs 80 systems smoothly might need a lot of memory but only a weak CPU, and can't run a battle.

@Pheredhel: not many shot missiles in Asakai. However, it's clear that 2x more players create more than 2x more load. So even if the problem is missiles, it can be solved if we kick out the lower half of the players, many of whom were shooting missiles.

Foo said...

Another option for 'ad hoc' battles: reserved servers that only handle 'on grid' ship/big battles.

Once tidi hits "X"% (lets say 75%), everyone on grid has a 'session change', while their assets (including missiles/drones etc) are copied to the new server.

Warping to grid also is an opportunity to sneak in a session change to the new server.

Possibly one or more acceleration gates could be created during this session change, possibly requiring additional pilots wanting to join to fight their way to the main battle grid.

This potentially could cascade, with the acceleration gate(s) grid becoming a second large battle, requiring itself to be put onto another reinforcement server. Even if you miss out on the main fight, you still get 'good' fights.


Also, 'concord claims it's too hard' to work out who is shooting who, and the apparently expensive checks to see if any action makes you suspect no longer apply.

The same rationale can potentially be applied to a reduced set of 'local' players, off grid boosting, and whatever else can be discarded to maximise the number of players on grid.

If 'concord' wanted to hand out sec punishment, bounty rewards etc, after the battle they could go through the kill logs.

From a generic coding perspective, this is possible, but not necessarily easy. It would be an 'expansion' effort (or maybe even more). However, given the player base's love affair with large battles, it may be worth the cost.

OskaRus said...

It is interesting to read a discussion about a purely technical problem by people who have no idea whatsoever about CCP's server infrastructure or the game server application's implementation. It somewhat resembles a religious debate.

Solving a technical issue by modifying the user experience is always a bad idea anyway. And the more complicated the solution is, the more bugs and annoyed users it will introduce. And this solution is very complicated.

Anonymous said...

"What is better? 1000 customers having fun and good press at the cost of another 1000 kicked out of the system, or no battle at all?"

With TiDi 10%, kicking out half of your users changes to TiDi 20%. That trade off shows no good business sense. To reach your 75% TiDi, you'd have to throw out all but a minority of your users. For a game that advertises "one universe, one shard", I doubt that will get good press, no matter the battle.

Aside from that, what makes you think your policy would actually change TiDi? I expect people to reship and medium-term to change fleet doctrines accordingly.

If you don't bring the "right" size ships from the start, you make yourself vulnerable. Your enemy can force your "wrong" ships out: Oh, hey, let's jump in all our 2 day noobs and alts in battleships. That's one of several points where your proposal is open to meta gaming. And as someone else already cited CCP, bystanders don't use up much load.

Oh, and from a software engineering perspective, it's a bad idea to create a special case (e.g. bot grouping) in your code, that doesn't get exposed to users except when the shit already hits the fan. Imagine a bug that can crash a server only when you have a huge battle going. Indeed, good press. In contrast, TiDi can be implemented in a way that almost all of its code is always in use, by making normal play same as TiDi 100%.

Von Keigai said...

Let me second OskaRus: unless you know a lot more about some highly technical aspects of the server code, it is hard to say very much about how to improve the handling of large battles.

A second point here is that tidi is a good system in that it meets the one absolute must-have criterion: players cannot easily game it. Any system in which any players are being kicked from a server is gameable.

Druur Monakh said...

You don't want to hear this, but I say it anyway: the players do have a right to be present in big fights, for one simple reason: they paid for it.

CCP sells EVE with the promise of massive fleet battles where everyone can participate - if they now did an about-turn and divided the players into worthy and unworthy, they had better be prepared for many angry refund requests.

In addition, preferring one group of players (and especially the old boys club of capital pilots, or in WoW terms: the level-capped endgame raider) would go against the principle of emergent gameplay CCP is espousing.

And there's a technical problem as well: TiDi doesn't affect just the system the battle is in, but also all the other systems on the same node. Players would be rightfully pissed if they couldn't undock because of a battle they aren't even trying to participate in.

Gevlon said...

@Druur: "they all paid for it" is nice to say, but not an option, since the server can't support them. Right now blind luck decides who gets DC or blackscreen. At least if you are kicked out of the server, you aren't losing assets.

About "old boys": you seem to ignore that they control both the outcome and the existence of the fight. You must cater to them if you want the event to happen at all. Asakai did the very opposite of what you want: the Goon FC lost capital FC rights and Goons were banned from capitals on non-reinforced nodes. The "big boys" are doing their best to make sure Asakai won't happen again.

@Von Keigai: TiDi is highly gamed already. PL often flood besieged systems with TESTies to force TiDi, to slow the battle down, gaining them time for the supers to arrive.

@Anonymous: the server load vs player count isn't linear, rather quadratic.

Yes, you can game the kickout mechanic by shipping up. But that increases the stakes. Those 1-week-old newbies in battleships can be lost, you know, and that hurts more than losing rifters.
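The disagreement over linear versus quadratic load can be made concrete with a toy model. It assumes, purely for illustration, that server load grows as the player count raised to some exponent and that the TiDi percentage is inversely proportional to load once the node is saturated; neither assumption comes from CCP, and the function names are invented.

```python
def tidi_after_kick(tidi_percent: float, kept_fraction: float,
                    exponent: float) -> float:
    """TiDi percentage after keeping only kept_fraction of the players."""
    return min(100.0, tidi_percent / kept_fraction ** exponent)


def fraction_to_reach(tidi_percent: float, target_percent: float,
                      exponent: float) -> float:
    """Largest share of players that can stay to reach target_percent."""
    return min(1.0, (tidi_percent / target_percent) ** (1.0 / exponent))


# Kicking half the players out of a system stuck at 10% TiDi:
print(tidi_after_kick(10, 0.5, exponent=1))  # 20.0 if load is linear
print(tidi_after_kick(10, 0.5, exponent=2))  # 40.0 if load is quadratic

# Share of players that could stay while getting back to 75%:
print(round(fraction_to_reach(10, 75, exponent=1), 3))  # 0.133
print(round(fraction_to_reach(10, 75, exponent=2), 3))  # 0.365
```

Under either exponent, most of the players would have to leave to lift a system from 10% to 75%; the quadratic assumption only changes how many.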

Anonymous said...

"@Anonymous: the server load vs player count isn't linear, rather quadratic."

How come?

In an EVE battle most ships interact only with a handful of other ships (that's linear). Except for smartbombs and bumping I don't see anything requiring N target checks, and those can be implemented with logarithmic complexity or better because they depend on range.

So this leaves communications, which surely is quadratic (N actions to N clients), but today's CPUs don't break a sweat before saturating the network card. If network throughput were the problem, reinforcing systems wouldn't help the way it does. Aside from that, CCP themselves have stated this to be a CPU issue.

And not only logic, but also evidence (e.g. CCP's TiDi video) indicates that the behavior is not quadratic. TiDi kicking in heavily just because ships open fire strongly suggests that the linear part overshadows any other.

So any proof to the contrary?

TL;DR: Please stop arguing about algorithmic complexity if you haven't the background to analyse it yourself.
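The claim that range-limited effects such as smartbombs need not check every ship can be illustrated with a uniform spatial grid, one common technique for this kind of lookup (a tree structure such as an octree would be another). This is a generic sketch with invented names and says nothing about how the EVE server actually works.

```python
from collections import defaultdict
from math import floor


def cell_of(point, cell_size):
    """Grid cell containing a 3D point."""
    return tuple(floor(c / cell_size) for c in point)


def build_grid(positions, cell_size):
    """Bucket ship names by grid cell. positions: name -> (x, y, z)."""
    grid = defaultdict(list)
    for name, point in positions.items():
        grid[cell_of(point, cell_size)].append(name)
    return grid


def ships_in_range(positions, grid, cell_size, centre, radius):
    """Names of ships within radius of centre, checking nearby cells only."""
    bx, by, bz = cell_of(centre, cell_size)
    reach = int(radius // cell_size) + 1  # cells to scan in each direction
    hits = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                for name in grid.get((bx + dx, by + dy, bz + dz), ()):
                    x, y, z = positions[name]
                    dist_sq = ((x - centre[0]) ** 2 + (y - centre[1]) ** 2
                               + (z - centre[2]) ** 2)
                    if dist_sq <= radius ** 2:
                        hits.append(name)
    return sorted(hits)


ships = {"a": (0, 0, 0), "b": (3, 0, 0), "c": (50, 0, 0)}
grid = build_grid(ships, cell_size=10)
print(ships_in_range(ships, grid, 10, centre=(0, 0, 0), radius=5))
# ['a', 'b']
```

The cost of a query depends on how many ships sit in the cells near the blast rather than on the total number of ships in the system.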

Druur Monakh said...

Luckily for us CCP Veritas didn't take "this is not an option" for an answer.

Yes, hardware servers have limits, no matter how clever the software, and these limits will always be gamed. But of all the possible software approaches, TiDi (which btw is not meant as a final solution to the hardware limits, but as a last-resort method of graceful degradation) has the advantage of being impartial to the events. If you say that pilots of small ships should suck it up if they can't participate in a big fight, then I can say that pilots of large ships have to suck it up if they get thrown into TiDi.

One example: one of the more eponymous fights of the Agony Basic PvP roams was a carrier take-down by a frigate mob. Under your suggestion, this fight might not have happened at all. And while the fight didn't shape the landscape of EVE, it did shape the attitudes of the participants - a far more important resource.

Heck, Goonswarm may not have happened under your premise.

Bottom line: any attempt by CCP to solve problems by excluding classes of players, by sorting players into important and unimportant, will break the very core premise of the game.

And as just recently the Luminaire live event showed, players will take lag and TiDi over outright exclusion any day. TiDi is the price we gladly pay for egalitarian gameplay.

Gevlon said...

We don't know how CCP's algorithms operate, but we can say for sure that the more people, the more server load. I'm not questioning that other moves (code optimization, server upgrades, multi-core operations) can work. However, there is no server and no code that can support an infinite number of clients. See the Diablo III and SimCity launch issues, even though those games were backed by much richer corporations than CCP.

If we accept the fact that we can't support more than N customers without service degradation, we can only choose 3 options:
* degrade the service to everyone (TiDi)
* remove random customers (random DCs)
* remove customers based on some profile.

I vote for the third and within the third I'd vote to keep those customers who create more content to other customers.

Gevlon said...

@Druur Monakh: the cardinal problem with TiDi is its self-harming nature: the slower the fight runs locally, the more people can arrive from outside, slowing it even more.

It's a battle-skewing problem too, as it clearly favors the stronger entity, removing the chance for an underdog to perform a surprise attack and disappear. You can't hit and run if a 5-minute battle lasts 50 minutes. Due to TiDi, Pandemic Legion cannot lose a serious capital engagement, no matter what mistakes they make or how skilled their enemies are.

Also, while you (or even the majority of players) may prefer an egalitarian approach, it hurts those who create the event itself. It's not a theoretical problem: the Goons opted out of any capital fights without a reinforced node, in an attempt to prevent Asakai from happening again.

Luminaire is an exception because the event was made by devs, not players.

The core of EVE is not its egality but its player-driven nature. If players choose not to drive it, it dies.

Anonymous said...

I think the approach you are proposing would have a lot of unintended side effects, and be problematic. This coming from someone who has worked on game engines at massive scale (billions of requests per day).

The core problem is twofold.

- Bottlenecks that result from having to process certain things in sequence.

- The amount of information that has to be sent to the client with large numbers of game objects interacting in a small area

The solution to the first problem is to make the system more distributed. Easier said than done, and it almost always requires major changes in architecture. This is something I am sure they are constantly working on and, if they could, would do faster. It's just a huge amount of work, and usually requires changes at all levels of the code.

The second problem also has a straightforward solution, which is to increasingly limit the information sent to the pilot based on proximity. dscan range would start to shrink, ship and pilot information would only be available for other pilots within a small range, etc. It gets a bit complicated with large battles because you have to calculate, for all the pilots, what information they need to know. You might be in a fight with pilot A who is also fighting pilot B. B is outside your range but inside A's range. What information do you get? Solvable but complicated.

The main benefit of tidi is to keep systems from completely overloading and causing error conditions. And it only scales so far. You can only slow down the game so much, before it's just as bad as getting an error and completely disconnecting (from a user point of view).

On the bright side, Eve handles this problem better than any other game out there by a long shot. And if they keep a stable or growing subscriber base, I think they just might invest even more into this. Normally, most games would not even consider major architecture changes at this stage, but with eve you never know. They might have even taken advantage of DUST to play around with some new ideas.

Tego said...

I think the arguments against the idea miss what should be the central point. massive TiDi is an exploitable mechanic that removes what should be valid gameplay (hit and run)

Who cares what level TiDi reaches? Entering the system should be as delayed as the TiDi in the system. Because of the mechanics, the only way to achieve a system where the closer you were, the faster you would get there (which is logical), rather than whoever gets into the queue at the final gate / cyno first getting there first, is gradual TiDi around the event in neighboring systems.

Anonymous said...

I'm just glad ccp picked the simple TiDi as a quick fix nooblin here never saw what a 1500 man fight looked like under soul crushing lag.......you have a unique solution goblin but id have no faith in CCP being able to implement it. The more complex it is the more likely the goons will find something to exploit in it. The real solution to the lag lies in the n+1 problem, if u can solve that you win.

Anonymous said...

"If we accept the fact that we can't support more than N customers without service degradation, we can only choose 3 options:"

Actually, there are more options:
* If N is large enough for the current customer base, it's a luxury problem, so you can't simply dismiss methods to increase N.
* Or the inverse: N is not the number of users, but the number of users on one server. Split up the smallest unit that runs on a server.

"* remove customers based on some profile.

I vote for the third and within the third I'd vote to keep those customers who create more content to other customers."

There is still the strong possibility that people will just adjust to the profile and nothing changes.

Behnid Arcani said...

It's a good start, but I'd rather set up the situation pro-actively rather than reactively.

Essentially, I would like to strategize null sec. Do as you say, but have shifting environmental effects in systems controlling it, rather than just kick people when a big battle starts.

Have rolling black outs on certain gates in null. It's dangerous for repairmen to get out there, so of course they'd break down once in a while. Have some gates only have enough power to jump frigates. Let cosmic storms prevent anything but the toughest ships from travelling through.

That way, you could predict more accurately where the big battles will occur, and shift server load accordingly. And if something does happen unexpectedly, you already have the mechanic by which to reduce players quickly. Just conjure up a cosmic storm to sweep away the weaker ships.

So essentially your idea, but with a flashy light show so players don't feel too cheated.

Anonymous said...

TiDi and your solution do not address the problem.

The problem is the server app is not multi-threaded.

I don't have a source for this however, one of the top execs at CCP said "Heh, multi core computing is just a fad". Oh how wrong he was. Over the years they have split everything in to smaller segments (market, chat, etc, etc).

The solution is to rewrite the server-side app. With dynamic allocation as well, if they feel like it.

Anonymous said...

I don't have a source for this however, one of the top execs at CCP said "Heh, multi core computing is just a fad". Oh how wrong he was. Over the years they have split everything in to smaller segments (market, chat, etc, etc).

This is actually a limitation of the technology the EVE server is built on - not a function of a comment by a top CCP exec.

One of the things that should help is "brain in a box" - much like client-side dogma fixed the issue where the server had to recalculate your ship stuff on each jump, brain in a box will hive off the job of figuring out your skills to a separate machine - meaning the "jump lag" (or rather, jump tidi hit) will be alleviated.

The problem is though, that if you make servers able to deal with more and more pilots, all you'll get is more and more pilots. which is cool and all, but you'll never solve the problem (even if you fixed the server to run multi threaded)

Maxim Preobrazhenskiy said...

Another way is to limit the number of people who can be in a system at any given time to something a 75% TiDi can manage, but allow some manner of livestreaming of the course of action (f/ex some virtual news report station, streaming news of the battle to other stations).

Anonymous said...

I would have a completely different approach to this problem: why are there 1000 man fights and are they any good at all?

A solution to reduce TiDi is to reduce the size of blobs. 10 grids with 100-man fights are a lot easier to handle than a single grid with 1000 ships.

This can be achieved by introducing artillery: off grid ships that can project damage to other fights in the same system. This would work similarly to artillery and air support in RL battles. A frigate calls out coordinates, or uses a special target painter module to mark the target, and the offgrid artillery can bombard the area to pieces. Maybe even by applying area of effect damage.

This would greatly reduce blobbing, and instead create a more tactical game, where one group is attacking the station, one is providing artillery support, a third one is chasing enemy artillery, and so on.

This would be a solution which needs no server side changes at all, and would still make large battles easier to handle, and even more realistic.

I mean, in a sci-fi spaceship game with warp drives and jumpgates, why can't ships hit anything more than 250km away?

Wolf said...

CCP is dealing with user loads on a single server seen in very few places around the world. The fact that they have gotten this far is simply staggering to say the least and we should tip our hats to the developers for bringing us this far. While it would be nice to make large impromptu fights more feasible, your solution seems to ignore a number of gameplay balance issues and technical limitations and introduces an avenue for new exploits.

Currently your solution seems to simply encourage reshipping and removing ewar ships from the field. It forces people to fly more expensive doctrines, which favors larger power blocs, lest FCs find half of their fleet suddenly missing from the field. If you prevent people who weren't involved from the start of the fight from joining, you prevent unaffiliated allies/friends/mercs from joining in to tip the balance for one side or the other, and again favor larger power blocs. By the same token, if this had already been in place, Asakai would never have happened in the first place, since most of the people involved would have been barred from entering the system. Likewise, doing a live migration that converts items to stacks, automatically disconnects/reconnects people, etc. poses a huge technical challenge in itself. Large entities would also be able to exploit this system very easily by jumping a number of ships into a non-reinforced system, forcing it into TiDi and thus disconnecting any unprepared opponents, which would then give them free rein to do as they like in the system with the suggested mechanics.

I can certainly appreciate wanting to make the game more playable in these situations, but doing so at the expense of player choice is not the way to do it. The hardware will get better and so will the code base. Unfortunately, those two things take time to develop and implement. While the solution in place definitely isn't the perfect one, I have to side with CCP in believing that it's the best available solution for the time being.
