Greedy goblin: Iterative M&S dilemma

Friday, July 30, 2010

Iterative M&S dilemma

The PuG update: 150 members, LK on 15%, and of course permanently recruiting.

The prisoner's dilemma is a is a fundamental problem in game theory. The task can be summarized with the following rules:

The player can choose to "cooperate" or "defect"
If both cooperate, they both get 3 points
If one cooperates, one defects, the defector gets 5 points, the cooperator gets 0
If both defect, they both get 1 point

If there is one game, the strategy is simple: defect, as no matter what the other player does, you are better off defecting (5 vs 3 or 1 vs 0).

The iterated prisoner's dilemma means that the players play the game for several rounds (and remember previous games). In such scenario the players can cooperate for further cooperation. The optimal strategy here is "tit-for-tat". Such player opens with cooperation, then repeat the opponent's move. So if the opponent cooperated, he'll cooperate again, if the opponent defected, he punishes him by defecting next round. A tit-for-tat player gets 3 points every round against another one (or against an altruist) while he gets 1 against a predator (who always defects). The predator never gets more than 1 (except first round), but the tit-for-tats get more when they are playing against other tit-for-tats. So tit-for-tat defeats predators all cases.

Considering that tit-for-tat is the single optimal strategy, never losing against any and providing maximum possible reward against itself, there should be no other strategy. For some strange reason, the basic move of tit-for-tat, revenge is considered one of the worse things by socials. And it's not just a whine to avoid well-deserved punishment! The cultures where revenge is common are usually much poorer than more forgiving communities. How can the optimal strategy turn into a very bad one?!

Our beloved M&S enters the game. He - like all other players - can choose any strategies. But since he is an M&S, he messes up cooperation! He has a suck chance (S). Every move he has S chance to defect, no matter what he wanted to move. He doesn't know that he suck, so if he means to cooperate, he is unaware of the fact that he just defected. Also, since he doesn't know he sucks, he refuses to choose a different strategy than the non-sucking players.

Let's see what happens if an M&S playing tit-for-tat meets another (M&S or not) player playing tit-for-tat. They open with cooperation and cooperate until the M&S messes it up and defect. The other player punishes him by defecting next turn. The M&S (unaware of the fact that he defected first) sees that the opponent defected and punishes him by defecting next turn. Of course the other punishes him back. From this point they keep punishing each other. "An eye for an eye makes the whole world blind."

OK, let's fix this! Enter a new factor: forgiveness (F). The player plays tit-for-tat, but in every round when he wants to punish his opponent, there is an F chance to choose to cooperate instead. To model this strategy, I ran multiple 10000 rounds games and their averaged results are displayed on the chart below:

The lines of the same color belong to the same suck chance. For example the green lines display the situation when the M&S player has 10% chance to fail. The thin green is the result of the M&S playing against a rational. The thick green is the result of the rational player who plays with M&S. The dotted green is the result of an M&S playing with another M&S. (2 rationals always gain 3 points). What can we see:

Increasing forgiveness increases your own result, no matter if you are M&S or rational
When M&S plays with rational, the M&S is always much better off than the rational
The more the M&S suck, the worse the rational ends up regardless of his forgiveness
The difference of gains between the M&S and the rational increases with forgiveness

Nasty, right? Not yet! Here comes the predator. Against a tit-for-tat he used to get 1 point. Against a forgiving tit-for-tat he'll get 1+4F points while the forgiving tit-for-tat himself gets 1-F! The chart below shows the predator spawn line, the point where being the first predator pays. If you are left of the line, the punishment is too big for trying to defect. Above-right the people are too forgiving/stupid and you're better off as predator.

From this point it's a runaway reaction, as the existing predator(s) make the life of the M&S miserable so their strategy is less and less viable. From that point only the arch-enemy of the predator, the unforgiving tit-for-tat can stop it. However I'd like to remind you, that the unforgiving tit-for-tat is always defecting against the M&S as he punishes the failure equally to deliberate defect.

If you think I'm promoting the predator strategy here, you are wrong. I'm promoting unforgiving tit-for-tat. I'm also promoting "non-social sucker" the one who is at least capable of recognizing that he sucks sometimes and pays for the damages he caused by cooperating twice after defecting by suck. When such player is playing with unforgiving tit-for-tat, they both get 3-0.5*S (unless S > 1/3).

Soge mentioned a very good point: among M&S, since admitting mistakes (therefore compensating them) makes "freindly social peep" (who always cooperate) is optimal. 2 such M&S reaches 3-S-S² points. That's why they hate predators so much, but of course can't do anything about them. They also hate "mean" people (tit-for-tat) who are not forgiving "bad luck". So in their pathetic world, being "freindly social" is the best strategy.

26 comments:

Sean said...: You've missed out an important point in your graph. You assumed that it is just a 1v1 game and plotted the graph of represent this. In reality though, it is someone playing multiple 1v1 games with the scores all added up.

With that in mind, the Predator will come ahead in every 1v1 game (because they will always defect), but their overall score will be lower compared to the forgiving strategy.

The wikipedia prisoner's dilemma articulates the 4 characteristics of a successful strategy:
- Nice
- Retaliating
- Forgiving
- Non-envious

P.S. In several of your posts, I mentioned this wikipedia article where I claim that co-operation is the optimal strategy.; 30 July, 2010 07:27
Yaggle said...: Since you are in favor of tit-for-tat, are you in favor of revenge? If somebody cuts me off in traffic, should I try to seek retribution? Because most of the time I would really like to, but I feel it would not be in my self-interest. What about the death penalty? Does tit-for-tat mean that society should execute all murderers? Or does is this just game theory? Because I am very much in favor of the death penalty(I feel it is cheaper to kill them than to imprison them for a lifetime). Interestingly I remember this theory discussed by my professor at a computer science class at the University of Arizona about 20 years ago. It sounds to me like in Wow it is probaby best to be a predator until you can get into a guild with low M&S and then adopt a more altruistic algorithm. The reptilian brain cannot learn and evolve its strategy so it is always a predator but humans can store information and learn which lets them change their behavior from predator to tit-for-tat or a more altruistic one. Thank goodness for that, even if it's a social concept, it's better than watching over your back all the time. The social has thrived by forming groups and casting out those who are different, but sometimes the different one is the predator.; 30 July, 2010 08:20
Anonymous said...: "The cultures where revenge is common are usually much poorer than more forgiving communities."
I can't find much about it in the wikipedia; can someone provide a link with examples?; 30 July, 2010 08:27
Gevlon said...: @Azzur: it's not just predator coming out winner in all 1v1. It's that predator gains more in an average match than any other strategy if forgiveness is high.

A predator wins the first game and D-D 9 times against unforgiving, so he gets 1+4/N pt on average while the unforgiving got 1-1/N. However the unforgiving gets 3pt/game when playing with another unforgiving, so on average he won. That's what you talking about.

However against a 30% fail, 50% forgive player, the predator gets 2.4 pt on average, while the mentioned player gets 2.31 pt on average when playing with another 30/50. And this is just for the first predator, the second gets 1 pt when playing the first and the 30/50 gets 0.65.

Please note that the M&S "strategy" is not optimal itself. That's why it's not a factor in game theory. However M&S is not a choosen strategy but the attribute of the player.

@Yaggle: traffic is a non-repeating game, you'll never meet the same guy again, so it's pointless to punish him. However you should try to get advantage over anyone you can (open with defect); 30 July, 2010 08:32
chewy said...: I disagree with you Gevlon in regard of Yaggle's traffic question. Both you and the other driver could be travelling on the same highway for some considerable time, therefore, it could be a recurring event.

I think the difference with traffic is that the participants control the degree of punishment. The other driver cuts you off, you cut him off closer, he does the same to you and so it progresses.

It would be interesting to see what would happen in your game example if the players controlled the level of punishment, but the punishment reflected on themselves, a crash in the traffic example.; 30 July, 2010 09:03
Andru said...: Traffic is not a prisoner's dillema, it's often a game of chicken.

In a game of chicken, the optimal strategy is to take the minimal loss whenever possible, and not expect the other to do it. (It's more rational to assume that everyone else playing chicken is irrational.)

And on executing prisoners. There have been prisoners wrongfully accused in the past whose innocence was proven by ADN testing 30 years later.

You can release a wrongfully imprisoned man. You can't unexecute a dead man.; 30 July, 2010 09:10
Anonymous said...: you can't account for that; a prisoner's innocence while "storing" him, waiting for a potential alibi to absolve the crime, while the prisoner awaits his death. meanwhile, money is still being spent.

assume a newly-absolved prisoner is released, turns around and sues (understandably). now there's lost cost in both the case of maintaining the detained prisoner, in addition to a settlement.

while it's unfortunate that "innocent until proven guilty" (in many countries, that is -- i know the burden of proof lies with the accused in some) doesn't protect everyone, i agree that execution is the most efficient means.; 30 July, 2010 09:19
Chyronn said...: You've considered tit-for-tat as the optimal strategy, however I think that the optimal strategy is a tit-for-tat that sometimes randomly defects (in an environment with forgiveness)

Basically the players will repeat the opponent's last move, then after some random time where both are cooperating the one playing my strategy will defect. They will then enter an alternating defect cycle until the adversary forgives.
They will now be cooperating again and the cycle repeats...

Regards and keep up this great blog.; 30 July, 2010 11:40
A.G. said...: The part of the prison population that is in death-row is probably (hopefully) disproportional compared to total imprisoned population.
Most imprisoned people are generally drug users. Numerous studies have shown that legalizing drugs would save society alot of money.
I wouldnt say that you should legalize everything, but if you say that death penalty will cost society the least, then you should also stand for legalization of drugs.; 30 July, 2010 11:52
Leeho said...: Well, actually, according to wikipedia:

Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.

So forgiving is better in iterative game, even if you are not playing against m&s.; 30 July, 2010 12:22
Gevlon said...: @Olga: I'm not questioning that. Actually "unforgiving" tit-for-tat is forgiving, if the other player cooperates, next round he'll cooperate too.

I'm also not questioning the value of a little forgiveness.

I'm saying that in the presence you have to forgive so many times, that you'll be prone to predators.

The bottom line is that you can't really separate an M&S from a predator, you can never know if he defected deliberately or not. You either let both in or neither.; 30 July, 2010 12:37
The Gnome of Zurich said...: The word "optimal" is getting thrown around a lot in this thread. It has a very specific meaning in game theory: a strategy that does as well as it possibly can given a perfect opponent.

Colloquially, it's used to described the provably best possible strategy against a particular opponent strategy, or mix of expected opponent/league strategies.

Neither of these is what's happening for most of the uses here. If there is an optimal strategy for iterated prisoner's dilemma, it is not known. The research so far shows that tit-for-tat turns out to be an excellent strategy, doing much better than many far more complicated strategies, and for general iterated pd, no better strategy is *known*, but nobody knows what the optimal strategy is.; 30 July, 2010 13:20
Andru said...: @ Anonymous arguing that executing prisoners is cheaper whether or not they are innocent.

Orly. Do remember than when executing an man you're effectively costing the society in opportunity costs. While it's hard to put a number on it, courts have assesed a person's life worth in average expected yearly pay multiplied with life expectancy.

Damned if you do, damned if you don't. Either cost the state in immediate inprisonment costs, or cost the society in opportunity costs. Of course the 'rawr, execute is cheaper' crowd have no sense that opportunity costs exist.

However, this also allows for a brilliantly cheap solution for everyone involved. Slavery for death row. Either choose to submit to execution, or go into slavery for the state as a life sentence.

If you're later found innocent, you're immediately released and paid your work's worth plus the difference in opportunity costs should you have been let free to work. If you're NOT found innocent, the society has lost the currency value of your crime minus the signed difference between your imprisoned work and your theoretical 'free' work.

Easy. Wrongfully accused get to be alive, society benefits from their work, and rightfully accused are not costing the state money anymore.

Nobel for Economics plz.; 30 July, 2010 14:44
Wilson said...: This would all be valuable analysis if we were playing Prisoner's Dilemma, but we aren't, are we? Your rules award a successful defection 5 points, more than can be obtained from cooperative play. What is the equivalent to this reward in WoW raiding? If the boss dies, I get emblems, reputation, the opportunity to progress, emotional satisfaction, and possibly loot or gold or dkp (depending on the RNG and raid set-up). If the boss doesn't die, I lose time and gold. The only possible reward for "defecting" is the one-time savings from not preparing for the fight (research, gemming, etc), but if I valued that more than the rewards of success why would I even bother playing the game? I agree that there are people who sometimes cause a wipe due to incompetence, but you are claiming that we need to guard against active predation. It is clear to me that you are using the results of one set of rules to argue for certain behavior in a game with a completely different set of rules. This is nonsense.

There are aspects of WoW which more closely resemble Prisoner's Dilemma. The situation we had back when you could roll "need" or "greed" on frozen orbs, for example. But the overall raiding process? No.; 30 July, 2010 15:16
Soge said...: You are missing a point in your analysis: Everyone commits mistakes. However, a M&S either has a higher mistake rate, or is unable to recognize his own mistakes. Thus, "mistake compensating TfT" is "optimal" if you and you similarly imperfect opponent adopt it.

Also, you missed a very interesting conclusion: Amongst sucky M&S, having "freindly helpful" behavior (always cooperate) is optimal! Since you are too stupid to admit your own mistakes, you are always compensating for them, and thus cooperate no matter what. While you will be beaten by predators, since most people aren't by definition (you suck as much as your peers), social behavior amongst peers becomes an excellent strategy. Also, since you always lose to predators, you will shun them, despite not having many means to do so.

OBS: optimal as in "best known option", not "absolute best option"; 30 July, 2010 16:09
Michael Young said...: Gevlon, I'd be crazy curious to see your response to the Southampton strategy of collusion, which has beaten tit for tat in iterative prisoner dilemma. Colluding agents had a sequence of defect/cooperates that allowed them to recognize each other, and then one would defer to the other and let them get max points.

For all you rail against the foolish ideas of the M&S, the guilds you've set up, like The PUG, tend to be more free and equal, more a meritocracy, than most progression guilds.

The progression guilds I've been in have had a huge emphasis on hierarchy, and a player is judged not just on his performance, but on how well he fits within the hierarchy. Whether he makes waves among the raiders/sitters/officers, how well he can get along with people, whether he defers to his class lead, etc.

Using the metaphor of iterative prisoner dilemma for in-game social interaction, I'd say that progession guilds, full of people willing to play the hierarchy games and the guild politics, are colluders, the optimal strategy; people like you, who refuse to play the social game, are tit-for-tat, the next most optimal; and everyone else prefer other things more than optimal outcome. :P; 30 July, 2010 16:30
Strum said...: Tit-for-tat has done very well in various tests/competitions, but I don't think it can be considered _the_ single optimal strategy. The non-repeated prisoners' dilemma has a dominant strategy (always defect), but in an infinitely repeated game, there are infinitely many strategies, and I doubt that tit-for-tat has been proven to be superior to all possible strategies.; 30 July, 2010 16:37
Gevlon said...: @Michael Young: I'm sorry, but the Southampton programs are NOT proving the success of a social guild. The average Southampton program did pretty bad. They feeded a few programs to the top, at the cost of sacrificing many.

The WoW equivalent would be a topguild that allows lot of non-raiders in, who farm for the guildbank, so the raiders can only raid.

In the normal WoW guilds, where every member must (or wants to) kill the boss, collusion is not a good strategy, as the underdogs will have hard time reaching the goal (The Southampton underdogs had no other goal than hurt the other programs, ignoring their own points); 30 July, 2010 17:35
Shannon Fowler said...: @Yaggle, to answer your hypothetical, studies have shown that the relief you get from cutting off someone is usually less than the anger you gained from being cut off to begin with (perhaps for the reason Gevlon cited). The studies concluded it wasn't worth it, though I suppose your mileage may vary (hahaha).; 30 July, 2010 17:42
Vesoom said...: There've been a few comments about the fit of this concept and raiding in wow. I'm not certain that I see the fit there.

Is this idea a general concept about M&S, a statement about The PuG, or regarding activities in the AH (undercutting, camping and such)?

This idea seems to fit with a post from a while back where another player wanted to cooperate to keep glyph prices high and instead ended up (probably) being driven out of the market.; 30 July, 2010 19:44
Anonymous said...: Irrelevant to WoW, Gevlon, since even M&S will kick a predator out of their guild. They aren't locked into playing Prisoner's Dilemma with anyone they don't want to.

As such, predators are an uncommon, temporary setback. They do not outnumber the M&S. It follows that the most rational strategy in a society of few predators and mostly M&S is Tit-For-Tat with forgiveness.; 30 July, 2010 20:29
Kristophr said...: Gevlon:

Current best strat for iterated/montecarlo prisoner's dilemma against a random mix of strats is "forgive exactly once, then tit-for-tat".

Think about it. It allows people with stupid/random strats to cool off, but it still weeds out predators.

That start also works fine with straight tit-for-tat prisoners ... they don't get in conflicts with each other.; 30 July, 2010 22:38
Soge said...: @20:29 Anon

The predator and the 100% fail M&S are exactly equal.

In a raid environment, not collaborating while the other do means beating the bosses with minimal investment (in-game or out-game), a high points situation. The difference, however, is that in the average boss with average equips (iLevel245 or so), a guild can carry some morons along. For the sake of argument, consider that in 10 man if 2 Fail, the other 8 can beat the boss. Thus, the ones failing are receiving max points, while the others only average points. If everyone cooperated (again with average equips) they probably would be able to do harder bosses (better return) with the same amount of work. And, if more than 20% slacks, everyone loses and lose points.

Wow is different in that if everyone does not cooperate, then you actually loses points. However, if you do cooperate and drag some M&S along, you won't lose as much as in the given ITP. Also, the benefit of full cooperation is equivalent or slightly lower than that of being the only one defecting. In that situation, the rational answer becomes to cooperate. OF course, cooperation is conditioned to the failure rate, and thus you do return to the solution of compensating Tit-for-tat.; 31 July, 2010 07:18
Anonymous said...: I see an interesting connection to one of your recent entries about people playing for fun, and the book that referenced Scrubs and "Playing Fair". People like that always irritated me, because I place a very high value on playing fair... but in the true meaning of the phrase.

If we were playing a game of soccer, would it be acceptable for me to catch the ball with my hands and throw it into the top corner of the goal? No, because that violates the rules of the game. "Playing fair" involves adhering to those rules, in this case.

I bring this up because I tend to favor the "tit-for-tat" approach to many things, in-game and IRL. I'll treat people honorably, fairly, and with compassion by default... until they prove they won't reciprocate of course. If we're in a random heroic, I won't roll Need on anything I don't actually Need.. but if you do, I'm Needing on everything the game lets me Need on the rest of the run.

Playing fair isn't stupid, but it also doesn't mean "everyone has to do what I say or play how I think"; it means honoring your agreements, and not violating the rules/laws. In-game, that means no hacking, no exploits or use of bugs, and generally no doing things forbidden by the ToS/ToU. Those are the rules we all agreed to play by when we signed up for the game. Getting a group of 10 level 80s to hound people leveling on a PvP server may not be an even fight, but that doesn't mean they aren't "Playing fair".

Even outside of Prisoner's Dilemma, the tit-for-tat strategy works for one simple reason: cooperation gets everyone the best results in most cases, but if someone won't cooperate and looks out only for himself, you'd be a fool not to deal with him on such terms. You can rant and rave about ethics all day, but in the end, you still have to deal with the person; best to do so in a fashion that won't let their behavior destroy or harm you.; 31 July, 2010 14:36
Lamnium said...: Two things that this model seems to leave out is diminishing returns on forgiveness and suck chance.

In a true situation, somebody using tit-for-tat with forgiveness is less likely to forgive a particular person if the person has repeating offenses. This helps to curb those predators that always defect.

The other missing element is that people tend to learn from their mistakes. Most M&S try to learn from their mistakes. This, in a way, reduces their suck chance each time it's called into account.

With these two things in play, over a long time, the amount of predators drastically reduces and an M&S tends to improve and may have a negligible suck chance, in effect becoming another regular.

On the other hand, this only comes into play if all games are remembered. In a situation such as the Random Dungeon Finder, where most times you will never see that person again, then an unforgiving tit-for-tat becomes better for all.; 31 July, 2010 21:08
Alrenous said...: The optimal strategy is actually tit for two tats, to account for noise, such as your own misunderstandings.

It suffers slightly more from predators and can thus benefit from a small population of police-tit-for-tats, but it also punishes failure indiscriminately, as you require.

In case you're unfamiliar with two tats, it doesn't start defecting until the second betrayal...and similarly doesn't start cooperating again until the second act of goodwill. Tit for tat can get stuck in an infinite oscillation in the mirror match, if there's even the tiniest bit of noise.

If there's too much noise for two tats, there' probably just too much noise period.; 08 September, 2010 03:55

Greedy Goblin

Friday, July 30, 2010

Iterative M&S dilemma

26 comments: