A while back, I surveyed 1000 players’ results from Zendikar limited games and tracked how frequently individual commons contributed to wins and losses. (You can find the complete archive here.) Well, I did the same thing for Magic 2010, and I am here today to generalize the results of the past set into lessons for Magic 2011. Is card advantage key? Does removal rule the world? Did the most broken Magic 2010 card make it into Magic 2011? The answers are inside…
If you are unfamiliar with my methodology, I have put it into the appendix at the bottom of this article. You should read it before going through the rest of the article. If you have read it (or just finished), let’s go straight to what is important.
Removal is overrated…
If you think removal is the key to victory in every limited format, think again. Excommunicate went 35-35, or 50%. Pacifism fared only slightly better at 79-78, or 50.3%. Ice Cage was just as average, going 15-15, or 50%. Doom Blade could only muster a 74-69 record, or 51.7%.
That said, Assassinate was the worst offender by far. Its 43-57 mark left it at only 43.0%, which is statistically significant to 95% confidence.
The bottom line: Don’t think that a deck full of removal is going to get you very far. Some removal spells—namely, Assassinate—might even hurt you.
…but burn spells aren’t.
See Consume Spirit (24-14, 63.2%), Earthquake (16-6, 72.7%*), Fireball (45-29, 60.8%), and Lightning Bolt (78-43, 64.5%). The key difference here seems to be dealing damage to players. If you can hit a player, you can topdeck wins from certain defeat on the next turn. You can do that with these spells; you can’t do that with Excommunicate, Pacifism, Ice Cage, Doom Blade, or Assassinate. Possible card advantage doesn’t seem like enough, with the creature-only Pyroclasm only hitting 27-23, or 54.0%.
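A quick way to see the gap is to pool the records quoted above by category. This pooling is only a back-of-the-envelope sketch and is not part of the methodology in the appendix; the grouping into `creature_removal` and `burn` and the helper name `pooled_win_pct` are assumptions made for illustration.

```python
# Win-loss records quoted in the text above.
creature_removal = {
    "Excommunicate": (35, 35),
    "Pacifism": (79, 78),
    "Ice Cage": (15, 15),
    "Doom Blade": (74, 69),
    "Assassinate": (43, 57),
}
burn = {
    "Consume Spirit": (24, 14),
    "Earthquake": (16, 6),
    "Fireball": (45, 29),
    "Lightning Bolt": (78, 43),
}


def pooled_win_pct(records):
    """Win percentage across all listed cards, weighted by appearances."""
    wins = sum(w for w, _ in records.values())
    losses = sum(l for _, l in records.values())
    return 100 * wins / (wins + losses)


print(f"creature-only removal: {pooled_win_pct(creature_removal):.1f}%")
print(f"burn: {pooled_win_pct(burn):.1f}%")
```

Pooling weights each card by how often it appeared, so the heavily played cards (Pacifism, Doom Blade, Lightning Bolt) dominate their groups.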
Corrupt, Magic 2011’s answer to Consume Spirit, should fare better than black’s regular allotment of removal spells, but I doubt it will be able to keep pace with Consume Spirit. Unless you are running an overwhelming number of Swamps as your mana base, Corrupt is going to deal less damage. And if it is dealing less damage, then you will not be able to take out creatures as large or finish off your opponent from as high a life total. Nonetheless, I still expect it to perform better than an average spell.
Overrun is ridiculously good—thankfully, it is gone.
Players cast Overrun 41 times in this study and won 35 of those games (most of them right then and there), good for an 85.4% finish. That 85.4% was the highest observed winning percentage among all cards with more than four appearances. The card is undeniably broken. It’s hard to even make a case why you should draft any bomb rare over it. Even something like Baneslayer Angel loses to Doom Blade. For Overrun to lose to a removal spell, your opponent pretty much needs to be able to destroy all of your creatures in response. Anything short of that, and they will probably be dead.
Actually, the only compelling reason to draft Baneslayer Angel over Overrun is purely economic. If you see an Overrun, take it. You are playing green.
On another note, why was this card still an uncommon in Magic 2010? Wizards printed Serra Angel at uncommon for years before moving it to rare, supposedly because it had a rare feel to it. How does Overrun not feel like a rare? The name is epic and the ability is completely game-shattering. A card that wins 85% of its games and only costs five mana should not be seen as often as it is at uncommon. Perhaps Wizards finally caught on to this, decided that they could not make it a rare for whatever reason, and then did not print it at all. Regardless of the reason, however, I am glad to not be seeing Overrun in any Magic 2011 games.
Card advantage is important…
Conventional wisdom says that card advantage is extremely important in Core Set draft formats, as it is (supposedly) harder to outplay your opponent to gain card advantage in the (supposedly) more straightforward Core Set limited formats. While I have no basis of comparison to verify that claim, drawing cards certainly looks good: Divination went 39-14, good for 73.6%. Perhaps more impressive, it was one of only four cards to be statistically significant to 99% confidence. (The other three were Overrun, Warpath Ghoul, and Safe Passage.)
How that applies to Foresee and Preordain remains to be seen. It seems that Divination's power comes mostly from digging deeper into your deck and not so much from recovering from mana screw. Since Foresee can go through up to six cards (15% of your deck!), I'd imagine it will do well. Meanwhile, I can't imagine Preordain hurting a player, but I also don't see it burning up the charts like Divination. So we'll see on that one.
…but it isn’t worth paying your life for.
If drawing two cards for three mana is really good, then surely Sign in Blood must be as well. After all, paying two life is not that big of a deal, especially if it trims one mana off of the casting cost. Not only will that save you some tempo, but the ability to play Sign in Blood while stuck on two lands is a luxury that Divination will not afford you. Oh, and you also get to finish your opponent off when they are sitting at one or two life. That is what separated Lightning Bolt from Doom Blade, right?
That all sounds good in theory, but it ends up not being true in practice. Sign in Blood finished at 36-36, or 50%. Apparently drawing cards is important but not worth paying your life for. There seem to be a couple of things snagging Sign. First, the difference between two and three mana does not matter all that much. You rarely want to play Sign in Blood on the second turn or Divination on the third; you mostly cast these spells after you have emptied your hand of threats. This does not happen until the sixth or seventh turn or so, at which point there is very little difference between tapping two lands and three. Sign in Blood is better at fixing mana screw, yes, but you will not use it for that purpose all that often.
Finally, while you can Sign in Blood to finish off a vulnerable opponent, you will almost never encounter such a situation. If you didn’t cast Sign in Blood on turn two because you had nothing else to do, and you didn’t cast it on turn six or seven because you had not drawn it yet, then you will play it as soon as you top deck it. You just do not see players holding Sign in Blood for five or six turns waiting to target their opponent. In fact, you are almost always better served by throwing it into your graveyard to get two fresh cards off the top of your library. So if Sign in Blood is going to be a finisher, it will be that because you just drew it and your opponent was at two life or less. Otherwise, it is just drawing cards, and Divination is a lot better.
Safe Passage is a trap—for you.
The general consensus is that Fog effects are bad in limited. Indeed, there is usually exactly one good use for them—when your opponent commits to an all-in attack that you cannot survive, you Fog, and you can counterstrike for the win on your next turn. Anything short of that either leaves you facing impending death, down a card, or both.
Safe Passage seemed to be an exception. Here, you could also counter important spells like Lightning Bolt, Pyroclasm, or Earthquake. You could also trap your opponent into attacking with a bunch of creatures, blocking them fiendishly, playing Safe Passage, and watching most of the opposing army die while all of yours survive. Failing that, you could actually attack into your opponent, have them make some decent blocks in their favor, and then turn the tables with a Safe Passage. And, of course, it did everything that Fog could do.
Luis Scott-Vargas even said the following about Safe Passage:
Safe Passage is insane, and I didn’t give it the respect it deserved initially. It destroys Fireball, Earthquake, and Overrun, and is at the worst a solid 1 for 1, although usually better. I would play 2 or 3 in most decks without thinking about it, and would pick it over most commons that don’t fly or Pacify.
After all of that, how did Safe Passage do? It finished at 21-42, or 33.3%***.
33.3%***.
A list of white cards that performed better (though not necessarily at a statistically significant level):
Armored Ascension (57.4%)
Baneslayer Angel (68.2%)
Blinding Mage (54.3%)
Captain of the Watch (60.7%)
Divine Verdict (54.8%)
Elite Vanguard (41.3%)
Excommunicate (50.0%)
Griffin Sentinel (51.3%)
Guardian Seraph (46.2%)
Harm’s Way (47.5%)
Lightwielder Paladin (55.0%)
Pacifism (50.3%)
Palace Guard (45.6%)
Razorfoot Griffin (49.0%)
Rhox Pikemaster (49.2%)
Serra Angel (53.1%)
Siege Mastodon (57.1%)
Silvercoat Lion (49.4%)
Solemn Offering (68.0%)
Soul Warden (50.0%)
Stormfront Pegasus (47.6%)
Undead Slayer (53.6%)
Veteran Armorsmith (51.9%)
Veteran Swordsmith (44.6%)
White Knight (56.9%)
So what happened here? Well, my intuition is that keeping three mana untapped at all times does not really help you win games. If you look over the list of white spells above, you will see that a vast majority of them are creatures. So if you are playing white, you are playing creatures; you are not drawing a card and passing the turn. Yet that is exactly what Safe Passage requires you to do. Yes, it could be potentially game breaking against an Earthquake or Overrun (or for Magic 2011 purposes, Destructive Force). Yes, it will do some wacky things if you ever get into complex board positions. Yes, it will buy you an extra turn and an extra top deck when you are on the verge of death. But will it help you win games more than it will hurt you? It certainly does not look like it.
That's all for now. I'm thinking about doing a running track of Magic 2011 cards, so maybe you will hear from me soon.
William Spaniel
Appendix: Methodology
I assume that skill, luck, and the quality of a player's deck determine who wins any particular confrontation. While undoubtedly skill matters, this study is focused on the luck and card quality factors. Players actually have a great deal of control over both of these, as a poorly-constructed deck will win less often than a well-constructed one. From this, we can conclude that some cards contribute to wins more frequently than others. If an average card ever reached play in a game, we would expect its controller to have only won that game around 50% of the time. But if a truly exceptional card reached play, we would expect its controller to have won upwards of 70% of the time.
Watching replays of Magic 2010 premiere sealed tournaments, I recorded the results of more than a thousand players. Every time a card hit play, I would record it as either a win or a loss, depending on what ultimately happened in that game. If the card reached play multiple times (perhaps because of a recursion effect, like Disentomb), it only counted once. But if a player cast multiples of a single card, I counted that card multiple times.
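The counting rules above can be sketched in a few lines of Python. This is only an illustration: the `GameLog` structure, its field names, and the copy identifiers are hypothetical stand-ins for whatever notes were taken from the replays, not the actual records from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GameLog:
    """One player's view of one game (hypothetical structure)."""
    won: bool
    # Each entry is (card name, copy id). A recursion effect that replays the
    # same physical card repeats the same pair; a second copy gets a new id.
    plays: list


def tally(logs):
    """Return {card name: [wins, losses]} under the counting rules above."""
    records = defaultdict(lambda: [0, 0])
    for log in logs:
        # A set drops repeat plays of the same copy within one game.
        for name, _copy_id in set(log.plays):
            records[name][0 if log.won else 1] += 1
    return dict(records)


logs = [
    # Two different Lightning Bolts in a win: counted twice.
    GameLog(won=True, plays=[("Lightning Bolt", 1), ("Lightning Bolt", 2)]),
    # One Siege Mastodon replayed after a Disentomb in a loss: counted once.
    GameLog(won=False, plays=[("Siege Mastodon", 1), ("Siege Mastodon", 1)]),
]
print(tally(logs))
```

Keying on the (name, copy) pair is what separates "the same card came back" from "the player drew a second copy"; in practice that distinction has to be made by eye while watching the replay.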
Such a large number of observations was necessary to remove the play skill bias that would have shown up in a smaller-n study. It also shrinks the margins of error, allowing for better hypothesis testing, which I ran at 90%, 95%, and 99% confidence.
For those of you unfamiliar with hypothesis testing, I go over the intuition behind it in this video. But here is a brief explanation of what everything means in words:
90% Confidence: When I say we can be 90% confident that a card positively contributes to victories, it means that there was only a 10% chance that the card has no impact and the data came back so skewed based on pure luck. While the odds of being wrong here are only 1/10, we should be very skeptical of these results as statisticians. Generally, it is only a good idea to accept these results if we have a good theory behind them. For example, I would accept the result for Lightning Bolt, a quality removal spell, as being true, but I would cast doubt on whether Drudge Skeletons was actually affecting things.
95% Confidence: This is the gold standard of statistics. When a card meets 95% confidence, the likelihood that the card is merely average but we got data this extreme back is only 1/20. At this point, it is a good idea to start thinking of theories to justify the results if you do not have one already.
99% Confidence: While rare, a 99% confidence virtually guarantees that a given card has an impact on the match—there is only a 1/100 chance that this result is wrong. You should pay careful attention to these.
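To make these thresholds concrete, here is a minimal sketch of one simple way to test a record: an exact two-sided binomial test against a 50% baseline. It is an assumption that this resembles the test behind the asterisks in the article; the study may have used a different test or baseline, so the markers this sketch prints need not match the ones reported. The function names `two_sided_p` and `stars` are made up for illustration.

```python
from math import comb


def two_sided_p(wins, losses):
    """Exact two-sided binomial p-value against a 50% win rate."""
    n = wins + losses
    k = min(wins, losses)
    # Probability of a record at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # By symmetry at 50%, double the tail (capped at 1).
    return min(1.0, 2 * tail)


def stars(p):
    """Map a p-value to 90% / 95% / 99% confidence markers."""
    return "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.10 else ""


records = [("Divination", 39, 14), ("Earthquake", 16, 6), ("Sign in Blood", 36, 36)]
for card, wins, losses in records:
    p = two_sided_p(wins, losses)
    print(f"{card}: {wins}-{losses}, p = {p:.4f} {stars(p)}")
```

A dead-even record such as Sign in Blood's 36-36 gives a p-value of 1 under this test, meaning the data offer no evidence at all that the card differs from an average one.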
Just because a card does not show up as significant does not mean you should not care about the results. But you should not treat them as gospel, either. The best analogy I can draw is to that of a baseball team. It is possible that your star hitter goes through a minor slump at the beginning of the season and an average player goes on a torrid streak at the same time. That does not mean the average player is better than the star; it just means he was better during that period of observation. So don’t be surprised if the study ranks an average card lower than one you perceive as a top-pick.
Additionally, just because a card is sub-50% does not mean you should automatically stop playing it in all of your decks. Going back to the baseball analogy, a team cannot field nine players with batting averages all over .300. But it can maximize its performance by putting its best players in the lineup. So if you need to play a terrible card to have enough white cards to justify running Baneslayer Angel, go right ahead.
17 Comments
William,
First of all I like what you are trying to do here. I think it is valuable information. But I have one complaint with your approach.
You are only considering cards that are played. Let us take an extreme example. Let us say that I have a card that costs 30 red mana and it says: you win the game. By your methodology this would be a 100% card. It would only be played to win the game, but I think that we would both agree that such a card would be a bad card. Most people that put it in their deck would never play it and having a dead card is bad, in general.
Let us think about Overrun. Now obviously Overrun was a great card in M2010. But your method overrates its value. Why? I played with Overrun a bunch in M2010 and there were times that I would not play it even if I could. Why? It sometimes didn't win me the game. And especially in game 1, I don't want my opponent to know that I have it. So when Overrun is in my deck, I play it to win, but I won't play it (and so you don't see it) when I won't win or the game is lost. It will also get played less when you are losing.
Now let us consider Safe Passage. I don't believe that this is a bomb, but your method understates its value. Safe Passage is always played when you are losing. It can act as a Fog. So if you are already losing it might not save you. But when I am winning it often sits in my hand! Do I think it is a great card? No, but it is not the dog your method makes it out to be.
Another card that would be understated in your method would be Wrath of God. If you start off great in a limited game and draw your Wrath later, there is often little incentive to play it. Your method might still give a high rating to Wrath, but it will understate its usefulness.
Obviously the best method would be to know the actual decklists. I know that this is not possible, so let us move on.
A better approach is to consider all games a player plays in a tournament when they play a card once over the entire tournament. Is this perfect? No. But it is better than your method. Although it is harder.
In summary, I like what you are trying to do, but I feel your method overstates cards that help when winning and understates cards that help when losing. You can think of this as a systematic error.
Cheers,
Youper
I certainly agree that there are problems, but it really is the only feasible method to acquire data. I wish I could do better, but the reality is I can't.
That aside, I think you interpret this effect on Safe Passage incorrectly. If Safe Passage is sitting in your hand not being cast, then it is not contributing to your victory. Thus, you would be better served with another card. And when it is being cast, you are only winning a third of your games. It is hard to imagine a card with worse qualities than that.
"If Safe Passage is sitting in your hand not being cast, then it is not contributing to your victory."
I disagree with that statement. Generally I don't like discussing statistics because the discussions devolve into very arcane arguments, so I will keep this as brief as possible. You hold a Safe Passage waiting for your opponent to play, and they, knowing you have it because they don't suck, do not make the optimal play they want to (assuming no fog or protection effects). Obviously, if they don't read you as having the Safe Passage, it should come up that you can play it for the win, so they do read it and you hold it for the whole game. That is most definitely an influence and a contribution. Whether it wins or not may be situational, but the same can be said for pretty much any other card you are playing with. Deterrents are perhaps of questionable value in some circumstances, but in many they can keep you alive without ever costing you anything.
You could get this same effect by NOT running Safe Passage as well.
Ah, but then you lose the possibility of actually using it to great effect. It is the stick. Sometimes you have to show you have the stick to keep the muggers in line.
I can appreciate the complexity of the situation (it has the potential to be a semi-separating equilibrium, which is a bitch), but I still disagree. When you stick it, you are still only winning a third of the time. Given that you are only going to be playing two or three games during a match, I don't think it is worth sacrificing card quality to try to bait your opponent into playing around a card which only hurts you when it gets cast anyway. It seems you would be better served by deviously "showing" it in your deck even if it isn't there.
Wait...I missed the part where it actively hurts you to cast it. To me it seems like a devastating blow if it fires off, unless you waste it to stop damage that wouldn't be disadvantageous to take. If you are taking that 33% from actual losses from games where it was cast, I am going to start pointing fingers at otherwise bad decks and probably bad players. And I say that as a NONgood player.
i actually made an account just so i could comment on this article right here and say that i completely approve of your methodology for applying statistics to the game of Magic. luck, chance, and math are all very interesting to me, but they don't seem to play well with one another. also too, in this case, you're attempting to complete a function with one arm tied behind your back. you only know as much as displayed, and thus you can't produce a full analysis, but this gives a very good estimate.
to be honest, you're the only person i've seen that's even attempted to take this approach to this game, and for that, i applaud you.
I've seen a few other writers attempt it but Will surely does a fine job and is the most thorough and conscientious.
The only way Consume Spirit can do more damage than Corrupt is if you have two non-swamp black mana sources (unless some of your swamps are destroyed in response to Corrupt, but that can't happen in M11). Considering that the only non-swamp black mana sources in the format are rare and there are only two of them, I would guess that this will happen less than 1% of the time.
I also disagree with your classification of Excommunicate as removal. Its effect on the game is much more complex and most of the time, much weaker than Doom Blade or Pacifism.
I really like what you have done here, but I have to agree with Youper. We have a lot of information here but I don't think that it is enough to make a judgement about some of these cards. Maybe Sign in Blood performed poorly compared to Divination because black decks are generally weaker than blue ones. Based on what is here I just don't know (maybe you could link to the full stat sheet from M10).
A full stat sheet will take a bit of work (maybe early next week). However, for now, players playing Islands won 49.3% of their games while Swamps won 49.7%. So it isn't the color that's causing the difference.
I find the same thing when watching replays. Player plays Putrid Leech. Opponent drops an untapped mountain. You will NEVER see that player pump that leech for fear of lightning bolt. Do they have it? Who knows, but that mountain makes them NOT pump it.
How did the Crystal ball fare compared to other scry spells and card draw?
Scry in general feels strong in limited and C-Ball looks like 'Divining Top junior' due to its filtering ability.
I like reading these articles, purely for another point of view, but in terms of usefulness I find them underwhelming. There is a simple reason for this, and if you've ever read a set review you'll have heard this one before - 'you can't judge cards in a vacuum'.
Take Pacifism, which reached 50.3%. If you stop a fatty with it, then the opponent casts Solemn Offering (a quite unbelievable 68%, and a card that cannot even be cast without a target) on your Pacifism, your data would move towards the idea that Solemn Offering is a better card than Pacifism, which it has. If you cast it because you are desperate to stop something, but end up losing anyway, does that make it a lesser card? What if you lay down a turn 1 on the play Elite Vanguard (41.3%) and use it to clear out their first blocker which doesn't arrive until their turn 3, your turn 4? Speaking of the Vanguard, how many times was it an early play? How many times was it played in a topdeck situation? What other cards are there in the deck that support it?
'However, for now, players playing Islands won 49.3% of their games while Swamps won 49.7%. So it isn't the color that's causing the difference.'
How many games were won when a player played both swamps and islands? Were the win percentages increased for these lands when paired with mountains? Since you NEVER cannot play a land that you have in hand, what happens to the data if a player gets colour screwed? What if they keep a risky 1 or 2 land hand, and fail to draw more?
On a side note, what happens if both players play Pacifism? Does that nudge the average closer to 50%?
If you cast it because you are desperate to stop something, but end up losing anyway, does that make it a lesser card?
>>Yes. But lesser than what is a more important question.
What if you lay down a turn 1 on the play Elite Vanguard (41.3%) and use it to clear out their first blocker which doesn't arrive until their turn 3, your turn 4? Speaking of the Vanguard, how many times was it an early play? How many times was it played in a topdeck situation? What other cards are there in the deck that support it?
>>I can't answer those questions directly, but this methodology allows you to see how useful a 2/1 for one mana is. Sure, it is great on turn one. But how often does that happen in practice? I don't know. Let the data figure that out for you.
How many games were won when a player played both swamps and islands? Were the win percentages increased for these lands when paired with mountains? Since you NEVER cannot play a land that you have in hand, what happens to the data if a player gets colour screwed? What if they keep a risky 1 or 2 land hand, and fail to draw more?
>>A lot of questions that I cannot answer. But, then again, everyone else is clueless about this too. It would be nice to collect that type of data, but that is expensive to do--more expensive than anyone is willing to pay for at the moment.
I like all the work you have done, but I disagree with some of the conclusions being drawn from the data. Removal tests a player's skill. Weaker players will misuse their removal, wasting it and giving the match away. Your data does, however, show something good Magic players have known for some time: removal that can go to the dome is preferable to removal that only kills critters. Removal that allows you to choose new targets every turn is also more versatile and more forgiving than one-shot effects (Pacifism, as opposed to Blinding Mage). The Blinding Mage is obviously the better spell to M10 vets, but it is also easier to use than Pacifism. Harm's Way was infamously difficult to use (due in large part to the limitations of the software), and I would expect it to have a poor showing despite the fact that it was very swingy in a strong player's hands. I think the takeaway here is that a card that is difficult to use is going to rate lower statistically, even if it is a strong card. Nonetheless, I do find your data valuable in evaluating removal, at least in regards to comparing it to other removal.
There is also a significant difference between sealed deck and draft, and I could be wrong, but I believe LSV was referring to Safe Passage's strength in draft. M10 draft left the battlefield clogged, and matches would end up coming down to who hit their bomb first. Safe Passage helped to end stalemates by turning unprofitable attacks into profitable ones, as well as neutering opponents' bombiest cards. It wasn't the sort of card that would turn a bad deck into a good one, but it gave good decks an edge against other good decks in a very common situation that would arise during play. It could have ended up just sitting in your hand when you are already losing if you failed to establish and maintain some board presence, but in a draft set that played like M10 did, it was worth that risk.
I wouldn't share my criticism if your work wasn't worth discussing, and it's my personal opinion that your articles are a very strong addition to the site. I look forward to seeing the data on M11. Thank you for your hard work.
I'm late to chime in, mainly because I feel I said mostly what I had to say about the usefulness of this data back when you did it for Zendikar, but I have to speak up. You are doing more harm than good by banging a "removal is overrated" drum as a conclusion from this data. The latest MTGO newsletter provided the top 10 first picks from 8-4 M11 draft winners:
1. Doom Blade
2. Mind Control
3. Fireball
4. Lightning Bolt
5. Pacifism
6. Serra Angel
7. Air Servant
8. Blinding Mage
9. Obstinate Baloth
10. Conundrum Sphinx
While the bolt and ball fall into your "removal that can go to the face is not overrated" theory, recognize that more people have won 8-4s by first picking Doom Blade than any other card, and you are here telling people that removal is overrated, and that a deck full of it won't get you very far.
I think the reason that removal doesn't score huge % win numbers in your system is that everyone runs all the on-color removal they have, and casts all of it that they draw, win or lose. The player who resolves four removal spells is probably going to beat the player who resolved three, but then those three are going to show up in your loss column, making the 4-3 win-loss rate for removal look marginal. You seem to be concluding, though, that removal is overrated because hey, you don't win some large percentage of games in which you resolve a removal spell.
As I said in the last series, the data is interesting, but interpreting it in an actionable way when it comes to drafting or building a sealed pool is hazy at best, and calling removal "overrated" in Limited is dangerously incorrect advice to be doling out on a Magic strategy website where readers are looking to improve their Limited games. It's the wrong conclusion to be drawing from your data, and Doom Blade sitting in the number one slot for M11 Draft victory reinforces that. If one of your readers passes up a Doom Blade (51.7%) for a Siege Mastodon (57.1%) because of your conclusions here, you've done your community a great disservice.
Wizards has done this top ten for Zendikar and Rise drafting as well, and the top ten is *consistently* packed with the best removal and the most powerful uncommon creatures in the format. Removal is NOT overrated, and it IS a key to victory in every limited format.