TheBoxer's Corner: Why I Believe Ladder Data Collection is Useless
Testing is probably the most important thing you can do for a tournament while also being the thing that is done the least. Most people are scared of collecting a ton of data, but I'm going to talk about how I test, and why I don't think you need a lot of it. First, I’m going to discuss why I believe ladder win rate data are mostly useless. Second, I’m going to explain the Bob Dylan Method, which is how I approach ladder testing. Third, I'm going to go into card evaluation and tuning decks based on testing.
Most people assume collecting data from ladder is one of the best things you can do, but I disagree. I believe there are many factors that make large-scale data collection from ladder useless. The skill level of opponents is inconsistent. You may be winning a bad matchup simply because you outplayed your opponent. There is unpredictable variation in the lists people play. You may be playing around a card from the better version of a deck and then your opponent slams a Harsh Rule out of nowhere in their Rakano aggro deck. Ladder players often overvalue surprise potential in their lists. Players will also sometimes play hate cards out of anger, which skews your data. When you’re testing Reanimator, you may get turn 1 Gaveled because your opponent is mad they lost to Reanimator five times earlier in the day.
You also have to consider whether your own play is representative. Some decks have steep learning curves. As you play a complicated deck, you will better learn how to pilot it. Your first twenty games will not give a realistic representation of how the deck performs but instead a rough measurement of how you perform. In general, your win rate with a deck will go up as you play it more often.
These problems raise two questions: is there a systematic way to throw out “bad” games? And if not, is ladder at all useful for testing? The answer I arrived at for the first question is no. Too many factors are in play and most of them are hidden. Even with full recordings from both sides, you still could not find a good rationale for throwing out data. The answer to the second question is yes. Ladder is useful for testing, but you have to know what you’re trying to get out of it.
Your goal during ladder testing should be to focus on interactions rather than wins and losses. Every game you should be asking yourself the question Bob Dylan asks in Like a Rolling Stone: “how does it feel? How does it feel?” Forget about winning and losing and focus on whether cards are doing their job. Do your 2 drops line up badly with the removal of a certain deck? Do your 4 cost units die to 1 cost removal? Do you fail to curve out as often as you would like? Does this deck have enough power? These are the kinds of questions you should be asking. And although they sometimes determine the outcome of a game, they can be asked independently of who won. You want to think about how cards match up against each other or interact with each other. Your combo may cost too much to play together, and sometimes you lose because of it, but you can still observe this when you are winning.
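If you like keeping notes, here is a rough sketch of what recording "how it felt" could look like instead of recording a win rate. It is only an illustration: the `GameNote` structure, the `tally_problems` helper, and the sample notes are all made up for this example. The point is that the tally ignores who won.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GameNote:
    """One game's notes (hypothetical structure, not from any tool)."""
    opponent_deck: str
    won: bool
    problems: list[str] = field(default_factory=list)


def tally_problems(notes):
    """Count how many games each problem showed up in, ignoring the result."""
    counts = Counter()
    for note in notes:
        # set() so a problem noted twice in one game only counts once
        counts.update(set(note.problems))
    return counts


# Made-up sample notes for illustration only.
notes = [
    GameNote("Rakano aggro", True, ["4 cost unit died to 1 cost removal"]),
    GameNote("Unitless Control", False,
             ["failed to curve out", "4 cost unit died to 1 cost removal"]),
    GameNote("Rakano aggro", True, ["failed to curve out"]),
    GameNote("Reanimator", True, []),
]

for problem, games in tally_problems(notes).most_common():
    print(games, problem)
```

Three of the four sample games are wins, yet both problems still show up twice, which is exactly the kind of thing a win rate would hide.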
It’s important to ask what cards should be performing well in a matchup before discounting a card as bad. Defiance is always going to be bad against Unitless Control. However, if Defiance is underperforming against aggro, you may want to consider another similar anti-aggro card like Torch instead. Brainstorm other similar cards that fill that role. If your removal is too expensive then consider cheaper options.
The next question you should be asking is whether that type of card is good in general. Do you want any attachment removal? It doesn’t matter whether it’s Furnace Mage or Siege Breaker; you may not be running into enough attachments in general. This is when you should consider cutting some or all copies of this type of card and adding another type that may be more useful. You may not need any 7 or 8 drops in your Argenport Midrange deck.
Cards can’t always be evaluated in isolation. Privilege of Rank and Bulletshaper are a package deal. You could in this instance swap Privilege for Xo, but there are many pairs or triples of cards that don’t do anything on their own. You have to evaluate them as a package. You need to think about what the package is attempting to do and, if you cut it, try to replace it with cards that accomplish that same purpose. You could swap Madness and Combust for Slay and Annihilate, but you wouldn’t just swap Combust for Slay. Another common package would be Initiate of the Sands and Devotee of the Sands vs Power Stone and Pillar of Progress. You wouldn’t want to do a mix of these cards. One type of ramp will be better than the other.
These two levels present a systematic way of tuning decks. First, compare a card to similar cards. Then, consider different types of cards for that slot. This can also be applied to packages of cards. Cards should be thought of in classes or families. The first question is whether another family member would be a better option. The second is whether another family would be a better use of that space.
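The two questions above can be sketched as a small lookup. This is a minimal illustration, not a real tool: the `tuning_options` function and the `families` table are hypothetical, and the groupings only reuse the examples already mentioned in this article.

```python
def tuning_options(card, families):
    """Return (other members of the card's family, other families to consider)."""
    # Find which family the card belongs to.
    home = next((name for name, cards in families.items() if card in cards), None)
    if home is None:
        raise ValueError(f"{card} is not assigned to any family")
    # Level 1: would another family member do the job better?
    siblings = [c for c in families[home] if c != card]
    # Level 2: would a different family be a better use of the slot?
    other_families = [name for name in families if name != home]
    return siblings, other_families


# Hypothetical grouping, based only on the examples in this article.
families = {
    "anti-aggro": ["Defiance", "Torch"],
    "attachment removal": ["Furnace Mage", "Siege Breaker"],
}

siblings, other_families = tuning_options("Defiance", families)
print("Try first:", siblings)
print("Then consider:", other_families)
```

The same lookup works for packages if you treat a whole package, such as Privilege of Rank plus Bulletshaper, as a single entry in a family.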
Deckbuilding is the art of comparison and replacement. Every removal needs an addition and vice versa. Anything you gain comes from giving something else up. Make changes with purpose. Every change you make should solve a problem. Justify a change by saying why the card you are adding is better than what you are removing. Don’t say the addition is “good” or “busted” and think that justifies anything.
However, sometimes you can’t change around a deck enough for it to feel right. You determine this by realizing that the core is not working. Is Tasbu not pulling its weight in Shadow midrange? Is Icaria not doing work in Valkyries? You may still like other parts of the deck, but if the core isn’t working, you should go back to the drawing board. Maybe you like Defiance but don’t like Valkyries. Then find some midrange or control deck that wants Defiance. Don’t feel like you’re a failure for the core not working. You found out valuable information and crossed an idea off the list. It’s better than the alternative, which is to get stuck wasting time rather than moving on to other ideas. Getting stuck on a certain core is one of the biggest pitfalls in testing. You need to let go and move on. Not all deck ideas work. The majority of them don’t. This is the harsh truth of deckbuilding.