The Difficulty of Balance
Try not to fall down
Balance is hard, and important.
When design spaces and problems grow, it can sure be daunting to maintain balance amidst chaos.
Scientific Modelling, Problem Solving, Optimization
October 7th, 2013
Quite literally, a game without balance breaks down and falls apart. The same could be said about your life, a company, your body, etc. Games are sequences of decisions, exercises in willpower, each choice carrying an irreversible fate. We can represent the whole spectrum of choices as a game tree, and a single playthrough as one chain of decisions within it.
Take Tic-Tac-Toe, for example. It is a solved game, which means its game tree is fully known. Every decision has been mapped, and there is a perfect strategy that guarantees the starting player never loses. Perfect play by the other player will likewise only ever result in a draw, never a victory.
If we call a game balanced when each player has the same number of winning endgames, then Tic-Tac-Toe is not balanced. The player who starts wins 51.4% of all possible games. The player who moves second wins 30.5%, and about 18% of all games end in a draw.
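These percentages can be checked by brute force. Below is a minimal Python sketch that plays out every legal game and tallies the results; the names `WIN_LINES`, `winner` and `count_games` are my own, and a game is assumed to stop as soon as someone completes a line or the board fills up.

```python
# Exhaustively play every legal game of Tic-Tac-Toe and tally how each one ends.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board=None, player="X", tally=None):
    """Walk the full game tree, counting finished games by outcome."""
    if board is None:
        board = [None] * 9
    if tally is None:
        tally = {"X": 0, "O": 0, "draw": 0}
    won_by = winner(board)
    if won_by is not None:
        tally[won_by] += 1
        return tally
    if all(cell is not None for cell in board):
        tally["draw"] += 1
        return tally
    for i in range(9):
        if board[i] is None:
            board[i] = player
            count_games(board, "O" if player == "X" else "X", tally)
            board[i] = None  # undo the move before trying the next one
    return tally


tally = count_games()
total = sum(tally.values())
for outcome, games in tally.items():
    print(f"{outcome}: {games} games ({games / total:.1%})")
```

Out of 255,168 games, X (the starter) takes 131,184, O takes 77,904 and 46,080 are drawn.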
If we instead call a game balanced when two players playing perfectly always draw, then Tic-Tac-Toe is balanced. This is the same game as before, measured by a different definition, and that is exactly what highlights the difference. Two perfect players have the same chance of winning Tic-Tac-Toe: 0%.
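The "two perfect players always draw" claim can be verified with a plain minimax search. This is a small sketch, not a tuned solver; the board encoding (a 9-character string with "." for an empty cell) and the function names are my own choices.

```python
from functools import lru_cache

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value from X's point of view: +1 X wins, 0 draw, -1 O wins."""
    won_by = winner(board)
    if won_by == "X":
        return 1
    if won_by == "O":
        return -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    outcomes = [
        value(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board)
        if cell == "."
    ]
    # X picks the best outcome for X, O picks the worst outcome for X.
    return max(outcomes) if player == "X" else min(outcomes)


print(value("." * 9, "X"))  # 0: perfect play from both sides is a draw
```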
It seems balance does not depend only on the structure of the game, but on the players themselves.
Simple Games, Complex Games
At a glance, the number of different move sequences in Tic-Tac-Toe is 9! (9 * 8 * 7 …), which would mean there are 362,880 possible games. However, this naïve count includes illegal sequences that keep playing after someone has already won. A game can also end well before the board fills up, in as few as 5 moves.
Counting every legal game gives us 255,168. Naïvely dividing by the board's 8-fold symmetry reduces that to 31,896 games. Exhaustively accounting for rotations and reflections reduces the number further, to 26,830. Additional assumptions shrink the game tree even more. This is called pruning.
A cell in Tic-Tac-Toe can have 3 states: X, O or empty. There are 9 cells, so the raw number of possible boards is 3^9 (19,683).
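To get a feel for how much pruning buys, here is a rough Python sketch that counts boards rather than games: first the raw 3^9 figure, then only the boards reachable through legal play, then those boards with rotations and reflections merged. The string encoding and the helper names (`canonical`, `reachable_positions`) are my own assumptions, and X is assumed to move first.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]
ROTATE = [6, 3, 0, 7, 4, 1, 8, 5, 2]  # new[i] = old[ROTATE[i]]: quarter turn
MIRROR = [2, 1, 0, 5, 4, 3, 8, 7, 6]  # left-right flip


def has_winner(board):
    return any(
        board[a] != "." and board[a] == board[b] == board[c]
        for a, b, c in WIN_LINES
    )


def canonical(board):
    """Pick one representative among the 8 rotations/reflections of a board."""
    variants = []
    for start in (board, "".join(board[i] for i in MIRROR)):
        current = start
        for _ in range(4):
            variants.append(current)
            current = "".join(current[i] for i in ROTATE)
    return min(variants)


def reachable_positions():
    """Collect every board that legal play can produce, empty board included."""
    empty = "." * 9
    seen = {empty}
    frontier = [empty]
    while frontier:
        board = frontier.pop()
        if has_winner(board) or "." not in board:
            continue  # finished games have no children
        player = "X" if board.count("X") == board.count("O") else "O"
        for i, cell in enumerate(board):
            if cell == ".":
                child = board[:i] + player + board[i + 1:]
                if child not in seen:
                    seen.add(child)
                    frontier.append(child)
    return seen


positions = reachable_positions()
print("raw boards:", 3 ** 9)
print("reachable by legal play:", len(positions))
print("distinct up to symmetry:", len({canonical(b) for b in positions}))
```

The three printed counts shrink from 19,683 to 5,478 to 765, which is the same pruning idea applied to positions instead of whole games.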
Chess has 64 cells and 6 different types of pieces on each of 2 sides, so a cell can hold any of 12 pieces or be empty. That gives us 13^64, a whopping 196 053 476 430 761 073 330 659 760 423 566 015 424 403 280 004 115 787 589 590 963 842 248 961 possible board states, and yes, I had to post every digit.
Each move is a transition between states. If we were to count all possible transitions for a Chess game of 50 moves as (13^64)^50, this becomes unwieldy very fast, as in having 3565 digits…
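Python's arbitrary-precision integers make these counts easy to reproduce. The sketch below assumes the same naïve model as the text: 13 states per cell (6 piece types times 2 sides, plus empty), and a 50-move game counted as the board count raised to the 50th power. The variable names are mine.

```python
# Raw (mostly illegal) chess board count: 13 possible states on each of 64 cells.
raw_chess_boards = 13 ** 64
print(raw_chess_boards)
print("digits in 13^64:", len(str(raw_chess_boards)))

# Naive count of 50-move sequences: one full board choice per move.
fifty_move_sequences = raw_chess_boards ** 50
print("digits in (13^64)^50:", len(str(fifty_move_sequences)))
```

The first figure comes out at 72 digits and the second at 3565.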
Let’s consider a modern example, the Civilization games. Academics like its Free and Open Source variant, FreeCiv, which is based on Civilization II.
FreeCiv has 50 unit types and 11 different terrains, each of which can appear plain or with one of 2 special resources, for a total of 33. These terrains can be upgraded with roads, mines, irrigation or fortresses. For the sake of argument we will assume that every upgrade is available on every terrain type (which is not true). A road can coexist with a mine, irrigation or a fortress, so 33 terrain types with or without roads pushes the count to 66. Mines, irrigation and fortresses are treated here as mutually exclusive: choosing one rules out the others. This multiplies our current number of terrain possibilities by 3, to 198.
In addition, the game features cities, a special tile with combinations of 40 buildings and 28 wonders to choose from. Naïvely counting, the possible building combinations range over 68!, and that thing already has 97 digits.
Cities have population, which is another variable we must bear in mind. In the game, population is an integer from 1 to about 42, depending on the city’s conditions. That multiplies the city combinations by a “meagre” 42.
Also, to complicate matters even more, units can be stacked. Stacking means two or more units can share the same tile. We know that there are 50 unit types, and the same unit type can repeat within a stack, so we are not lucky enough to use the factorial function or something similar. Honestly, I don’t know if there’s a limit on stack size. To make it look good and memorable, I’ll use 50. That gives us 50^50 possible stack types.
All of this makes the combinations for a SINGLE tile, about:
(50^50) x (42x68!) x 198 = 1.83177700305001096715e185
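The single-tile product is still small enough for exact integer arithmetic, so it can be checked directly. This is only a sketch of the same naïve counting used above; the variable names are mine, and the comparison against the raw 13^64 chess count from earlier is included for scale.

```python
from math import factorial

stack_types = 50 ** 50             # up to 50 stacked units, 50 types each
city_types = 42 * factorial(68)    # population levels times building orderings
terrain_types = 198                # terrain, resources and upgrades

tile_states = stack_types * city_types * terrain_types
chess_states = 13 ** 64

print(f"single tile: {float(tile_states):.4e}")
print("digits in the tile count:", len(str(tile_states)))

# How many times larger is one tile than the whole raw chess board?
ratio = tile_states // chess_states
print(f"tile / chess: {float(ratio):.2e}")
print("digits in that ratio:", len(str(ratio)))
```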
Bear in mind that a single tile of this game has roughly 9 x 10^113 times as many possible states as the whole chess board. This is not twice or thrice as many; the ratio itself is a number with 114 digits. A regular Civilization map has 4000 tiles, raising this even more.
(((50^50)*(42*(68!))*198) ^4000) = 3.09321106705870790023e741051
Continuing the same line of thought as with Chess, a move is a transition between game states. And since these paragraphs are about bloated numbers, we’ll calculate it for a regular game of 480 turns.
((((50^50)*(42*(68!))*198) ^4000) ^480) = 2.4922210999e355704715
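Numbers this size are not worth printing in full, but their magnitude is easy to recover with logarithms, since raising to a power just multiplies the base-10 logarithm. The sketch below follows the same assumptions as the text (4000 tiles, 480 turns); the helper `sci` is a hypothetical name of mine.

```python
from math import factorial, floor, log10

tile_states = 50 ** 50 * 42 * factorial(68) * 198


def sci(log_value):
    """Format 10**log_value in scientific notation without computing it."""
    exponent = floor(log_value)
    mantissa = 10 ** (log_value - exponent)
    return f"{mantissa:.2f}e{exponent}"


log_tile = log10(tile_states)   # one tile
log_map = 4000 * log_tile       # a 4000-tile map
log_game = 480 * log_map        # a 480-turn game

print("tile:", sci(log_tile))
print("map: ", sci(log_map))
print("game:", sci(log_game))
```

The exponents land on 185, 741051 and 355704715, matching the three figures quoted above.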
This “move-space” is bloated, chock-full of illegal moves and pesky nuances, but it paints the picture. It doesn’t count the array of different moves each unit has, or the technologies that can be researched and the paths to them, and it completely ignores diplomacy. Surely, pruning would do a great job at making a feasible game tree here.
Fighting the Infinite Monster
We might as well think about the game tree possibilities as endless. Building a feasible game tree is an exercise in exploring this infinity, making it one of the reasons why it is so hard to design a great game.
Nowadays, most games are not even turn-based but real-time, and as we have come to expect, this complicates matters even more. Analytical solutions are plagued further by everything from timing to the intricacies of synchronicity.
I will not get carried away enumerating every bump in the road. Instead, let’s picture a person balancing on a rope, far up in the air, carrying a pole. There are hundreds of muscles in motion beneath his skin, all finely handled by his brain’s unconscious software. The pole, however, is his conscious business.
Let’s tie a small weight to one end of the pole, then give him another pole to carry, and a weight for that one too. We can keep this going forever. That is what making a balanced design looks like, and it doesn’t matter if you’re designing a game, a motorcycle or a nuclear reactor.
Balance is hard, takes time, and lots of iterations.
This is about games, modern games, so I’ll pick some from the current well-known roster: the e-sports DOTA 2, League of Legends (LoL) and StarCraft 2. These need no introduction.
Blizzard really takes its time to balance its games, and its reputation for polishing its products is well deserved. The asymmetrical warfare in StarCraft requires lots of testing. Thankfully, they are smart enough to include the very best competitive StarCraft players in their balancing pipeline. Those players are the repositories of expert gameplay knowledge that Blizzard needs.
These fellows in Irvine (California) are also smart enough to know of the curse of dimensionality, so they limit their unit rosters as much as they can without compromising variety. We may say they are minimalistic in their approach, and this is not good, it’s actually great.
DOTA is the organic result of Blizzard’s World Editor. Its ancestor, Aeon of Strife, was born in StarCraft I. The concept was then ported to Warcraft III, where a mapmaker named Eul created DOTA itself. Eul stopped development and left the map for others to tinker with. Later, a mapmaker named Guinsoo introduced some more features, and after him another mapmaker named IceFrog took over, refining it further.
It is no longer a map but a genre, with its popularity growing ever since, perhaps finding its plateau these days (October 2013), with 550,000 regular DOTA 2 players on Steam and 5,000,000 concurrent League of Legends players globally. In case you didn’t know, League of Legends is the world’s most played game.
Why pick these particular three?
These are among the most brilliant examples of the success of balance in gaming, and living proof that balance done well makes millions of bucks.
And they have different approaches towards balance…
Blizzard is minimalistic and patient, embracing StarCraft’s nature as an e-sport and testing LOTS before each release.
DOTA 2 is still migrating from the original DOTA, whose balance kind of “grew in the wild”; it was very organic.
League of Legends is heavily based on DOTA too, but with several notable differences, the biggest being that it changes at a faster pace.
Both MOBA games are constantly being tested by their enormous user bases.
It’s an open question whether games suffer from adding too many units. It seems that may be the case, so let’s assume unit proliferation is bad. Then these three games suffer from it, and they have too many units right now. Chess hasn’t seen a new piece in centuries. Go figure why.
At Riot Games’ 2012 rate of unit growth for LoL (a new Champion every two and a half weeks), and given their unique unit mechanics, within some years League of Legends will become one of the most difficult games to balance in known history, if not the most difficult.
Being crude and personal, sometimes the game feels as if it were based on having an imbalanced, overpowered new “Champion of the Week” that screws up the meta-game for some weeks until it is hit with the nerf stick, by which time the addicted fan base is rushing to buy the next one.
I won’t judge Riot. By and large, its position is the riskiest of the three companies; these guys are balancing more on their rope. Blizzard and Valve can fall back on their comfortable structures.
It’s not hard to see that MOBA games are as tactically complex as Civilization games, or even more so. I invite proud Civilization players to step in and prove me wrong, saving me the time to measure this.
How do they fight the infinite monster?
The question is, how do these three companies fight this infinite game balance monster? Previously I mentioned how players are involved in the “fairness” equation, since two perfect Tic-Tac-Toe players can never win. In these games, by contrast, there are no draws.
However, there are lots of really big numbers, data and statistics on them: unit popularity, win/loss rates, unit synergy, etc. I’m only counting what the community has created; surely the companies themselves have even better data on how their games are being played.
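As a toy illustration of the kind of statistics meant here, the following Python sketch derives pick rates and win rates from a match log. Everything in it is hypothetical: the four matches are invented for the example, the unit names are just MOBA-style role labels, and `unit_stats` is my own helper, not anything the community sites or the companies are known to use.

```python
from collections import Counter

# Invented match log: (units on the winning team, units on the losing team).
MATCHES = [
    (("Carry", "Support"), ("Tank", "Nuker")),
    (("Carry", "Tank"), ("Pusher", "Nuker")),
    (("Pusher", "Support"), ("Carry", "Nuker")),
    (("Carry", "Pusher"), ("Tank", "Support")),
]


def unit_stats(matches):
    """Return pick rate and win rate per unit from a list of finished matches."""
    picks = Counter()
    wins = Counter()
    for winners, losers in matches:
        for unit in winners:
            picks[unit] += 1
            wins[unit] += 1
        for unit in losers:
            picks[unit] += 1
    return {
        unit: {
            "pick_rate": picks[unit] / len(matches),  # share of matches played in
            "win_rate": wins[unit] / picks[unit],     # share of those matches won
        }
        for unit in picks
    }


for unit, stats in sorted(unit_stats(MATCHES).items()):
    print(f"{unit:8s} picked {stats['pick_rate']:.0%}, won {stats['win_rate']:.0%}")
```

In this made-up data the Carry is picked in every match and wins 75% of them, which is the sort of signal that would put a unit on the watch list.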
What we can see is that unit balance is tied to the concept of a unit’s utility, its power on the battlefield. Even Chess has “tiers” for its units, ranking them with numerical values according to their power and importance.
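For Chess, the commonly taught point values are pawn 1, knight 3, bishop 3, rook 5 and queen 9, with the king left unscored because it can never be traded away. A minimal sketch of using them as a utility measure follows; the letter codes and the `material` helper are my own conventions.

```python
# Conventional chess point values; the king is deliberately left out.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material(pieces):
    """Sum the point values of a side's pieces, ignoring unscored ones."""
    return sum(PIECE_VALUES.get(piece, 0) for piece in pieces)


starting_side = "PPPPPPPP" + "NN" + "BB" + "RR" + "Q" + "K"
print(material(starting_side))  # 39

# A rook is "worth" a knight plus two pawns under this scale.
print(material("R") == material("NPP"))  # True
```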
Units are adjusted based on their perceived power. If a unit is too popular and its strategy too powerful, it diminishes the game’s novelty. That unit is a candidate to be hit with the nerf bat, which means reducing its power.
Nerfing a unit has the drawback not only of pissing off its users, but of usually being too extreme. A unit can easily be nerfed into oblivion: you nerf it once, but it remains popular thanks to the skill of its users, so you nerf it again and again for the same “novelty” reasons, until it simply becomes useless.
Units ruined by nerfing still keep loyal players who are not obsessed with competitive efficiency. To give these players’ self-esteem a bit of a lift, a unit often gets reworked.
Reworking a unit means redoing its mechanics almost from scratch. Many League of Legends players have seen this “Same look, different unit” thing. Reworking usually means giving some power back to the unit, a diplomatic way of saying (lying) “We did not screw up”. Instead of reversing the nerf, they effectively introduce a new unit.
When neither nerfing nor reworking succeeds, the extreme measure of removal gets played. StarCraft usually does this during its extensive testing: many unit concepts shown over the years don’t make it into the final game. Even when they do make it, some are featured only in the campaign and not in competitive play.
So far, I’ve described three ways in which companies balance their game units: nerfing, reworking and removal.
There are more ways to add balance without caring about “general” unit power, namely caring about “specific” power: Rock crushes Scissors, Scissors cut Paper, Paper disproves Spock, etc…
This is a classic in medieval warfare games: infantry beats cavalry, cavalry beats archers, and archers beat infantry. To spice it up, different variations of cavalry, infantry and archers are introduced, but more or less it follows that pattern, X
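The infantry, cavalry and archers cycle can be written down as a small counter table and checked for balance. In the sketch below, a unit scores +1 against anything it beats and -1 against anything that beats it; the scoring rule, the names `COUNTERS` and `net_advantage`, and the overpowered "dragon" are all assumptions of mine for illustration.

```python
# Which unit types each unit beats, following the medieval cycle above.
COUNTERS = {
    "infantry": {"cavalry"},
    "cavalry": {"archers"},
    "archers": {"infantry"},
}


def payoff(attacker, defender, counters):
    """+1 if attacker beats defender, -1 if it is beaten, 0 otherwise."""
    if defender in counters[attacker]:
        return 1
    if attacker in counters[defender]:
        return -1
    return 0


def net_advantage(counters):
    """Sum each unit's payoff against every unit type in the roster."""
    return {
        unit: sum(payoff(unit, other, counters) for other in counters)
        for unit in counters
    }


print(net_advantage(COUNTERS))  # every unit nets 0: no one dominates

# Adding a hypothetical unit that beats everything breaks the cycle.
with_dragon = {**COUNTERS, "dragon": {"infantry", "cavalry", "archers"}}
print(net_advantage(with_dragon))
```

With the pure cycle every unit nets zero, while the added dragon nets +3 and drags everyone else down to -1, which is what "general" power creeping into a "specific" power scheme looks like.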
The specifics get even more painful when we consider the vast number of different moves each unit has. Chess has only “move” and “capture”, with positions for each. StarCraft is similar, adding variation to its “capturing” and “moving” options. DOTA has its heroes’ abilities, and League of Legends plays with abilities some more, adding unique mechanics that DOTA does not have.
An overlooked but very present element of game tree balance is the battlefield itself. In strategy games, it seems no one wanted to innovate, as the age-old formula of symmetry gets played most of the time. Tic-tac-toe, checkers, chess and every modern game named here (except Civilization) are played on very symmetrical maps.
First-person shooters tend to have different philosophies regarding maps, with Valve’s Team Fortress 2 being an excellent example of balanced game design. However, this is outside the scope of this writing.
Recapping, we have nerfing, reworking, removal, rock-paper-scissors and board layout.
If I’m not missing anything, there is one last, emergent component of game tree balance: cooperative unit-to-unit interaction, or synergy. This is tanks, bunkers and anti-air turrets in StarCraft; disablers and nukers in DOTA 2 and LoL. It’s two or more different unit types complementing each other beautifully.
Given all the dimensionality these games already have, this… is a mess. Taking up the cause of simplifying this chaos, MOBA players have come up with roles, such as tanking, ganking, pushing, supporting, carrying, etc.
Tanks soak up damage, pushers are good at advancing “lanes”, supports heal and improve their teammates’ capabilities, and carries try to grow really fast in order to create an advantage and “carry” their team to victory. There’s a myriad of variations for each role, and unique combinations between units, giving spice to the MOBA dish.
Surely it helps to have roles defined, but still, many a purpose-built tank has turned out to be an excellent spell-caster or fighter, much to the surprise (or dismay) of the developer. The interactions between units are very complex, even more so when we add the sheer number of possible item combinations for each.
Games are becoming increasingly complex, and balance is incredibly important for them. For the great mess that balancing is, the ways of fighting it are few in breadth, yet potentially infinite in depth.
The mantra for balancing among leading companies is none other than: “Test, Feedback, Test… ad infinitum”