Game Design: What the hell is game balance?

Please pardon the provocative title. There is a bit of debate about what exactly game balance is. Balance is a subjective assessment, similar to fun, but there are concrete measurements that can be applied to determine how closely a design aligns with that subjective goal. In a word: metrics.

For myself, I want a player to feel like their skill, and perhaps some luck, contributed to their victory or defeat. I hate the feeling of realizing that something in a game is even moderately unbalanced. If the game is severely unbalanced, it reduces the game to an activity.

For example, if a game has players select from unique abilities at the start (like a faction), but you realize that your faction is unbalanced, the game is no longer about skill and luck, but rather the choice you make at the beginning of the game. In which case, why play the rest of the game?

Game balance can be a vague goal, because it isn’t concretely defined. It means different things to different people. For myself, I consider a game balanced if each player of a similar skill level has an equal chance of winning. When people call a game ‘unbalanced’ without defining what they mean by ‘balance’, it can come off as simple dislike of the game. For this article, when I say ‘balanced’ I will be referring to my own definition of balance.

There are a number of ways to determine if a game is balanced. Most often, balance is assessed through playtests. A game is played over and over and the designer or developer observes and makes a judgement about the winner. This process is popular and commonly used and is a necessary part of game development. However, game balance can and should be evaluated before a paper prototype is even made.

Metrics

In order to do this, a designer must determine which metrics will be used to evaluate the game. What is a metric? By metric, I simply mean a measuring point. That is the tricky part though. A game can have many, many points to measure. The more complex a game, the more points there will be to measure.

The use of metrics is very important to asymmetrical games, but can still be used in symmetrical games. For example, all players can take the same actions in a game of Settlers of Catan, but are those actions all of equal value? Should they be? Using metrics to measure each action’s impact on the game state doesn’t require that they all have the same value. But by observing what the values are, a designer can make informed choices. Maybe that is why drawing a card costs a wheat, a sheep, and an ore.

As an example, we can look at an Imperial Assault style game. Each player activates a figure or group of figures. Each figure has a number of Health Points and each figure can deal an amount of Damage. We have two metrics. So, if we were evaluating the game for balance, we would track HP and Damage and maybe add them up to a single value:

Figure 1: Figure balance example

Fig	HP	Damage	Total
Figure A	10	1-6	16
Figure B	10	1-6	16

A quick note about the ‘total’ column. This column can be anything. You can modify the values in other columns based on their importance (such as doubling an important value). The goal is just to have some number that can be compared to another and that gives you information about the relationship between the two figures. Is one stronger than the other? Is one too good? A designer could even use that ‘total’ value as a basis for the figure’s cost, in a game where one player buys their figures.
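
As a rough sketch of the ‘total’ column, the snippet below scores a figure as a weighted sum of its stats. The `Figure` class, the `total` function, and the weights are hypothetical names chosen for illustration. Damage is scored at the top of its range, which is an assumption that matches the 10 + 6 = 16 shown in Figure 1.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    name: str
    hp: int
    max_damage: int  # top of the damage range, e.g. 6 for "1-6" (assumed scoring)


def total(fig, hp_weight=1, damage_weight=1):
    """Weighted sum of a figure's stats; the weights are illustrative only."""
    return fig.hp * hp_weight + fig.max_damage * damage_weight


figure_a = Figure("Figure A", hp=10, max_damage=6)
figure_b = Figure("Figure B", hp=10, max_damage=6)

print(total(figure_a), total(figure_b))   # equal weights, as in Figure 1
print(total(figure_a, damage_weight=2))   # damage counted double
```

Changing a weight changes every figure's total in the same way, so the totals stay comparable with each other even though the raw number means nothing on its own.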

Special Abilities and Asymmetry

So, looking at the two figures, we can say they are balanced because they both total to 16. Of course they are balanced, because they are symmetrical. If all games were this simple, balancing them would be a breeze.

Yet you can find games on the market where just adding up a figure’s stats shows that Figure A is better than Figure B (Mansions of Madness, 1st ed.). Often games give figures special abilities, which may be harder to apply metrics to, most often because a special ability will impact only one aspect of the game, like movement.

Figure 2: Figure Move Stat Example

Fig	HP	Damage	Move Ability	Total
Figure A	10	1-6	2	18
Figure B	10	1-6	0	16

A movement metric may capture the value of one ability, but for an ability that boosts damage, the movement metric will be 0. Now Figure A looks better than B, but suppose B has a damage ability? Will the two figures be balanced?

What if one of the abilities has some sort of limiter, so that it only applies in certain situations? If that situation is rare, then to keep balance, the bonus should be much larger than the bonus from an ability that is always applied. As you can see, there are several more metrics we could measure, such as figure movement (speed) and the range at which they can inflict damage.

A quick note on special abilities and metrics. A way to determine the value of a special ability is to use metrics on the actions in a game. For example, if a game allows players to place a cube, you could give that action a value of 1. The action to remove a cube could also be valued at 1. So, if you have a special ability to replace a cube with your own, the value should obviously be 2. It is simply the sum of adding a cube and removing a cube.
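
The cube example can be written down as a small lookup. This is only a sketch: `ACTION_VALUES` and `ability_value` are hypothetical names, and the values of 1 per action are the ones assumed in the paragraph above.

```python
# Hypothetical per-action values: placing and removing a cube are each worth 1.
ACTION_VALUES = {"place_cube": 1, "remove_cube": 1}


def ability_value(actions):
    """Value an ability as the sum of the basic actions it bundles together."""
    return sum(ACTION_VALUES[action] for action in actions)


# "Replace a cube with your own" = remove their cube, then place yours.
replace_cube = ["remove_cube", "place_cube"]
print(ability_value(replace_cube))
```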

Multiple Figures

Let’s step back and look at a slightly more complex example that I think a lot of dungeon crawling games fall into. A hero player will get one activation, but the Overlord gets to activate a group of figures. So, all things being equal, we might expect each figure in the group to have a fraction of the single figure’s HP and damage:

Figure 3.1: Luke and Stormtrooper Example

Fig	HP	Damage	Figures	Total
Luke	12	1-6	1	(18 x 1) 18
Stormtroopers	4	1-2	3	(6 x 3) 18

So, we can call these Stormtroopers balanced, right? They both add up to 18, so we are good to go… or are we? Let’s do some hypotheticals. We’ll draw up a table and measure how much damage is being done over 3 rounds:

Figure 3.2: Max Damage Over Time

Round	Luke Damage	Trooper A Damage	Trooper B Damage	Trooper C Damage
1st	6 (8 HP Left)	2 (4 HP)	2 (4 HP)	0 (0 HP)
2nd	6 (6 HP Left)	2 (4 HP)	0 (0 HP)	0
3rd	6 (6 HP Left)	0 (0 HP)	0	0

This table assumes that Luke shoots first and takes out a Trooper each round before it can activate and attack. It also assumes max damage. Based on this table, we might assume that Luke is OP: that he will win every time and have half his life left. With all three Troopers standing, the maximum damage ratio is 1:1 each round (6 from Luke against 3 x 2 from the Troopers).

So, we might beef up the Trooper or even give the Overlord another set of figures. Great, job done! But, let’s look at another hypothetical quickly before we make any changes:

Figure 3.3: Min Damage Over Time

Round	Luke Damage	Trooper A Damage	Trooper B Damage	Trooper C Damage
1st	1 (9 HP Left)	1 (4 HP)	1 (4 HP)	1 (3 HP)
2nd	1 (6 HP Left)	1 (4 HP)	1 (4 HP)	1 (2 HP)
3rd	1 (3 HP Left)	1 (4 HP)	1 (4 HP)	1 (1 HP)

This table assumes that Luke goes first again, but only minimum damage is rolled. Luke is clearly doomed in this scenario. There is literally nothing he can do as the damage ratio each round is 1:3.
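
Both hypotheticals can be replayed with a short fixed-damage simulation. This is a sketch under stated assumptions, not a rules implementation: the `fight` function is a hypothetical helper, Luke always attacks first, he always targets the last Trooper still standing, and damage beyond a Trooper's remaining HP is wasted.

```python
def fight(luke_hp, trooper_hp, luke_damage, trooper_damage, rounds):
    """Play fixed-damage rounds; Luke attacks first, then surviving Troopers."""
    troopers = list(trooper_hp)
    history = []
    for _ in range(rounds):
        # Luke focuses the last living Trooper (an assumed targeting rule).
        alive = [i for i, hp in enumerate(troopers) if hp > 0]
        if alive and luke_hp > 0:
            target = alive[-1]
            troopers[target] = max(0, troopers[target] - luke_damage)
        # Every Trooper still standing hits back for the same fixed damage.
        survivors = sum(1 for hp in troopers if hp > 0)
        luke_hp = max(0, luke_hp - survivors * trooper_damage)
        history.append((luke_hp, tuple(troopers)))
    return history


max_case = fight(12, [4, 4, 4], luke_damage=6, trooper_damage=2, rounds=3)
min_case = fight(12, [4, 4, 4], luke_damage=1, trooper_damage=1, rounds=3)
print(max_case[-1])  # (Luke HP, Trooper HP) after round 3 at max damage
print(min_case[-1])  # the same after round 3 at min damage
```

Under these assumptions the round-3 states come out as (6, (0, 0, 0)) and (3, (4, 4, 1)), matching the last rows of Figures 3.2 and 3.3.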

What happened? Well, the big issue here (and in many dungeon crawlers) can be shown by a missing metric. The metric is ‘number of activations’. Simply giving one player more figures to activate gives them an advantage, in that the minimum damage is multiplied by the number of activations/figures they have. Looking outside the example, unless Heroes are getting reroll abilities, the Overlord is getting 3 chances to roll max damage (and keeping all damage), while the Hero gets one.

Figure 3.4: Activations Over Time

Rounds	Overlord Activations	Hero Activations
1st	3 (3)	1 (1)
2nd	3 (6)	1 (2)
3rd	3 (9)	1 (3)

As you can see, the Overlord has a pretty big advantage in number of activations. So, if the goal is to balance everything, then those activations should not give them an unfair advantage.
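
The running totals in Figure 3.4, and the gap between them, take only a few lines to compute. This is a minimal sketch that assumes no figure is removed during the three rounds; the variable names are illustrative.

```python
from itertools import accumulate

rounds = 3
overlord = list(accumulate([3] * rounds))  # 3 activations per round
hero = list(accumulate([1] * rounds))      # 1 activation per round

# The gap is the metric of interest: it grows by 2 every round.
gap = [o - h for o, h in zip(overlord, hero)]
print(overlord, hero, gap)
```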

An example is Mansions of Madness 1st Ed. The Overlord received 1 threat (for taking actions) per Investigator. Games with 2 Investigators could be hard for the Overlord to win, due to a shortage of threat, but were very easy with 4 Investigators (twice the threat per round).

Figure 3.5: Mansions of Madness, Threat Over Time

Rounds	2 Hero Threat	3 Hero Threat	4 Hero Threat
1st	2 (2)	3 (3)	4 (4)
2nd	2 (4)	3 (6)	4 (8)
3rd	2 (6)	3 (9)	4 (12)

With a variable cost for actions, some were unavailable until later rounds in a 2 Investigator game, while they could be triggered in round 1 of a 4 Investigator game. The sweet spot was probably 3 Investigators, but by later expansions, in order to fix the 4 Investigators scenario, winning was made even more difficult for the Overlord in 2 and 3 Investigators games.
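
To see how player count gates the expensive actions, the sketch below finds the earliest round in which an action can be afforded. It assumes 1 threat per Investigator per round, as described above, and that threat is banked without being spent. The function name and the cost of 4 are hypothetical and not taken from the game.

```python
import math


def first_affordable_round(cost, investigators):
    """Earliest round in which banked threat covers `cost`.

    Assumes 1 threat per Investigator per round and no spending in between.
    """
    return math.ceil(cost / investigators)


# A hypothetical action costing 4 threat, at each player count.
for players in (2, 3, 4):
    print(players, first_affordable_round(4, players))
```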

We can obscure this reality a little by looking at a range of damage for each figure. On average Luke will deal 3.5 damage… but since we don’t track halves, we are rounding up or down. Rounding down will doom Luke while rounding up will let him win handily. Two choices, or rather 50/50 odds. The game is now down to almost entirely luck of the die, hopefully offset by the skill of the players.
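
One way to check how much of this comes down to the dice is to replay the fight many times with random rolls. The sketch below is a simplified model and not the rules of any published game: `luke_wins` and `win_rate` are hypothetical helpers, Luke rolls one d6 at the first living Trooper, each surviving Trooper then rolls 1-2 damage with equal odds, and overkill damage is wasted.

```python
import random


def luke_wins(rng, luke_hp=12, trooper_hp=4, troopers=3):
    """Play one fight with random rolls; return True if Luke survives."""
    squad = [trooper_hp] * troopers
    while luke_hp > 0 and any(hp > 0 for hp in squad):
        # Luke attacks first: one d6 against the first living Trooper.
        target = next(i for i, hp in enumerate(squad) if hp > 0)
        squad[target] -= rng.randint(1, 6)
        # Each surviving Trooper then rolls 1-2 damage against Luke.
        for hp in squad:
            if hp > 0:
                luke_hp -= rng.randint(1, 2)
    return luke_hp > 0


def win_rate(games=10_000, seed=1):
    """Fraction of simulated fights that Luke wins (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(luke_wins(rng) for _ in range(games)) / games


print(f"Luke wins about {win_rate():.0%} of simulated fights")
```

Changing the HP values, the number of Troopers, or who attacks first and rerunning gives a quick read on how sensitive the matchup is to each metric.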

Averages and Statistics

Some designers might say this is perfect, let the average rolls handle everything. However, this ignores that an average of rolls is based on a huge number of rolls. A player will not roll the dice enough times in a single game to see real averages.

For the most part, this is probably fine. Most players will be happy with this setup or at least not be able to easily see the breakdown of the game at this level. Particularly after adding in movement, range, maybe special abilities.

However, I want to point out that relying on averages will not give the results you may expect. Let us say you have a ‘skill test’ system, where a player rolls a number of D6 and passes if any die shows a 5 or a 6. So you design the game around the idea that a player rolling 3d6 will get a success, using the rough math that each die has a 1 in 3 chance of rolling a 5 or a 6.

However, if you run a simulation of die rolls, taking a random roll millions of times, you get a very different story. You will see that your chance of getting at least one success on 3d6 is actually… about 70%, which is closer to 2 in 3. You want people to have at least a 96% chance of success? Then your players should be rolling 8d6 for a skill test. Beware of averages!
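
The simulated numbers can be cross-checked without rolling anything. The sketch below uses the exact complement rule instead of a simulation: the chance of at least one success on n dice is 1 - (2/3)^n, because each die misses 4 times out of 6. The function names are hypothetical.

```python
def chance_of_success(dice, faces_that_pass=2, sides=6):
    """Probability that at least one die shows a passing face (5 or 6 on a d6)."""
    miss = 1 - faces_that_pass / sides
    return 1 - miss ** dice


def dice_needed(target, faces_that_pass=2, sides=6):
    """Smallest number of dice whose success chance reaches `target`."""
    dice = 1
    while chance_of_success(dice, faces_that_pass, sides) < target:
        dice += 1
    return dice


print(f"3d6: {chance_of_success(3):.1%}")
print(f"dice for 96%: {dice_needed(0.96)}")
```

This gives 19/27, or roughly 70.4%, for 3d6, and 8 dice for the 96% target, in line with the figures quoted above.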

Circling back to the concept of balance and subjectivity, even the game genre can alter a player’s perception of balance. There are gamers who feel a co-operative game should have a win rate of 20-30%. But if you had a game where the Overlord was winning against multiple heroes 80% of the time, most players would say the Overlord side was overpowered. Assuming similar skill level, I would expect each side to win 50% of the time (two teams essentially).

Player Metric Usage

Looking at a game like Pandemic, you can see a metric that is exposed so that players can determine their own difficulty level. The number of epidemic cards will directly change the difficulty of the game, with fewer cards giving an easier level of play.

However, a metric that might not have been considered is the number of players vs cards seen by each player. With two players, each player will see a larger percentage of cards from the player deck. In a four player game, each player will only see 25% of the deck, or half as much as a two player game.

When the game employs a set collection mechanic, the number of cards a player sees will profoundly impact their ability to create sets of cards.

The number of players in a game introduces complexities for balance. Scaling a game can be very difficult and I hope to cover it in a future article.

Conclusion

Any aspect of the game can be measured and recorded in a spreadsheet, but the real difficulty is deciding which metrics are important to measure. There isn’t a concrete answer; however, I think there are two guiding principles.

If something doesn’t need to be measured, don’t measure it. In other words, if the values don’t change the balance, is the metric really needed? Either the attribute being measured doesn’t impact game play or the value is the same for all components. If every figure in the game can only make 1 attack, then it isn’t worth keeping track of ‘number of attacks’.

The second principle is the difference between two numbers. The example of activations shows how, after several turns, there was a large difference between the total number of activations taken by the Hero and the Overlord. At just the third round, the Overlord had taken 6 more activations than the Hero! Metrics that get bigger over time or have a large difference are worth examining.

Players may not be analyzing a game with spreadsheets, but they often gain an intuitive sense of balance issues just from observing game play. By making sure that there is balance in a game, designers are presenting players with valid choices. Few players would be happy that their choice of a particular hero in a dungeon crawl had already determined if they would win.
