Sky's stuff

PR does not measure playing strength

At the time of writing, the standard measure of playing strength among backgammon players is PR (Performance Rating). PR is the average equity lost per decision, excluding "obvious" ones, as calculated by XtremeGammon (XG). Equity here is a somewhat strange number, not particularly closely related to Match Winning Chances (MWC).

This metric is taken very seriously; for example, the winner of the Ultimate Backgammon Championship (UBC) is determined in large part by PR. Thousands of euros have changed hands because of (sometimes very small) differences in PR between two players.

In this article, I will argue that PR does not measure playing strength, and suggest another metric (ΔMWC/decision) which measures playing strength better. Despite my ruthless assault on PR, I should make clear that I do not think it should be replaced. Allow me to say a few things very much in its favour:

Players with low PRs tend to be stronger than players with high PRs.
A player with a PR of less than 4 - the BMAB's criterion for Grandmaster status - is certainly very strong.
Moves which lose a large amount of equity per XG (say, 200 millipoints) are almost always incorrect, and usually blunders.

In other words, PR is a useful metric, and even has some correlation with playing strength.

Before I justify the title of this article, I shall have to clarify what I mean when I say "playing strength".

A good player wins matches

Yes, that's pretty much it. The only goal in backgammon is to win the match, so a metric for playing strength should simply be higher (or lower, as with PR) the greater your tendency to win.

If there were no luck involved in backgammon, it would be pretty easy to measure playing strength: it would be sensible to use an Elo system, as chess does. Elo still works for backgammon, but because of luck, ratings converge more slowly and never settle as firmly, so it's not an ideal metric either.

Playing strength depends (solely) on the probability that a player wins a match. This criterion is incomplete: a player's winning odds depend on their opponent and the match length. Let us assume that the opponent plays perfectly (left undefined here, but in practice this means assuming the opponent is XG, as it is for PR). As for match length, we can't include it in our computation if we want a metric that is agnostic to match length, so let's just measure playing strength per move.

To summarise, then: I think the best metric for playing strength would be the average MWC lost per move, assuming a perfect opponent. Oh, and we won't count forced moves, since they don't represent a meaningful decision from a player. Hang on, isn't that just PR?

This is most definitely not PR

For the remainder of the article, I will explain two differences between this metric - let's call it WR, or "win rating" - and PR, along with examples of good plays which are punished by PR because of each difference.
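Before turning to those differences, here is a minimal sketch of WR as defined above: the average MWC lost per decision, skipping forced moves. The names `Decision`, `mwc_lost`, `forced` and `win_rating` are my own inventions for illustration, and the numbers are made up; in practice the MWC losses would have to come from a bot's analysis of each move.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    mwc_lost: float        # MWC given up by the play, as a fraction (0.01 = 1%)
    forced: bool = False   # True when there was only one legal play


def win_rating(decisions):
    """Average MWC lost per non-forced decision; lower is better."""
    counted = [d for d in decisions if not d.forced]
    if not counted:
        return 0.0
    return sum(d.mwc_lost for d in counted) / len(counted)


# Hypothetical game: two perfect plays (one of them forced) and two errors.
game = [
    Decision(0.000),
    Decision(0.004),
    Decision(0.000, forced=True),   # excluded from the average
    Decision(0.012),
]

print(win_rating(game))   # 0.016 of MWC lost over 3 counted decisions
```

Note that nothing is filtered out except forced moves: an easy decision played correctly still counts as a decision.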

PR does not define equity as MWC

This is a simple one, but also probably the most egregious. PR, by design, does not measure equity loss as MWC loss. In particular, it scales every game's equity so that a win is worth 1.000, a gammon is worth 2.000, and a backgammon is worth 3.000 - regardless of how high the cube goes. This is for an understandable reason: without this scaling factor, some games would be much less impactful to PR than others, which reduces those games' potential for skill expression.

There's only one problem with fixing this: it's not broken. That some games matter more than others is simply a fact of the game, which good players can exploit in order to win more matches. The example here is clear: a good player will save their clock time and energy (and study) for important decisions - that is, decisions where a large amount of MWC is on the line - and thus play with a weaker PR in unimportant situations. Since such a player gives up less MWC per decision than a player who does not adjust in this way, they will play with a lower WR, yet PR intentionally punishes errors in unimportant plays disproportionately.
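To put rough numbers on this, here is a toy comparison of two hypothetical players. It rests on a loud simplifying assumption: that the MWC lost on a play is roughly its normalised equity loss multiplied by a per-game conversion factor (`mwc_per_point`, a name I have made up), which is small in an unimportant game and large in an important one. The real relationship is more complicated, and any constant scaling XG applies to PR is omitted, so treat this as an illustration and not as XG's calculation.

```python
# Each decision is (normalised_equity_lost, mwc_per_point).
# mwc_per_point is a hypothetical factor: how much MWC one point of
# normalised equity is worth in the game where the decision occurred.

def pr_style(decisions):
    """Average normalised equity lost per decision (stakes ignored)."""
    return sum(loss for loss, _ in decisions) / len(decisions)


def wr_style(decisions):
    """Average MWC lost per decision, under the assumption above."""
    return sum(loss * stake for loss, stake in decisions) / len(decisions)


LOW_STAKES, HIGH_STAKES = 0.02, 0.20   # made-up conversion factors

# "even" plays every decision equally carefully.
even = [(0.03, LOW_STAKES), (0.03, HIGH_STAKES)]
# "focused" relaxes when little is at stake and concentrates when it matters.
focused = [(0.06, LOW_STAKES), (0.01, HIGH_STAKES)]

for name, player in (("even", even), ("focused", focused)):
    print(name, round(pr_style(player), 4), round(wr_style(player), 4))
```

With these made-up figures, the focused player has the worse PR-style average but gives up less than half as much MWC per decision.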

PR ignores some decisions

PR does not count some supposedly obvious decisions unless you get them wrong. A decision is deemed 'obvious' when the second-best play would lose a great deal of equity. Set aside for a moment that the previous section undermines this notion of equity, and that there is no clear reason to accept this definition of obviousness (PR rewards you for an opening 8/5 6/5). The issue is that these decisions do reflect a player's strength, so playing them correctly should result in a lower rating. There are two possible justifications for filtering them out. Either these moves are not meaningful decisions at all (but again, PR treats 8/5 6/5 as a meaningful decision), or, since they do not differentiate between strong and very strong players, we do not want them in the PR calculation, where they would bring those players' PRs closer to zero and closer together. I sympathise with the latter, but it seems to me that compromising the relationship between MWC and PR is not the solution: rather, reading PRs (or WRs) more responsibly is.

For an example of when this becomes problematic, suppose a player plays a mildly inaccurate opening move (say 13/11 6/5 at a score where 13/11 24/23 is better) because they think their playing style tends to be favoured in the ensuing game. Their theory is borne out: in the ensuing game, many of their decisions are perfect, but classified as 'obvious' for the purposes of PR, and are not counted. The result is that although the player employed a cunning opening trick which improved their MWC - and indeed would've improved their PR if only the easy decisions they made for themselves were counted - their PR is actually a good deal higher than it would otherwise have been, as the inaccuracy at the start is diluted by fewer moves in the calculation.

For a simpler example, suppose a player plays three obvious moves correctly, and one obvious move incorrectly. PR computes this as completely equivalent to just playing one obvious move incorrectly. Clearly, in this case, the justification that the decisions are not meaningful fails, as does the justification that including these decisions compresses PRs towards zero, since including them would differentiate between these two cases.
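The arithmetic behind this simpler example is easy to check. The sketch below assumes only the rule described above, that an obvious decision is counted only when it is played wrongly; the function names and the 0.1 equity loss are placeholders of my own.

```python
def average_loss(losses):
    return sum(losses) / len(losses) if losses else 0.0


def filtered_average(decisions):
    """PR-style: obvious decisions are counted only if equity was lost."""
    counted = [loss for loss, obvious in decisions if not obvious or loss > 0]
    return average_loss(counted)


def unfiltered_average(decisions):
    """Every decision counts, obvious or not."""
    return average_loss([loss for loss, _ in decisions])


# Each decision is (equity_lost, is_obvious). The 0.1 loss is made up.
three_right_one_wrong = [(0.0, True), (0.0, True), (0.0, True), (0.1, True)]
one_wrong = [(0.1, True)]

print(filtered_average(three_right_one_wrong), filtered_average(one_wrong))
print(unfiltered_average(three_right_one_wrong), unfiltered_average(one_wrong))
```

With the filter, both players get the same average; without it, the player who also got three obvious moves right averages a quarter of the loss.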

That said, PR really isn't so bad

Objectively speaking, PR doesn't do a perfect job of measuring how good a player is at winning backgammon matches, but then, nor does any simple numerical metric. While I think MWC lost per move is a better measure of playing strength, I don't think WR is the superior metric overall.

Not to be understated is the fact that PR has, for better or worse, been adopted by a large part of the community, and we backgammon players ought to treasure every point of agreement we can find, such is the state of the community. PR does an awful lot more good than harm, not least by making relatively unimportant decisions more engaging to watch at home, and for that reason alone, I wouldn't dream of replacing it.

But - I might look into calculating MWC lost per decision for some top players, and publish what I find!