A Markovian nursery game

Die Gewinnwahrscheinlichkeit

I love playing casual board games with my family. Our current favourite is HABA's Erster Obstgarten, a cooperative game designed for two-year-olds. It's only natural to assume that it must be trivially easy to win. However, after some unlucky rounds, someone turned to the physicist in the room and asked for the exact win probability1. Staring at the wooden fruits and watching the raven advance along its track, I could only answer honestly that it is not a trivial calculation2.

Given the game's initial state3, what is the probability that the players will collect all the fruits before the raven reaches the orchard? I thought about this question for quite some time and decided to write a post about the mathematical and computational insights I gained.

A Markov decision process

The game is obviously a stochastic process, since each die roll changes the state of the board. If every roll determined the next move uniquely, we could model it as a Markov chain. Erster Obstgarten, however, contains one decision: when the die shows the basket, the players may choose which tree to harvest from. That single choice turns the problem into a Markov decision process (MDP).

A state consists of two pieces of information:

  • the raven position \(r\in\{0,\ldots,r_{max}\}\)
  • the fruit counts \(\mathbf{f}=(f_0, \ldots,f_{t-1})\), where each \(f_i\in\{0,\ldots,f_{max}\}\) and \(t\) is the number of trees.

From any state, every state-changing outcome is monotone: either a fruit count decreases or the raven advances. Therefore the state graph is a DAG (apart from dead rolls on empty trees, which we will eliminate algebraically). This acyclicity is the key: it means we can compute win probabilities exactly by dynamic programming over a finite lattice of states instead of sampling with Monte Carlo. With the state space fixed, the remaining task is to write a recursion for the win probability from any state.
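To get a sense of scale, the lattice for the standard setup (four trees, four fruits each, raven track of five) is tiny. A quick sketch (names are mine, not from the game):

```python
from itertools import product

t, f_max, r_max = 4, 4, 5   # trees, fruits per tree, raven track length

# one state = a fruit vector in {0,...,f_max}^t plus a raven position
states = [(f, r) for f in product(range(f_max + 1), repeat=t)
                 for r in range(r_max + 1)]
print(len(states))          # (f_max + 1)**t * (r_max + 1) = 5**4 * 6 = 3750
```

A few thousand states is nothing; exact evaluation is entirely feasible.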

The Bellman equation

Let \(V(\mathbf{f}, r)\) be the probability of winning from state \((\mathbf{f}, r)\) under optimal play. The boundary conditions are immediate:

  • Victory: \(V(\mathbf{0}, r) = 1\) for all \(r\)
  • Defeat: \(V(\mathbf{f}, r_{max}) = 0\) for all \(\mathbf{f}\neq\mathbf{0}\)

From any intermediate state there are three kinds of outcomes on the \(t+2\)-sided die:

  • Raven: advance the raven, \((\mathbf f, r) \to (\mathbf f, r+1)\).
  • Color: harvest from tree \(i\) if possible, \((\mathbf f, r) \to (\mathbf f-\mathbf e_i, r)\) when \(f_i>0\); otherwise it is a dead roll.
  • Basket: choose one non-empty tree \(j\) and harvest it, \((\mathbf f, r) \to (\mathbf f-\mathbf e_j, r)\).
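These transition rules are easy to sanity-check empirically before doing any math. A throwaway Monte Carlo simulation (my sketch, not part of the analysis; the basket is resolved uniformly at random here) already hints that the game is far from a guaranteed win:

```python
import random

def simulate(trees=4, fruits=4, track=5, seed=1, n=100_000):
    """Estimate the win probability with a uniformly random basket choice."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        f = [fruits] * trees
        r = 0
        while r < track and any(f):
            face = rng.randrange(trees + 2)
            if face < trees:        # a color face
                if f[face] > 0:     # empty tree = dead roll
                    f[face] -= 1
            elif face == trees:     # the basket: pick any non-empty tree
                j = rng.choice([i for i in range(trees) if f[i] > 0])
                f[j] -= 1
            else:                   # the raven advances
                r += 1
        wins += not any(f)
    return wins / n

p = simulate()                      # typically lands near 0.6
```

Simulation gives a quick estimate, but the rest of the post computes the probability exactly.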

Taking the expectation4 over the uniform die and maximizing only in the basket case gives the Bellman equation:

\begin{equation*} V(\mathbf f,r)=\frac1{t+2}\Bigg[ V(\mathbf f,r+1) +\sum_{i:f_i>0}V(\mathbf f-\mathbf e_i,r) +D(\mathbf f)V(\mathbf f,r) +\max_{j:f_j>0}V(\mathbf f-\mathbf e_j,r) \Bigg]. \end{equation*}

Here \(\mathbf e_i\) is the \(i\)-th standard basis vector and \(D(\mathbf{f})\) is the number of empty trees in state \(\mathbf f\), i.e. the count of indices with \(f_i = 0\). The only control in the system is the basket term; everything else is dictated by the die.

The equation above still contains one major annoyance: the dead rolls. If the die shows the color of an empty tree, nothing happens and the state stays \((\mathbf{f}, r)\). This appears as self-dependence, because some outcomes contribute \(V(\mathbf{f}, r)\) back into the expectation. These faces of the die contribute exactly \(D(\mathbf f)V(\mathbf f,r)/(t+2)\) to the right-hand side. We can move this term to the left and condition on the next roll that actually changes the state, which yields:

\begin{equation*} V(\mathbf f,r) =\frac{V(\mathbf f,r+1) +\sum_{i:f_i>0} V(\mathbf f-\mathbf e_i,r) +\max_{j:f_j>0} V(\mathbf f-\mathbf e_j,r)} {t+2-D(\mathbf f)}. \end{equation*}

This is the same process, just written in a form without self-loops. The denominator is the number of die faces that can actually change the state. At this point the entire computation is fixed except for how we choose the tree in the basket case. In the equations above I used the optimal-play convention, which makes the basket term a \(\max\); in the next section I replace that term to model other policies.
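For completeness, the elimination is a one-line rearrangement. Collecting the self-referential dead-roll term on the left of the Bellman equation gives

\begin{equation*} \left(1-\frac{D(\mathbf f)}{t+2}\right)V(\mathbf f,r) =\frac1{t+2}\Bigg[ V(\mathbf f,r+1) +\sum_{i:f_i>0}V(\mathbf f-\mathbf e_i,r) +\max_{j:f_j>0}V(\mathbf f-\mathbf e_j,r) \Bigg], \end{equation*}

and multiplying both sides by \((t+2)/(t+2-D(\mathbf f))\) recovers the simplified recursion.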

Optimal or toddler?

With dead rolls eliminated, every state \((\mathbf f,r)\) reduces to averaging over active die faces. The only remaining modeling choice is the basket rule. It is convenient to view the numerator as a sum of future values associated with harvesting from each active tree. If we write

\begin{equation*} S(\mathbf f,r) = V(\mathbf f,r+1) + \sum_{i:f_i>0} V(\mathbf f-\mathbf e_i,r), \end{equation*}

then the simplified recursion from the previous section is

\begin{equation*} V(\mathbf f,r) = \frac{S(\mathbf f,r) + V_b(\mathbf f,r)}{t+2-D(\mathbf f)}, \end{equation*}

where \(V_b(\mathbf f,r)\) encodes the basket policy. Under optimal play the basket should choose the tree that maximizes the future win probability:

\begin{equation*} V_b(\mathbf f,r) = \max_{j:f_j>0} V(\mathbf f-\mathbf e_j,r). \end{equation*}

If we instead model a toddler as choosing uniformly among the \(A(\mathbf f)=t-D(\mathbf{f})\) active trees, the basket term becomes an average, and our decision process collapses back into an absorbing Markov chain:

\begin{equation*} V_b(\mathbf f,r) = \frac{1}{A(\mathbf f)}\sum_{j:f_j>0} V(\mathbf f-\mathbf e_j,r). \end{equation*}

Equivalently, every active tree’s future value gets a multiplier of \(1+\tfrac{1}{A}\): one unit from its color face, plus an extra \(\tfrac{1}{A}\) from the basket being distributed evenly.
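Both policies drop straight into the normalized recursion. As an illustrative cross-check, here is the whole thing as a short memoized Python function (my sketch, independent of the BQN implementation; parameter defaults are the standard game's):

```python
from functools import lru_cache

def win_prob(basket, trees=4, fruits=4, track=5):
    """Exact win probability from the full starting state for a basket policy."""
    @lru_cache(maxsize=None)
    def V(f, r):
        if not any(f):        # all fruits collected: victory
            return 1.0
        if r == track:        # raven reached the orchard: defeat
            return 0.0
        # future values after harvesting each non-empty tree
        vals = []
        for i in range(trees):
            if f[i] > 0:
                vals.append(V(f[:i] + (f[i] - 1,) + f[i + 1:], r))
        # dead rolls eliminated: divide by the number of state-changing faces
        return (V(f, r + 1) + sum(vals) + basket(vals)) / (2 + len(vals))
    return V((fruits,) * trees, 0)

optimal = win_prob(max)                            # Bellman-optimal basket
toddler = win_prob(lambda v: sum(v) / len(v))      # uniform basket choice
```

The denominator \(2+|\{i:f_i>0\}|\) is exactly \(t+2-D(\mathbf f)\) from the simplified recursion.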

Four lines of BQN

With the recurrence fully specified, all that remains is to evaluate it exactly over the finite state lattice5. The monotonicity we observed earlier means we can compute \(V\) bottom-up. In a scalar implementation one would index neighbors one by one to fetch the values \(V(\mathbf f-\mathbf e_i,r)\). But in BQN we have powerful rank polymorphism and shifting functions!

My implementation clones the lattice \(t\) times (one per tree), and shifts at rank \(i\) to model subtracting \(\mathbf e_i\). The function » pads vacated entries with fills, which is exactly what we want: states beyond the boundary (e.g. trying to harvest from an empty tree) contribute nothing. We seed the absorbing victory boundary by setting \(V(\mathbf 0,r)=1\) and then repeatedly apply the dead-roll-normalized transition to propagate values outward across the lattice.

The policy functions discussed in the previous section become:

O ← {+´+·+´((<⊑∘⍒¨𝕩)=⌽↕=𝕩)⊸×}
T ← {(1+1÷1⌈+´¨0<𝕩)×+´}

Which can be nicely plugged into the driver 1-modifier:

_eog ← {𝔽_𝕣 t‿f‿r:
  p‿d‿v‿s ← ⟨1⌾⊑0¨, (t+2)-·+´¨0⊸=, 𝔽, +´¨⟩{𝕎𝕩}¨<↕t⥊f+1
  ⊢´⥊{p𝕏{𝕩+(𝕨=s)×(d÷˜𝕗+V)𝕩{»⎉𝕩𝕗}¨1+↕t}´-⟜↕t×f}⍟r 0¨s
}

Since the raven advances strictly forward, it acts as a discrete arrow of time. Instead of a massive rank-5 tensor, we allocate only the spatial grid and (implicitly) evolve it backward in time with ⍟r. Because states with the exact same fruit sum cannot transition into each other, they are mutually independent. This allows us to apply our finite-difference stencil to the entire universe simultaneously, using the boolean mask 𝕨=s to lock in only the probabilities for the current active wavefront. Finally, the completed probability tensor from the previous raven step is injected into the inner loop as an operand 𝕏→𝕗 to supply the \(V(\mathbf{f},r+1)\) term. Running the standard setup (four trees, four fruits each, raven track length five) takes microseconds in CBQN:

O‿T {𝕎_eog𝕩}¨ <4‿4‿5
⟨ 0.6313573066006354 0.596632015719557 ⟩

The Bellman-optimal strategy does improve the odds, but only slightly. So much for a trivial nursery game.
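For readers who don't speak BQN, the same wavefront scheme translates to NumPy almost mechanically (a rough analogue I sketched for illustration, not the original code; `np.roll` plays the role of », with the wrapped edge zeroed by hand):

```python
import numpy as np

T, F, R = 4, 4, 5                        # trees, fruits per tree, raven track
shape = (F + 1,) * T
idx = np.indices(shape)
level = idx.sum(0)                       # total fruits left in each state
active = (idx > 0).sum(0)                # non-empty trees in each state

V = np.zeros(shape)                      # boundary r = R: only f = 0 wins
V[(0,) * T] = 1.0
for _ in range(R):                       # evolve backward in raven time
    Vnext, V = V, np.zeros(shape)
    V[(0,) * T] = 1.0
    for s in range(1, T * F + 1):        # wavefronts of equal fruit sum
        shifted = []
        for i in range(T):
            sh = np.roll(V, 1, axis=i)   # value after harvesting tree i
            edge = [slice(None)] * T
            edge[i] = 0
            sh[tuple(edge)] = 0.0        # empty tree: no harvest possible
            shifted.append(sh)
        num = Vnext + sum(shifted) + np.max(shifted, axis=0)
        V = np.where(level == s, num / (active + 2), V)

print(V[(F,) * T])                       # optimal-play win probability
```

Each wavefront only reads values from smaller fruit sums at the same raven step, plus the completed grid `Vnext` from the previous step, mirroring the 𝕨=s mask and the 𝕏→𝕗 injection in the BQN driver.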

Footnotes:

1

That's what the section title says in German. Try saying it out loud three times.

2

The game is very popular in German-speaking countries, and it is known as "First Orchard" in English. As I suspected, calculating the true odds of winning is hard enough to inspire papers and videos.

3

The mechanics are simple: there are \(t\) trees, each starting with \(f_{max}\) fruits. Players take turns rolling a \(t+2\)-sided die. Rolling one of the \(t\) colors lets you harvest a fruit from the matching tree (if the tree is empty, nothing happens). Rolling the basket side lets the players collaboratively pick any available fruit. Finally, rolling the raven side advances a wooden bird one step down a path of \(r_{max}\) steps. If you collect all \(f_{max}\times t\) fruits before the raven reaches the orchard, everyone wins. If the raven arrives first, too bad.

4

Without the basket choice, evaluating the next turn is simply a passive weighted average via the law of total probability. Injecting the max operator to account for the players' optimal decision is exactly what transforms this passive expectation into the Bellman optimality equation.

5

Finally some code which I will present as if it didn't take ages to refine!