Strips from the BQNcrate
The BQNcrate is a useful tool for learning and honing BQN skills, though not all of its entries are easy to understand. Personally, I always prefer a shorter implementation when it can maintain the same time complexity. For this post, I have chosen three examples that fall into that specific category of code where you can read every primitive and know exactly what each one does, yet still miss the emergent behavior of the composition.
Topological sort
Seeing this implementation of Kahn's algorithm for the first time is not for the faint of heart:
TC ← {{𝕊⍟(𝕩<○≠⊢)⟜(𝕩∾·/𝕨⊸<)𝕨∨∧´∘⊏⟜𝕨¨p}⟜/0¨p←𝕩}
To understand dense, beautiful code like this, a good first step is to refactor it into smaller block functions and named variables. Let's deconstruct the magic:
TCS ← {𝕊graph:
  result‿mask ← ⟨⟩ ⋈ 0¨graph
  Sort ← {m𝕊r:
    n ← m∨ready ← {∧´𝕩⊏m}¨graph
    n 𝕊⍟(𝕩⊸≢) r∾/m<ready
  }
  mask Sort result
}
Let's verify our refactoring is equivalent to the original. The input for both functions is a list of lists representing the dependency graph: each index is a task, and its corresponding element is a list of its prerequisites. For example:
graph ← ⟨⟩‿⟨⟩‿⟨0⟩‿⟨0,1⟩‿⟨2,3⟩‿⟨4⟩‿⟨4⟩‿⟨1,5,6⟩
And then:
(TC≡TCS) graph
1
Looks good. So, how does this more "readable" version work? The algorithm's state is propagated through two variables:
a "can-be-built" boolean mask m
, initially all false, and the final sorted list s
,
initially empty (indices /
of a zeros' list is an empty list).
The Sort function then begins its iterative process. Inside, it first identifies the ready nodes: those whose prerequisites are all satisfied by the current mask m. In the first step, this naturally selects the nodes with no prerequisites ⟨⟩, which unlocks the next layer of dependencies. The new "can-be-built" mask n is the union of the old mask and the newly ready nodes, ensuring it grows monotonically.
Crucially, the result list is extended only with the nodes that just became available. This new layer is isolated with the differential comparison m<n (m<ready also works, since n≡m∨ready), and their indices / are appended to the result before recurring with the updated state. The process repeats until a pass yields no new nodes, completing the sort.
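For contrast with the array formulation, the same layer-by-layer logic can be sketched in Python (my own transliteration, not code from the BQNcrate; names are mine):

```python
def topo_sort(graph):
    # graph[i] is the list of prerequisites of task i
    mask = [False] * len(graph)   # "can be built" mask m
    result = []                   # sorted task indices
    while True:
        # a node is ready when all its prerequisites are already buildable
        ready = [all(mask[p] for p in prereqs) for prereqs in graph]
        # the new layer: ready now, but not marked in the old mask (m < ready)
        new = [i for i, (m, r) in enumerate(zip(mask, ready)) if r and not m]
        if not new:               # a pass with no new nodes ends the sort
            return result
        result.extend(new)
        mask = ready              # n = m ∨ ready (ready already covers m's nodes)

graph = [[], [], [0], [0, 1], [2, 3], [4], [4], [1, 5, 6]]
print(topo_sort(graph))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

As in the BQN version, the loop discovers one dependency layer per pass, so nodes come out grouped by depth.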
The _while_ modifier
This one was infamously hard for me to grasp, and reading its docs didn't clarify much at first. To understand it, you need to be familiar with a fair bit of BQN, especially its functional programming and combinator aspects. What makes it so complicated is the use of modifier recursion, which I will try to break down here.
An unrolling of the first two steps reveals that up to 2⋆n evaluations of 𝔽 can occur at recursion level n. This is derived by noting that, within the BQN combinator, the left function of the rightmost atop becomes the 𝔽 of the subsequent step; in accordance with 𝔽_𝕣_𝔾, that function is 𝔽⍟𝔾∘𝔽, which contains two copies of the previous 𝔽:
_w0_ ← {𝔽⍟𝔾∘𝔽_𝕣_𝔾∘𝔽⍟𝔾𝕩}
_w1_ ← {(𝔽⍟𝔾∘𝔽)⍟𝔾∘(𝔽⍟𝔾∘𝔽)_w0_𝔾∘(𝔽⍟𝔾∘𝔽)⍟𝔾𝕩}
_w2_ ← {((𝔽⍟𝔾∘𝔽)⍟𝔾∘(𝔽⍟𝔾∘𝔽))⍟𝔾∘((𝔽⍟𝔾∘𝔽)⍟𝔾∘(𝔽⍟𝔾∘𝔽))_w0_𝔾∘((𝔽⍟𝔾∘𝔽)⍟𝔾∘(𝔽⍟𝔾∘𝔽))⍟𝔾𝕩}
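The doubling can be made concrete by mimicking the operand substitution textually (a Python toy of my own, not BQN):

```python
# At each recursion level the next 𝔽 is 𝔽⍟𝔾∘𝔽, i.e. it textually
# contains two copies of the previous level's 𝔽.
f = "F"
for level in range(1, 4):
    f = f"({f})⍟G∘({f})"
    print(level, f.count("F"))   # 1 2, 2 4, 3 8: level n holds 2⋆n copies of F
```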
Another way to clarify the concept is to implement the same logic both as a function and as a 1-modifier, and then compare these implementations with the two 2-modifiers (one exhibiting a linear and the other a logarithmic number of stack frames):
Whiles ← {F‿G𝕊𝕩:
  Wfun ← {𝕎⍟G∘𝕎˙⊸𝕊∘𝕎⍟G𝕩}
  _wom ← {𝔽⍟G∘𝔽_𝕣∘𝔽⍟G𝕩}
  _wtmlog_ ← {𝔽⍟𝔾∘𝔽_𝕣_𝔾∘𝔽⍟𝔾𝕩}
  _wtmlin_ ← {𝕊∘𝔽⍟𝔾𝕩}
  ⟨f Wfun 𝕩, f _wom 𝕩, f _wtmlog_ g 𝕩, f _wtmlin_ g ⎊"SO"𝕩⟩
}
Let's test it with a simple iteration that exceeds CBQN's recursion limit, triggering a stack overflow:
⟨1⊸+, 5000⊸≥⟩ Whiles 0
⟨ 5001 5001 5001 "SO" ⟩
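The stack behavior of the two schemes can be reproduced in Python (a sketch of mine under assumed names, not a BQN mechanism): the linear scheme spends one frame per iteration and overflows, while the doubling scheme passes 𝔽⍟𝔾∘𝔽 down as the next operand, so level n advances up to 2⋆n steps and the depth stays logarithmic.

```python
import sys

def while_linear(f, g, x):
    # 𝕊∘𝔽⍟𝔾: one stack frame per application of f
    return while_linear(f, g, f(x)) if g(x) else x

def while_doubling(f, g, x):
    # 𝔽⍟𝔾∘𝔽_𝕣_𝔾∘𝔽⍟𝔾: recurse with the doubled operand f2 = 𝔽⍟𝔾∘𝔽,
    # so the recursion depth is logarithmic in the number of steps
    if not g(x):
        return x
    def f2(y):
        y = f(y)
        return f(y) if g(y) else y   # second step only while g still holds
    return while_doubling(f2, g, f(x))

inc = lambda x: x + 1
cond = lambda x: 5000 >= x           # mirrors 5000⊸≥

print(while_doubling(inc, cond, 0))  # 5001
sys.setrecursionlimit(1000)          # make the overflow deterministic
try:
    while_linear(inc, cond, 0)       # needs one frame per step: 5001 frames
except RecursionError:
    print("SO")
```

Each single increment is still guarded by the condition, so the doubling version never oversteps the boundary; only the bookkeeping of "how many steps per frame" changes.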
Levenshtein distance
The Levenshtein (or edit) distance is a measure of the similarity between two strings. It is defined by the following recurrence, which is the basis of dynamic programming algorithms like Wagner-Fischer:
\begin{align*} d_{i0} &= i, \quad d_{0j} = j, \\ d_{ij} &= \min \begin{cases} d_{i-1,j-1} + \mathbf{1}_{s_i \neq t_j} \\ d_{i-1,j} + 1 \\ d_{i,j-1} + 1 \end{cases} \end{align*}The Wagner-Fischer variant in the BQNcrate can be derived by shifting the distance matrix. Given two strings \(s\) and \(t\) of lengths \(n\) and \(m\), respectively, we define a new distance matrix as follows:
\begin{equation*} p_{ij} = d_{ij} + n - i + m - j \end{equation*}Under this transformation, the recurrence relation becomes:
\begin{align*} p_{i0} &= p_{0j} = m + n, \\ p_{ij} &= \min \begin{cases} p_{i-1,j-1} + \mathbf{1}_{s_i \neq t_j} - 2 \\ p_{i-1,j} \\ p_{i,j-1} \end{cases} \end{align*}The above recurrence can be easily identified in the middle function of the 3-train, which is folded over the cost table (the table comparing the characters). Note that the fold function compares only insertions and substitutions; a min scan over the result then accounts for the deletions, which gives a vectorised implementation.
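As a sanity check on the algebra, the shifted recurrence can be verified against plain Wagner-Fischer in a few lines of Python (my own sketch; variable names are mine, and here m, n denote the lengths of s, t):

```python
def levenshtein(s, t):
    # plain Wagner-Fischer on the d matrix
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): d[i][0] = i
    for j in range(n + 1): d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j-1] + (s[i-1] != t[j-1]),
                          d[i-1][j] + 1,
                          d[i][j-1] + 1)
    return d[m][n]

def levenshtein_shifted(s, t):
    # p[j] = d[i][j] + (m - i) + (n - j); every border entry starts at m + n
    m, n = len(s), len(t)
    p = [m + n] * (n + 1)
    for i in range(1, m + 1):
        new = [m + n]
        for j in range(1, n + 1):
            new.append(min(p[j-1] + (s[i-1] != t[j-1]) - 2,  # substitution
                           p[j],                             # insertion
                           new[-1]))                         # deletion, no +1
        p = new
    return p[n]  # at i = m, j = n the shift term vanishes

assert levenshtein("kitten", "sitting") == levenshtein_shifted("kitten", "sitting") == 3
```

The deletion candidate needs no +1 in the shifted form, which is exactly what lets a plain min scan finish each row.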
An interesting part of this solution is the construction of the cost table, which is done by reversing \(t\). Given that the final result for \(p_{ij}\) is located in the bottom-right corner and we use foldr, I would have expected \(s\) to be the one reversed instead. However, both methods are functionally equivalent, as the following stochastic test suggests:
_l ← {¯1⊑(1⊸+⥊+)○≠(⌊`⊢⌊∞⊸»∘⊢-0∾1+⊣)˝𝔽}
T ← ⌽⊸(=⌜)_l≡=⌜⟜⌽_l
T○{@+97+𝕩•rand.Range 25}´ 1e4‿1e5
1
The reason both approaches work lies in the inherent symmetries of the Levenshtein distance:
- \(L(s,t) = L(t,s)\)
- \(L(s,t) = L(\text{rev}(s),\text{rev}(t))\)
- \(L(\text{rev}(s),t) = L(s,\text{rev}(t))\)
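These identities are easy to spot-check with a quick property test (a Python sketch of mine using the standard one-row DP, not the BQN code above):

```python
import random
import string

def lev(s, t):
    # standard one-row Wagner-Fischer
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j-1] + (a != b), prev[j] + 1, cur[-1] + 1))
        prev = cur
    return prev[-1]

random.seed(0)
for _ in range(100):
    s = "".join(random.choices(string.ascii_lowercase, k=random.randrange(8)))
    t = "".join(random.choices(string.ascii_lowercase, k=random.randrange(8)))
    rs, rt = s[::-1], t[::-1]
    assert lev(s, t) == lev(t, s) == lev(rs, rt)   # first two symmetries
    assert lev(rs, t) == lev(s, rt)                # third follows from the other two
```

Note that the third identity is a consequence of the first two: \(L(\text{rev}(s),t) = L(\text{rev}(t),s) = L(s,\text{rev}(t))\).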