A coding impromptu

This post is a rolling collection of algorithms and computational ideas I like, implemented in BQN. The current selection includes:

Table of Contents

The categories above are more hints than strict taxonomies. Occasionally, an entry here grows enough to deserve its own dedicated post, leaving behind only a link or a memory. Extrapolating Perlis' remark1, it's likely that a group of 50 individuals would devise 35 to 40 distinct solutions to even the simplest problem in BQN. Therefore, I will sometimes juxtapose my implementations with those of seasoned BQNators2.

Competitive programming

Problems I find in coding challenges' platforms, such as LeetCode, CSES, cp-algorithms, and Advent of Code.

Trapping rain water

This is a classical interview problem that can be solved in linear time. Interestingly, it admits a very elegant array solution:

(+´⊢-˜⌈`⌾⌽⌊⌈`) [0,1,0,2,1,0,1,3,2,1,2,1]
6

That is, we take the minimum of max-scans from the left and from the right, and subtract the corresponding height. Reducing the resulting array gives the amount of trapped water.

A closely related problem is container with most water, which unfortunately is not so easy to solve in linear time using an array approach (one can easily implement the imperative two pointers solution in BQN, but it will probably be slow). Here are two solutions, one \(O(n^2)\) and the other \(O(n\log n)\), both tacit:

⟨⌈´∘⥊⌊⌜˜×·-⌜˜⊒˜, ⌈´∨×(⌈`⊸-⌈⊢-⌊`)∘⍒⟩ {10 𝕎•_timed𝕩}¨< •rand.Range˜1e4
⟨ 0.080050875 4.14558e¯5 ⟩

Z algorithm

This is a very efficient procedure that finds prefix strings in linear time. The imperative implementation reads:

ZI ← {𝕊s:
  l‿r‿z ← 0⚇0 0‿0‿s
  z ⊣ {
    v ← r(⊢1⊸+•_while_{(𝕩+𝕨)<≠s ? =´⟨𝕩,𝕩+𝕨⟩⊑¨<s ; 0}<◶({z⊑˜𝕩-l}⌊-+1)‿0)𝕩
    r <◶@‿{𝕊: l↩𝕩-v+1 ⋄ r↩𝕩} 𝕩+v-1
    z v⌾(𝕩⊸⊑)↩
  }¨ ↕≠s
}
ZI "abacabadabacaba"
⟨ 15 0 1 0 3 0 1 0 7 0 1 0 3 0 1 ⟩

Two algorithmic improvements can be made here, namely only iterate over indices where the character found is equal to the first character, and only search to extend the count if it goes up to the end of r:

ZFun ← {𝕊s:
  CountEq ← { 1⊸+•_while_((≠𝕨)⊸≤◶⟨⊑⟜𝕨≡⊑⟜𝕩,0⟩) 0 }
  l←r←0 ⋄ Ulr ← {(r⌈↩𝕨+𝕩)>r ? l↩𝕨 ⋄ 𝕩; 𝕩}
  SearchEq ← ⊣ Ulr ⊢ + + CountEq○(↓⟜s) ⊢
  Set ← {i𝕊𝕩: ((r-i) (i SearchEq 0⌈⊣)⍟≤ (i-l)⊑𝕩)⌾(i⊸⊑) 𝕩 }
  (⌽1↓/⊑⊸=s) Set´˜ ↑˜≠s
}

I came up with two array versions, with quadratic and cubic time complexities respectively:

ZAQ ← ¯1↓↓(+´·∧`⊣=≠⊸↑)¨<
ZAC ← (+´∧`)¨<=↕∘≠{«⍟𝕨𝕩}⌜<
(ZAQ≡ZAC)◶@‿ZAC "abacabadabacaba"
⟨ 15 0 1 0 3 0 1 0 7 0 1 0 3 0 1 ⟩

With further refinements, the earlier solutions can be transformed into:

ZAQ‿ZAC ← {(+´∧`)¨𝕏}¨ ⟨≠↑↓=⌽∘↑, <=«⍟(↕∘≠)⟩

Longest increasing sub-sequence

This problem can be solved in \(O(n\log n)\) using dynamic programming. Here is an imperative implementation which is quadratic, but can be optimized:

LISI ← {
  k‿dp ← ¯1‿(∞¨𝕩)
  {i ← ∧´◶(⊑⊐⟜0)‿{𝕊:k+↩1} dp<𝕩 ⋄ dp 𝕩⌾(i⊸⊑)↩}¨ 𝕩
  +´∞>dp
}
LISI¨ ⟨0‿1‿0‿3‿2‿3, 10‿9‿2‿5‿3‿7‿101‿18, 7‿7‿7‿7‿7⟩
⟨ 4 4 1 ⟩

A more elegant array solution is:

LISA ← +´∞≠∞¨{𝕨⌾((⊑𝕩⍋𝕨-1)⊸⊑)𝕩}´⌽
LISA¨ ⟨0‿1‿0‿3‿2‿3, 10‿9‿2‿5‿3‿7‿101‿18, 7‿7‿7‿7‿7⟩
⟨ 4 4 1 ⟩

Let's )explain this optimized version, so we can truly appreciate its beauty:

 +´∞≠∞¨{𝕨⌾((⊑𝕩⍋𝕨-1)⊸⊑)𝕩}´⌽ 
 │ │ │ ││    │ │ │  │ │  │ 
 │ │ │ {┼────┼─┼─┼──┼─┼─´│ 
 │ │ ∞¨ │    │ │ │  │ │ ││ 
 │ │  └─┼────┼─┼─┼──┼─┼─┼⌽ 
 │ ∞≠───┼────┼─┼─┼──┼─┼─┘  
 +´ │   │    │ │ │  │ │    
  └─┘   │    │ │ │  │ │    
        │    │ 𝕨-1  │ │    
        │    𝕩⍋─┘   │ │    
        │   ⊑─┘     │ │    
        │   └──────⊸⊑ │    
        𝕨⌾─────────┘  │    
         ├────────────𝕩    
╶────────┘

The full expression is structured as a two-train: we sum all finite entries from the result of the rightmost three-train. The three-train is a right fold over the reversed input, with an initial array of and the same length as the input. In each step of the fold, we modify the right argument using under: we perform a binary search with strict comparison to find where the next element should go. The element is either placed at the end of the unfilled region, or it replaces the first element that is greater than 𝕨. Since BQN uses a based array model, we pick the enclosed atom from this operation's result. So it goes3.

N-queens problem

This problem is the archetypal example of backtracking. Initially, I tried to solve it using a function to place the queens in the full board, hoping that it would lead to a more array oriented solution:

8 {((∨⌜´0⊸=)∨(0=-⌜´)∨0=+⌜´) 𝕩-¨<↕𝕨} 2‿3
┌─                 
╵ 0 1 0 1 0 1 0 0  
  0 0 1 1 1 0 0 0  
  1 1 1 1 1 1 1 1  
  0 0 1 1 1 0 0 0  
  0 1 0 1 0 1 0 0  
  1 0 0 1 0 0 1 0  
  0 0 0 1 0 0 0 1  
  0 0 0 1 0 0 0 0  
                  ┘

This resulted in a more complicated algorithm, so I decided to go for the classical Wirth implementation:

NQ ← {𝕊n:
  V‿P ← {⊣𝕏(⊢∾-⋈+)´∘⊢}¨ ⟨∨´⊑¨˜, {1⌾(𝕩⊸⊑)𝕨}¨⟩
  {n≠𝕩 ? +´(𝕨V⊢)◶⟨(𝕩+1)𝕊˜𝕨P⊢,0⟩∘(𝕩⋈⊢)¨ ↕n ; 1
  }˜´ (0⋈0×·↕¨⊢∾·⋈˜+˜)n 
}

Which nicely compares with the OEIS sequence:

a000170 ← 1‿0‿0‿2‿10‿4‿40‿92
a000170 ≡ NQ¨ 1+↕8
1

And of course, in the implementation above I could have used a single array instead of three, but I find the resulting validation and position functions very aesthetic the way they are.

Majority element

The Boyer–Moore algorithm allows for finding the majority element (element that appears more than ⌊𝕩÷2 times in the array) in linear time. If such element exists, then it is equal to the mode of the data, and for this task we have a nice array solution. The original implementation could be expressed as:

BM ← {v←0 ⋄ I←⊢⊣=◶{𝕊:v+↩1}‿{𝕊:v-↩1} ⋄ 0{𝕊:v=0}◶⟨I,I˜⊣⟩´𝕩}
BM 6‿1‿3‿1‿3‿3‿4‿3‿3‿5
3

The previous fold tracks the majority element as state, a more elegant approach maintains the number of votes:

BM ← {e←@ ⋄ 0{𝕩=0 ? e↩𝕨⋄1 ; 𝕩+¯1⋆e≢𝕨}´𝕩 ⋄ e}
BM 6‿1‿3‿1‿3‿3‿4‿3‿3‿5
3

H-index

This metric is one of the reasons for the deplorable state of modern academia, and the headaches for outsiders trying to get in. Consider that Peter Higgs has an estimated h-index of only 12. By contrast, a random professor nowadays boasts an h-index ten times as high, and exponentially less impact. Enough of ranting, let's concentrate on finding an elegant way to implement this useless thing:

HL ← (+´∘«⊒˜≤+`⌾⌽)·/⁼≠⊸⌊
HS ← +´∨≥1+⊒˜
(HL≡HS)◶@‿HL 14‿14‿11‿9‿5‿5‿1‿1‿1‿1‿0
5

If someone ever published that much, sorting will eventually be slower:

HL‿HS {𝕎•_timed𝕩}¨< 1e8 •rand.Range 1e3
⟨ 0.083824959 0.21801262700000001 ⟩

A testament to the idea that the simplest solution in BQN is often the most efficient: I initially clip my citations array with {≠¨⊔≠∘𝕩¨⌾(≥⟜≠∘𝕩⊸/)𝕩}, which is just /⁼≠⊸⌊.

Numerical methods

I love numerical analysis. Compact implementations are often helpful to get the big picture.

Solving the cubic equation

This function computes the real roots of an arbitrary cubic equation. Initially, the equation is transformed into its depressed form via an appropriate substitution. Depending on the sign of the discriminant, the roots are then determined using Cardano's method when the discriminant is positive, or Viète’s trigonometric method when it is negative. In the case where the discriminant is zero, the proportionality to the square of the Vandermonde polynomial implies that a repeated root is present, the roots are resolved through direct analytical methods. We have chosen those methods to avoid using complex numbers, which are not yet supported in BQN.

Cub ← {a‿b‿c‿d:
  (b÷3×a)-˜•math{
    𝕩>0 ? +´𝕩(𝕗.Cbrt+⋈-)⟜√˜-q÷2;
    𝕩=0 ? 0⊸=◶⟨¯1‿2‿2÷˜·𝕗.Cbrt×⟜4,3⊸⥊⟩q;
    (2×√-p÷3)×𝕗.Cos(2×π×↕⊸÷3)-˜3÷˜𝕗.Acos(√-3÷p)×1.5×q÷p
  }(27÷˜p⋆3)+4÷˜×˜q←(d÷a)-(27÷˜3⋆˜b÷a)+3÷˜b×a÷˜p←(c÷a)-3÷˜×˜b÷a
}

The above implementation only works for the case where a≢0, it will yield NaN otherwise. Here are some tests for the four possible branches:

Cub¨ ⟨1‿0‿¯7‿6, 1‿¯1‿¯8‿12, 1‿¯6‿12‿¯8, 1‿3‿0‿¯1⟩ 
⟨ ⟨ 2.0000000000000004 1 ¯3.0000000000000004 ⟩ ⟨ ¯2.9999999999999996 1.9999999999999998 1.9999999999999998 ⟩ ⟨ 2 2 2 ⟩ ⟨ 0.532088886237956 ¯0.6527036446661387 ¯2.879385241571817 ⟩ ⟩

QR decomposition

I put some effort golfing this QR decomposition implementation, and I got a very satisfying 98 chars one-liner. Ungolfed a bit, it looks like this:

QR ← +˝∘×⎉1‿∞{
  1=⊢´≢𝕩 ? 𝕩⊸÷⟜⊑⊸⋈+˝⌾(ט)𝕩;
  ∾˘{(q𝔽𝕨)⋈(r𝔽t)∾0𝔽⍟k𝕩}´𝕊𝔽{𝕘-𝕩𝔽t↩𝕩⍉⊸𝔽𝕘}(k↓˘𝕩)⊑q‿r←𝕊𝕩↑˘˜k←⌈2÷˜⊢´≢𝕩⊣t←@
}

The function works like this: it recursively computes the QR decomposition of a matrix by first handling the base case (normalizing a single column) then splitting the matrix into two halves. The first half is decomposed into \(Q_0\) and \(R_0\), and the second half is orthogonalized against \(Q_0\) by subtracting its projection, yielding a residual matrix that is itself decomposed into \(Q_1\) and \(R_1\). Finally, the overall orthogonal matrix \(Q\) is formed by horizontally concatenating \(Q_0\) and \(Q_1\), and the upper triangular \(R\) is assembled as a block matrix combining \(R_0\), the projection coefficients, and \(R_1\):

\begin{equation*} Q \, R = \begin{pmatrix} Q_0 & Q_1 \end{pmatrix} \begin{pmatrix} R_0 & T \\ 0 & R_1 \end{pmatrix} = Q_0 R_0 + Q_0 T + Q_1 R_1, \end{equation*}

We can test it with random matrices:

(⊢∾⟜<m-+˝∘×⎉1‿∞´) QR m ← 3‿3•rand.Range 0
┌─                                                                                                                                             
· ┌─                                                               ┌─                                                               ┌─         
  ╵ 0.7144260899509619  ¯0.11998405924866697 ¯0.6893469282760241   ╵ 1.3272602613914946 0.45821006669986125  0.6778483350121149     ╵ 0 0 0    
    0.24775066978035085  0.9647413811697029   0.0888463453506213     0                  0.697172106504235   ¯0.13992817528755036      0 0 0    
    0.6543813625255287  ¯0.23426031030668987  0.7189625438047855     0                  0                    0.029863470562463006     0 0 0    
                                                                 ┘                                                                ┘         ┘  
                                                                                                                                              ┘

Fast Fourier Transform

Below is a radix-2 implementation of the Cooley–Tukey FFT algorithm in BQN. The inverse is defined with a header, using the standard conjugation identity:

\begin{equation*} \text{iFFT}[\mathbf{x}] = \frac{1}{N}\text{FFT}^{*}[\mathbf{x}^{*}] \end{equation*}

Currently CBQN's develop branch does not yet have native complex numbers, so the code uses a small namespace of explicit complex-number utilities. The important piece here is z.E, which builds complex exponentials, and z._p, which performs the complex multiplication needed for the twiddle factors.

z ← {
  _p ⇐ {(-´𝔽¨)⋈(+´𝔽¨)⟜⌽}
  C‿E ⇐ ⟨⋈⟜-´˘, •math{𝕗.Cos≍˘𝕗.Sin}⟩
}
FFT ← {𝕊⁼: z.C{≠÷˜·𝔽𝔾∘𝔽}𝕊𝕩; (+∾-)⟜(⊢×z._p˘·z.E∘-π×↕⊸÷∘≠)´∘(𝕊¨⊢⊔˜2|⊒˜)⍟(1<≠)𝕩}

As a first sanity check, we can measure the round-trip error after applying the transform and its inverse. This is the absolute \(L^1\)-error over the flattened array, not a normalized relative error:

_rte ← {+´∘|∘⥊⊢-𝔽⁼∘𝔽}
FFT _rte 0•rand.Range˜2⋈˜2⋆10
2.3505503099485736e¯13

There is, however, experimental support for complex numbers as of May 2026. I put together a Nix flake for it, so it can be tested without spelunking through branches and build flags. With native complex numbers, the previous implementation collapses to something much closer to the mathematical shape of the algorithm:

FFT ← {𝕊⁼: 𝕊⌾+⊸÷⟜≠𝕩; (+∾-)⟜(⊢×·⋆∘⍳¯π×⊒˜÷≠)´∘(𝕊¨⊢⊔˜2|⊒˜)⍟(1<≠)𝕩}

The round trip error is very similar to the one with explicit complex number utilities:

RC ← •rand.Range⟜0{𝔽⍳𝔽}
FFT _rte RC 2⋆10
1.8350564257552384e¯13

Native complex numbers also make it pleasant to write the direct \(O(N^2)\) Discrete Fourier Transform. This is not meant to be fast. It is the reference anvil:

DFT ← ⊢+˝∘×⎉1‿∞˜·⋆∘⍳¯2×π×≠÷˜·×⌜˜↕∘≠

The FFT agrees with the direct DFT up to the expected floating-point noise:

_err_ ← {+´∘|∘⥊𝔽-𝔾}
FFT _err_ DFT RC 2⋆10
1.997053680853405e¯9

The recursive FFT above is pretty, but it is not gentle on memory. Each level performs splits, recursive calls, joins, and reshaping gymnastics. A more operational formulation is the iterative radix-2 FFT: first apply the bit-reversal permutation, then perform the butterfly stages in-place in the usual order. In BQN this still allocates, of course, but it removes the recursive tree of joins and makes the stage structure explicit:

FFT2 ← {
  (𝕩⊏˜⥊+⌜´0∾¨p){
    t ← ⋆-⍳π×𝕨÷˜↕𝕨
    ⥊(+∾-)⟜(t⊸×)˝⌾⍉∘‿2‿𝕨⥊𝕩
  }´⌽p ← ↕⌾(2⊸⋆⁼)≠𝕩 
} 

Numerically, it matches the recursive implementation. Performance-wise, it does not deliver:

!(1e¯15>FFT _err_ FFT2) v ← RC 2⋆20
5 (FFT•_timed⋈FFT2•_timed) v
⟨ 4.2211076136 4.1039009052 ⟩

The performance still tanks at scale because CBQN spends most of the time chasing memory allocations and thrashing the cache. To get anywhere near competitive speeds, we have to make the geometry sympathetic to the hardware. We can do this by using the Cooley-Tukey block layout as a macro-scale partitioner, shredding the massive array into chunks that fit entirely inside the L1 data cache. Once the data is isolated there, we feed those chunks into a rigid, transpose-free Pease4 micro-engine. To avoid the brute-force precomputation of a massive global table of constants for every possible block size, we can hide a •HashMap inside the closure.

FFT3 ← {
  hm ← •HashMap˜⟨⟩
  Pease ← {
    hm.Has≠𝕩 ? kernel ← hm.Get≠𝕩 ⋄ Kernel𝕩;
    t ← ⌽⊸{𝕨/¨⋆⍳-π×↕⊸÷¨𝕩}k ← ↕⌾(2⊸⋆⁼)≠𝕩
    s‿r ← ⟨⥊⍉2‿∘⥊↕≠𝕩, ⥊+⌜´0∾¨k⟩ 
    Kernel ← {r⊏𝕩{s⊏(+∾𝕨×-)˝2‿∘⥊𝕩}´t}
    kernel hm.Set˜≠𝕩 ⋄ Kernel𝕩
  }
  _block ← {
    𝕗≥≠𝕩 ? Pease𝕩;
    c‿r ← (⌊⋈⌈)∘÷⟜2⌾(2⊸⋆⁼)n ← ≠𝕩
    t ← ⋆-⍳2×π×n÷˜c×⌜○↕r
    ⥊𝕊˘⌾⍉t×𝕊˘⍉r‿c⥊𝕩
  }
  1024 _block 𝕩
}

Not exactly a triumph of performance engineering, but it will do for now:

!(1e¯6>FFT _err_ FFT3) v ← RC 2⋆20
5 (FFT•_timed⋈FFT3•_timed) v
⟨ 4.1142793994 0.975797937 ⟩

Tensor n-mode product

The n-mode product is a key ingredient for computing the Tucker decomposition of a tensor. For this we can use the HOSVD algorithm: a method that has been rediscovered several times. For example, in the nuclear quantum dynamics community it is known as POTFIT and was published before the often cited De Lathauwer paper, see arXiv:1309.5060 for a discussion. For a tensor \(\mathcal{X}\) and a matrix \(U\) we define:

\begin{equation*} (\mathcal{X} \times_n U)_{i_1,\dots,i_{n-1},\,j,\,i_{n+1},\dots,i_N} = \sum_{i_n=1}^{I_n} x_{i_1,\dots,i_n,\dots,i_N}\, u_{j,i_n}. \end{equation*}

In BQN's parlance, we can express it as:

{+˝∘×⎉1‿∞⟜𝕩⌾(⍉⍟𝕗)𝕨}

A beautiful example of notation as a tool of thought, in my opinion: this deferred 1-modifier (itself a compact melange of six modifiers) computes the 𝕗-mode product of a tensor 𝕨 and a matrix 𝕩. It works by moving the 𝕗-axis to the front, then multiplying 𝕨 and 𝕩 without the need for explicit unfolding, courtesy of the rank operator, and moving the last axis of the result back to 𝕗, all gracefully managed by under.

k-means clustering

A classic algorithm in machine learning for unsupervised clustering. I was once asked to recognize it in an interview, but I didn’t know it at the time. Pity, its implementation is deceptively simple:

KMC ← {
  _fp ← {𝕊∘⊢⍟≢⟜𝔽𝕩}
  c ← 𝕨(•rand.Deal⟜≠⊏⊢)𝕩
  𝕩⊸(>·(+˝÷≠)¨⊣⊔˜·⊏∘⍋˘(+´×˜)∘-⎉1⎉1‿∞)_fp c
}

Here I have chosen to implement the algorithm as a fixed point, but I am not filtering empty groups. You can fix that by adding a •_while_ instead, with an appropriate stopping condition. To obtain the labels, simply record the result of ⊏∘⍋˘ (or ⊐⟜(⌊´), which is faster). Bellow is a test with some sample data:

test ← [
  0‿0, 0‿1, 1‿0
  5‿0, 5‿1, 6‿0
  0‿5, 1‿5, 0‿6
  5‿5, 5‿6, 6‿5
]
3 KMC test
┌─                                       
╵ 5.333333333333333  2.8333333333333335  
  0.3333333333333333 0.3333333333333333  
  0.3333333333333333 5.333333333333333   
                                        ┘

Keep in mind that Lloyd’s algorithm can converge to local minima, and that apparently distinct solutions may in fact be equivalent up to permutation or geometric symmetry (rotations, reflections, etc.).

Update: check out the BQNcrate version that was added after some discussion in the Matrix room.

Solving linear systems

Gaussian elimination is one of the first techniques one learns in any numerical analysis course. The implementation below is the classical \(O(\frac{2}{3}N^3)\) with partial pivoting. It relies on outer products to eliminate columns below the pivot in a more vectorized fashion:

GE ← {
  m‿v ← ¯1(↓˘⋈·⥊↑˘)𝕩{
    p ← 𝕨+⊑⊐⟜(⌈´)|𝕨↓𝕨⊏˘𝕩
    𝕨⊸⊏{(⊢-·0¨⌾⊏𝔽˘×⌜⊏)÷⟜𝔽⌾⊏}⌾(𝕨⊸↓)⍟(0≠𝕨‿𝕨⊸⊑)⌽⌾(𝕨‿p⊸⊏)𝕩
  }´⌽↕⌊´𝕩≢⊸-↕2
  (0¨{(v𝕨⊸⊑⊸-⊑𝕩+˝∘×⎉1‿∞○(𝕨⊸↓)𝕨⊏m)⌾(𝕨⊸⊑)𝕩}´⊒˜)v
}

The second pass (back substitution) is a purely functional fold that resolves the variable vector v by threading the known solutions upwards against the upper-triangular matrix m. A bit janky for my taste. The implementation of Gauss-Jordan, which is \(O(N^3)\), does deliver the elegance I seek:

GJ ← {
  ⊢´˘𝕩{
    p ← 𝕨+⊑⊐⟜(⌈´)|𝕨↓𝕨⊏˘𝕩
    𝕨⊸⊏{(⊢-·0¨⌾𝔽𝔽˘×⌜𝔽)÷⟜𝔽⌾𝔽}⍟(0≠𝕨‿𝕨⊸⊑)⌽⌾(𝕨‿p⊸⊏)𝕩
  }´⌽↕⌊´𝕩≢⊸-↕2
}

We can confirm the correctness of both implementations by solving a random system and checking the resulting residual. If they are equal up to a tolerance, we compare the CPU times. Regardless of the theoretical time complexity, Gauss-Jordan is faster in BQN due to its vectorized nature:

{1e¯10>|+´(GE-GJ)𝕩}◶@‿(GE•_timed÷GJ•_timed) (⊢•rand.Range˜⊢⋈+⟜1) 1e3
19.52462177736598

Chebyshev coefficients of the first kind

There is a compact expression already in the BQNcrate, which efficiently implements the standard recurrence:

\begin{equation*} T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x) \end{equation*}

I wanted to explore an alternative approach that I haven't seen so often. Instead of viewing \(T_n\) as the child of \(T_{n-1}\), we view it as the solution to an eigenvalue problem. The polynomial \(T_n(x)\) is the eigenfunction of the Chebyshev Differential Equation with eigenvalue \(n^2\):

\begin{equation*} (1-x^2)y'' - xy' + n^2y = 0 \end{equation*}

Using the Frobenius method, we assume a solution of the form \(y = \sum a_k x^k\). Plugging this into the ODE gives us a direct relationship between coefficients of power \(k\) and \(k+2\):

\begin{equation*} a_{k+2} (k+2)(k+1) + a_k (n^2 - k^2) = 0 \end{equation*}

Now we only need an suitable initial condition. Fortunately, for all Chebyshev polynomials (odd or even), the leading coefficient is determined by the normalization \(T_n(x) \approx (2x)^n \dots\), so \(a_n = 2^{n-1}\). By starting at the singularity (\(k=n\)) and flowing down to \(k=0\), we avoid all parity checks and singularities:

T ← {
  v ← ×`(1⌈2⋆𝕩-1)⌾⊑÷˝⟨1,𝕩,2,-𝕩⟩+⌜p ← 𝕩-2×↕1+⌊𝕩÷2
  v⌾(p⊸⊏)0⥊˜𝕩+1
}

Using some beautiful array tricks, like folding division over a list of functions, which nicely simplifies fractions of binomials due to the "inverted" operation order of ÷; and creating an alternate boolean mask by scanning , which eliminates the need for the clunky under, Marshall transformed the previous function into this (little-endian):

T ← {⌽(⊢××`∘⋆)⟜(≠`1¨)(1⌈2⋆𝕩-1)∾÷˝⟨1,𝕩,2,-𝕩⟩+⌜⌽↕𝕩}

And here is the typical table:

{
  A ← "x"∾"⁰¹²³⁴⁵⁶⁷⁸⁹"⊏˜'0'-˜•Fmt
  B ← ⊢/˜·¬≠↑"1x"⍷⊢
  C ← 1↓·∾∘⌽0⊸≠/×⊸⊏⟜" +-"∾¨•Fmt∘|¨∾¨·A¨⊒˜
  (>⊢↑¨˜·⌈´≠¨)(B C∘T↓˜¯2+2⊸|)¨↕𝕩
} 11
┌─                                   
╵"1                                  
  x                                  
  2x²-1                              
  4x³-3x                             
  8x⁴-8x²+1                          
  16x⁵-20x³+5x                       
  32x⁶-48x⁴+18x²-1                   
  64x⁷-112x⁵+56x³-7x                 
  128x⁸-256x⁶+160x⁴-32x²+1           
  256x⁹-576x⁷+432x⁵-120x³+9x         
  512x¹⁰-1280x⁸+1120x⁶-400x⁴+50x²-1" 
                                    ┘

Array programming

Here are some tricks and I ideas I find useful in array programming, many of them born from some REPL exploration.

An identity on the naturals

Some time ago, while working on performance optimization of linear algebra operations with Boolean arrays, I encountered an interesting summation property for an array \(a\) of length \(n\):

\begin{equation*} \sum_{i | a_i \neq 0} \sum_{j=i+1} f_j = \sum_{j=0} f_j \sum_{i < j | a_i \neq 0} 1 \end{equation*}

It turns out that the RHS can be elegantly transformed into a scan, giving rise to a beautiful identity that applies to all natural numbers, not just Booleans as I initially thought:

(+`≡·+´/≤⟜<⊒˜) •rand.Range˜ 1e3
1

This identity holds because ⊒˜ represents the indices i of the list, and since +´(/𝕩)=i ←→ i⊑𝕩, the fold sums all the elements in 𝕩 up to i, for i in the range of the length of the list. Ergo, a scan.

Depth of nested lists

Studying tree algorithms in APL, I learned about the depth vector representation. If the nested object in consideration is a string, the best approach is using boolean masks. However, when dealing with a BQN list, recursion becomes necessary to determine the depth of nested elements. Here’s how it can be implemented:

{=◶⟨⋈0, 1+·∾𝕊¨⟩𝕩} ⟨1, ⟨2, ⟨3⟩, ⟨4, ⟨5, ⟨6, 7⟩⟩⟩⟩, 1⟩
⟨ 1 2 3 3 4 5 5 1 ⟩

Another fun spiral

There are many variants of the spiral matrix problem. One fun version appears in the CSES problem set, which I attempted to generate using array programming techniques. In particular, I implemented a variation of Joey Tuttle’s remarkable method for constructing integer volutes. For background, see this post, which also links to the original APL and J sources.

Z1 ← {∘‿𝕩⥊⍋+`(⥊1∾˘≍˘˜↕𝕩)(⊣/·-⌾(2⊏˘∘‿3⊸⥊)≠⊸⥊)𝕩‿1}

The key to Tuttle’s method is understanding the role of grade up as a permutation inverter. Rather than constructing the spiral directly, the algorithm first generates a value → position map. This list is created by a scan that integrates a sequence of delta moves capturing the spiral's path P, that is, a simulation of the process with ⟨1, 𝕩, ¯1, -𝕩⟩ corresponding to steps right, down, left, up. The resulting path vector is a map where k⊑P is the final flattened position for the value k. The grade up ⍋P then inverts this relationship, transforming it into a position → value map: the final flattened spiral.

Interestingly, this particular problem also has a concise closed-form solution, described beautifully in this other post, made zero based with the observation that 𝕨⌈○(1⊸+)𝕩 is 1+𝕨⌈𝕩:

Z2 ← (ט⊸+∘⌈--ׯ1⋆⌈)⌜˜↕

Let’s confirm that both methods produce identical results:

(Z1≡Z2)◶@‿Z1 ⊑1 •rand.Deal 10
┌─                
╵  0  1  8  9 24  
   3  2  7 10 23  
   4  5  6 11 22  
  15 14 13 12 21  
  16 17 18 19 20  
                 ┘

Array style breath-first search

If your problem is small enough to fit comfortably within a dense adjacency matrix, you won't find a more elegant BFS algorithm than the one presented in this excellent GraphBLAS talk.

BFS ← {1+´¬∨`𝕩∨˝∘∧⍟n𝕨=n←↕≠𝕩}

Foundations of computation

For a lack of a better header. Here I group some ideas which are more in the direction of theoretical computer science and algorithm design, rather than practical implementations.

Hillis-Steele scan

I was asked to identify this clever algorithm in a CUDA interview, but tripped up, mistaking its logarithmic stride for a radix-2 FFT butterfly pattern. While it sacrifices work efficiency for span, as it performs \(O(N log N)\) operations rather than \(O(N)\), the underlying concept is mathematically beautiful.

Staring at the imperative CUDA implementation, with its explicit thread barriers and pointer arithmetic, it is hard to intuitively "feel" the algorithm. In contrast, BQN strips away the hardware mechanics to reveal its crystal-clear algebraic essence:

HS ← ⊢{𝕩+»⍟𝕨𝕩}´↕∘⌈⌾(2⊸⋆⁼)∘≠

An alternative array approach here is to precompute zero-padding blocks to drive a dyadic shift, (⊢+»)´0⥊˜¨, but materializing those intermediate creates a memory bandwidth pressure for large inputs. Ultimately, the true parallel power of Hillis-Steele lives entirely inside the vector sum and spatial shift; the fold itself acts merely as the temporal synchronization barrier. Implementing this in BQN is practically pointless for production though, as the language already boasts a heavily optimized, SIMD-accelerated implementation of the scan primitive:

(HS•_timed÷+`•_timed)⍟(+`≡HS) •rand.Deal 1e7
76.56145702110905

So yeah, don't do this at home.

The Z combinator

In an eagerly-evaluated language, the legendary Y combinator will eagerly blow your stack. To find the fixed point of a higher-order generator function without infinite evaluation loops, we need the delayed η-expansion of Z. Mathematicians are notoriously good at overloading terminology, and thanks to it I mistakenly conflated this functional fixed point with the fixed-point iteration, which is different and elegantly achieved in BQN with this 1-modifier: {𝕊⍟≢⟜𝔽𝕩}.

Given BQN's native self-reference, there is little practical reason to implement Z from scratch. The motivation is purely the intellectual joy of golfing something structurally cleaner than the heavily parenthesized tangles found in other languages. Here is my attempt:

Z ← {𝕏∘{𝕩{(𝔽𝕗)∘⊢𝕩}}{𝔽𝕗}}

By leveraging BQN's syntactic roles and closures, the necessary structural delay falls naturally into place. Now we just need a cliché recursive algorithm to test our engine. The Collatz conjecture stopping time will do nicely:

C ← {𝕩{1=𝕩 ? 0; 1+𝔽2⊸|◶⟨÷⟜2, 1+×⟜3⟩𝕩}}

And we defer to OEIS A006577 for our empirical validation:

(Z c)○⊢¨ 1+↕10
⟨ 0 1 7 2 5 8 16 3 19 6 ⟩

Footnotes:

1

Almost Perfect Artifacts Improve only in Small Ways: APL is more French than English, Alan J. Perlis (1978). From jsoftware's papers collection.

2

Initially, I intended to rigorously attribute all contributions, but this quickly filled the text with footnotes. I often get help streamlining my solutions from Marshall Lochbaum (the BQN creator), dzaima (the CBQN developer), and other fine folks from the BQN matrix room, thank you all! Please check the logs for more context.

3

Don’t believe me? Just ask Kilgore Trout!

4

I was very pleased to come across Pease's FFT, an old idea worth rediscovering. For an in-depth discussion of FFT performance see, arXiv:2602.23525.