The Bailey–Borwein–Plouffe formula is one of the several algorithms to compute \(\pi\). Here it is:

\[\pi = \sum_{k = 0}^{\infty}\left[ \frac{1}{16^k} \left( \frac{4}{8k + 1} - \frac{2}{8k + 4} - \frac{1}{8k + 5} - \frac{1}{8k + 6} \right) \right]\]

What makes this formula stand out among other approximations of \(\pi\) is that it allows one to directly extract the \(n\)-th fractional digit of the hexadecimal value of \(\pi\) without computing the preceding ones.

Digits of pi

Image credit: Cormullion, Julia code here.

The Wikipedia article about the Bailey–Borwein–Plouffe formula explains that the \(n\)-th fractional digit (well, actually it’s the \(n+1\)-th) \(d_{n}\) is given by

\[d_{n} = 16 \left[ 4 \Sigma(n, 1) - 2 \Sigma(n, 4) - \Sigma(n, 5) - \Sigma(n, 6) \right]\]

where

\[\Sigma(n, j) = \sum_{k = 0}^{n} \frac{16^{n-k} \bmod (8k+j)}{8k+j} + \sum_{k = n+1}^{\infty} \frac{16^{n-k}}{8k+j}\]

Only the fractional part of expression in square brackets on the right side of \(d_{n}\) is relevant, thus, in order to avoid rounding errors, when we compute each term of the finite sum above we can take only the fractional part. This allows us to always use ordinary double precision floating-point arithmetic, without resorting to arbitrary-precision numbers. In addition note that the terms of the infinite sum get quickly very small, so we can stop the summation when they become negligible.

Implementation in Julia language

Here is a Julia implementation of the algorithm to extract the \(n\)-th fractional digit of \(\pi\):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Return the fractional part of x, modulo 1, always positive
fpart(x) = mod(x, one(x))

function Σ(n, j)
    # Compute the finite sum
    s = 0.0
    denom = j
    for k in 0:n
        s = fpart(s + powermod(16, n - k, denom) / denom)
        denom += 8
    end
    # Compute the infinite sum
    num = 1 / 16
	while (frac = num / denom) > eps(s)
        s     += frac
        num   /= 16
        denom += 8
    end
    return fpart(s)
end

pi_digit(n) =
    floor(Int, 16 * fpart(4Σ(n-1, 1) - 2Σ(n-1, 4) - Σ(n-1, 5) - Σ(n-1, 6)))

pi_string(n) = "0x3." * join(string.(pi_digit.(1:n), base = 16)) * "p0"

The pi_digit function gives the \(n\)-th hexadecimal fractional digit of \(\pi\) as a base-10 integer, and the pi_string function returns the first \(n\) hexadecimal digits of \(\pi\) as a valid hexadecimal floating-point literal:

julia> pi_digit(1)
2

julia> pi_digit(6)
10

julia> pi_string(1000)
"0x3.243f6a8885a308d313198a2e03707344a4093822299f31d0082efa98ec4e6c89452821e638d01377be5466cf34e90c6cc0ac29b7c97c50dd3f84d5b5b54709179216d5d98979fb1bd1310ba698dfb5ac2ffd72dbd01adfb7b8e1afed6a267e96ba7c9045f12c7f9924a19947b3916cf70801f2e2858efc16636920d871574e69a458fea3f4933d7e0d95748f728eb658718bcd5882154aee7b54a41dc25a59b59c30d5392af26013c5d1b023286085f0ca417918b8db38ef8e79dcb0603a180e6c9e0e8bb01e8a3ed71577c1bd314b2778af2fda55605c60e65525f3aa55ab945748986263e8144055ca396a2aab10b6b4cc5c341141e8cea15486af7c72e993b3ee1411636fbc2a2ba9c55d741831f6ce5c3e169b87931eafd6ba336c24cf5c7a325381289586773b8f48986b4bb9afc4bfe81b6628219361d809ccfb21a991487cac605dec8032ef845d5de98575b1dc262302eb651b8823893e81d396acc50f6d6ff383f442392e0b4482a484200469c8f04a9e1f9b5e21c66842f6e96c9a670c9c61abd388f06a51a0d2d8542f68960fa728ab5133a36eef0b6c137a3be4ba3bf0507efb2a98a1f1651d39af017666ca593e82430e888cee8619456f9fb47d84a5c33b8b5ebee06f75d885c12073401a449f56c16aa64ed3aa62363f77061bfedf72429b023d37d0d724d00a1248db0fead3p0"

While I was preparing this post I found an unregistered package PiBBP.jl that implements the Bailey–Borwein–Plouffe formula. This is faster than my code above, mostly thanks to a function for modular exponentiation more efficient than that available in Julia standard library.

Test results

Let’s check if the function is working correctly. We can use the parse function to convert the string to a decimal floating point number. IEEE 754 double precision floating-point numbers have a 53-bit mantissa, amounting to \(53 / \log_{2}(16) \approx 13\) hexadecimal digits:

julia> pi_string(13)
"0x3.243f6a8885a30p0"

julia> parse(Float64, pi_string(13))
3.141592653589793

julia> Float64(π) == parse(Float64, pi_string(13))
true

Generator expressions allow us to obtain the decimal value of the number in a very simple way, without using the hexadecimal string:

julia> 3 + sum(pi_digit(n)/16^n for n in 1:13)
3.141592653589793

We can use the arbitrary-precision BigFloat to check the correctness of the result for even more digits. By default, BigFloat numbers in Julia have a 256-bit mantissa:

julia> precision(BigFloat)
256

The result is correct for the first \(256 / \log_{2}(16) = 64\) hexadecimal digits:

julia> pi_string(64)
"0x3.243f6a8885a308d313198a2e03707344a4093822299f31d0082efa98ec4e6c89p0"

julia> BigFloat(π) == parse(BigFloat, pi_string(64))
true

julia> 3 + sum(pi_digit(n)/big(16)^n for n in 1:64)
3.141592653589793238462643383279502884197169399375105820974944592307816406286198

It’s possible to increase the precision of BigFloat numbers, to further test the accuracy of the Bailey–Borwein–Plouffe formula:

julia> setprecision(BigFloat, 4000) do
           BigFloat(π) == parse(BigFloat, pi_string(1000))
       end
true

Multi-threading

Since the Bailey–Borwein–Plouffe formula extracts the \(n\)-th digit of \(\pi\) without computing the other ones, we can write a multi-threaded version of pi_string, taking advantage of native support for multi-threading in Julia. A first naive implementation can use the Threads.@threads:

1
2
3
4
5
6
7
function pi_string_threads(N)
    digits = Vector{Int}(undef, N)
    Threads.@threads for n in eachindex(digits)
        digits[n] = pi_digit(n)
    end
    return "0x3." * join(string.(digits, base = 16)) * "p0"
end

Running Julia with 4 threads gives a about 2× speed-up:

julia> Threads.nthreads()
4

julia> using BenchmarkTools

julia> pi_string(1000) == pi_string_threads(1000)
true

julia> @benchmark pi_string(1000)
BenchmarkTools.Trial:
  memory estimate:  105.17 KiB
  allocs estimate:  2013
  --------------
  minimum time:     461.400 ms (0.00% GC)
  median time:      463.401 ms (0.00% GC)
  mean time:        464.136 ms (0.00% GC)
  maximum time:     471.332 ms (0.00% GC)
  --------------
  samples:          11
  evals/sample:     1

julia> @benchmark pi_string_threads(1000)
BenchmarkTools.Trial:
  memory estimate:  115.06 KiB
  allocs estimate:  2038
  --------------
  minimum time:     215.591 ms (0.00% GC)
  median time:      218.030 ms (0.00% GC)
  mean time:        219.685 ms (0.00% GC)
  maximum time:     237.657 ms (0.00% GC)
  --------------
  samples:          23
  evals/sample:     1

Threads.@threads uses a static scheduler which evenly distributes the workload across all threads. This is very useful when all cycles take about the same time, but the computational cost of pi_digit is \(O(n \log(n))\), so the larger the value of \(n\), the longer the function will take. A better approach can be to use the task-based parallelism introduced in Julia v1.3, and in particular the Threads.@spawn macro:

1
2
3
4
5
function pi_string_tasks(N)
    tasks = [Threads.@spawn pi_digit(n) for n in 1:N]
    digits = [fetch(t) for t in tasks]
    return "0x3." * join(string.(digits, base = 16)) * "p0"
end

Let’s see its performance when using 4 threads:

julia> pi_string(1000) == pi_string_tasks(1000)
true

julia> @benchmark pi_string_tasks(1000)
BenchmarkTools.Trial:
  memory estimate:  574.30 KiB
  allocs estimate:  8020
  --------------
  minimum time:     115.993 ms (0.00% GC)
  median time:      116.872 ms (0.00% GC)
  mean time:        117.798 ms (0.00% GC)
  maximum time:     129.769 ms (0.00% GC)
  --------------
  samples:          43
  evals/sample:     1

This shows a perfect 4× speed-up.

If you want to know more about data parallelism models in Julia I recommend