#!/usr/bin/python3 """Variants of Russ Cox’s bisect algorithm to find a minimal satisfying subseq (Invoke from the command line with or without ``-easy`` and ``-long`` to demo different variants.) presents an algorithm previously presented in for finding a locally minimal satisfying subsequence Σ of a sequence Π satisfying some predicate that has the property that if a sequence Φ is satisfying, all the longer subsequences of the original sequence Π of which Φ is a subsequence are also satisfying. “Subsequence” here is “subsequence” in the LCS sense, not “substring”: “ktb” is a subsequence of “kitab”, for example, but not a substring. This is useful for automatic debugging because “My program crashes on input Σ” is very often such a predicate, and simple binary search for a single substring of the input is often insufficient, because the necessary input to crash the program is not always a contiguous chunk of the input stream. But it is not limited to that application. This does *not* implement the hash-based bisection method that is the main focus of Cox’s post. It also doesn’t implement the “easy/hard” algorithm. In fact, before I implemented the “easy” algorithm, I implemented how I misremembered the “easy” algorithm, which turns out to be an arguably simpler algorithm that also works. That’s ``bisect_dumb`` below. Here’s the output on a test from Zeller’s _Why Programs Fail_, which this algorithm solves with 21 probes instead of Zeller’s 48:: ☑ ' at 39 ☑ '' Note that, just like Zeller’s `ddmin` algorithm, both of these return a locally minimal sequence, having previously tried each of the possible individual-item removals. """ import re import sys def first_satisfying_bisect(p, start, end): """Return the first i in [start, end) such that p(i), if any. This is in a sense the most basic version of the binary search algorithm. - p: predicate to satisfy. - start: first integer in the candidate range. - end: first integer > start not in the candidate range. Subject to the restriction that if p(j) and k > j, p(k) too. That is, once p goes from false to true, it never goes back to false. Possibly p(i) is false for all values, in which case this algorithm returns end. But it will never call p(end). """ while start < end: # Invariant: the correct return value is in [start, end]. # Variant: end - start strictly diminishes each iteration. mid = start + (end - start) // 2 if p(mid): end = mid # mid is a valid return value; nothing later can be else: start = mid + 1 # mid, and everything earlier, is *not* valid return start # Maybe a Lisp programmer or golfer would write: # bs = lambda p,s,e:(lambda m:bs(p,s,m)if # p(m)else bs(p,m+1,e))((s+e)//2)if s example from Zeller. See Zeller’s _Why Programs Fail_, second edition, §5.5, example 5.8, p. 116 (124/388). Mozilla was crashing upon the second attempt to print the Bugzilla home page, and the problem turned out to be that it contained a ' def instrument(p, formatter=lambda xs: repr(''.join(xs))): def instrumented(xs): result = p(xs) checkbox = '☑' if result else '☒' print(checkbox, formatter(xs)) return result return instrumented if __name__ == '__main__': s = example_string if '-long' in sys.argv: s = ' ' * 10000 + s bisect = (trisect_easy if '-tri' else bisect_easy if '-easy' in sys.argv else bisect_dumb) result = bisect(instrument(example_predicate), list(s), log=print) print(result) print(repr(''.join(s[i] for i in result)))