\ sample code from http://www.yosefk.com/blog/my-history-with-forth-stack-machines.html

: mean_std ( sum2 sum inv_len -- mean std )
  \ precise_mean = sum * inv_len;
  tuck u* \ sum2 inv_len precise_mean
  \ mean = precise_mean >> FRAC;
  dup FRAC rshift -rot3 \ mean sum2 inv_len precise_mean
  \ var = (((unsigned long long)sum2 * inv_len) >> FRAC) - (precise_mean * precise_mean >> (FRAC*2));
  dup um* nip FRAC 2 * 32 - rshift -rot \ mean precise_mean^2 sum2 inv_len
  um* 32 FRAC - lshift swap FRAC rshift or \ mean precise_mean^2 sum*inv_len
  swap - isqrt \ mean std
;

( We have some fixed-point math here.  Let’s admit that, and define
  the fixed-point math primitives.  It looks like one of sum and
  inv_len is a fixed-point fraction, while the other is an ordinary
  integer; I’m guessing that sum2 is the sum of the squares, and
  inv_len is the reciprocal of the length, which therefore must be a
  fraction, while sum and sum2 are ordinary integers.

  Adopting "." as an indicator character for fixed-point arithmetic,
  even though that conflicts with its usual Forth meaning of
  “output”: )

: s.* u* ;
: .>s FRAC rshift ;
: .* um* 32 FRAC - lshift swap FRAC rshift or ;

\ Now this should be quite straightforward.

variable sumsq  variable sum  variable len_recip  variable .mean
: variance  sumsq @ len_recip @ s.*  .mean @ dup .*  -  .>s ;
: mean_std  len_recip ! sum ! sumsq !
            sum @ len_recip @ s.* .mean !
            .mean @ .>s  variance isqrt ;

( I believe that the only performance differences between this and
  the original version should be:

  1. It doesn't use an integer multiply to multiply FRAC by 2 at
     run-time!

  2. There are some functions that would benefit from being inlined.
     This is somewhat clumsy but doable even if the compiler doesn't
     do optimization; you just end up with definitions like
       : s.* postpone u* ; immediate
     and the like.  It’s probably also worthwhile to change
     `32 FRAC - lshift` to `[ 32 FRAC - ] literal lshift` for a
     similar reason.

  3. It uses memory operations instead of stack operations.  The
     original uses eight stack operations; this version uses ten
     memory operations [plus a stack operation].  Reducing that a bit
     by using the stack and/or the return stack would be pretty
     straightforward.  Still, I think that after sufficient
     optimization, you’ll always end up with something as ugly and
     incomprehensible as the original version.

  4. It might be wrong, since I haven’t tested it, and I assume
     Yossi’s original code was copy-pasted from working code. )
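
\ A sketch of what point 2 above could look like when applied to all
\ three primitives.  It is as untested as the rest of this file.
\ Assumptions: FRAC is a constant defined before this file is loaded,
\ cells are 32 bits wide, and u* is supplied by the target system, as
\ in the original code.  The i-prefixed names are hypothetical and are
\ chosen only so that they do not shadow the plain definitions above.
\ Being immediate, these words are meant for use inside colon
\ definitions only; the stack comments describe the code they compile.

\ Compile a bare u* into the caller instead of a call to s.*
: is.*  ( n frac -- frac )  postpone u* ; immediate

\ FRAC is evaluated while the caller is being compiled, so the caller
\ gets a literal shift count followed by rshift.
: i.>s  ( frac -- n )  FRAC postpone literal  postpone rshift ; immediate

\ Same idea for the 64-bit product: both shift counts become literals
\ in the caller, so no subtraction is left for run-time.
: i.*  ( frac frac -- frac )
  postpone um*
  32 FRAC - postpone literal  postpone lshift
  postpone swap
  FRAC postpone literal  postpone rshift
  postpone or ; immediate

\ Usage mirrors the plain version; only the names change.
: ivariance  ( -- var )
  sumsq @ len_recip @ is.*  .mean @ dup i.*  -  i.>s ;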