From kragen@dnaco.net Thu Sep 24 22:29:48 1998
Date: Thu, 24 Sep 1998 22:29:46 -0400 (EDT)
From: Kragen <kragen@dnaco.net>
X-Sender: kragen@pike
To: "Robert G. Brown" <rgb@phy.duke.edu>
cc: Shachar Tal <shachar@vipe.technion.ac.il>, beowulf@cesdis1.gsfc.nasa.gov, 
    extreme-linux@acl.lanl.gov
Subject: Re: Cluster-wide overclocking...
In-Reply-To: <Pine.LNX.3.96.980924180516.20987F-100000@ganesh.phy.duke.edu>
Message-ID: <Pine.GSO.3.96.980924220734.16764G-100000@pike>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Keywords:
X-UID: 2140
Status: O
X-Status: 

On Thu, 24 Sep 1998, Robert G. Brown wrote:
> There are other cases where it would be a disaster.  If you are trying
> to certify the passwd file as being "unbreakable" for a client in a
> million dollar banking system and some kid running crack turns around
> and finds it on a NON overclocked system in ten seconds,

Well, this is still a bad example, because no reputable security person
will certify a passwd file as unbreakable.  The bad guys always have
better dictionaries than the good guys, because the bad guys
packet-sniffed Netcom for several months, so they know what kinds of
passwords people *really* choose.

BTW, there was an article somewhere recently about an experimental
machine in which all the components were defective.  The machine was
just massively redundant, and cheaper to build than a non-redundant
machine of equivalent power.  There was a pointer to it on slashdot.org 
in the last week or two, IIRC.

> Really, one can actually make this completely mathematical given any
> expected rate of failure and a reasonable knowledge of the classes of
> failure that can occur,

That's the kicker, though.  Expected rates of failure in computers tend
to be badly unpredictable; classes of failure tend to be difficult to
enumerate.

> Statistics is one where that is VERY DANGEROUS to make
> assumptions based on naive frequency interpretations,

Yeah.  I thought about your Monte Carlo calculations; it seems to me
that one or two errors could easily swamp the whole run, on many kinds
of Monte Carlo calculations.  In particular, suppose you're calculating
an approximation of a value X lots of times and then averaging your
approximations to get a better estimate of X.  If X is really about a
billion, a single-bit error in one estimate's floating-point exponent
could turn it into something closer to 10^20 or so, and that one bad
estimate would dominate the average.
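Here's a quick Python sketch of what I mean; the flip_bit helper and
the particular bit I flip (bit 57, which lands in the exponent field
of an IEEE 754 double) are just my own illustration:

    import struct

    def flip_bit(x, bit):
        # Flip one bit in the 64-bit IEEE 754 representation of x.
        # Bits 52-62 are the exponent field, so flipping one of those
        # models the kind of exponent error described above.
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]

    # Ten thousand estimates, each right at the "true" value of 1e9.
    estimates = [1.0e9] * 10000
    print(sum(estimates) / len(estimates))    # 1e9, as expected

    # Corrupt a single estimate with a one-bit exponent error.
    estimates[0] = flip_bit(estimates[0], 57)  # becomes about 4.3e18
    print(sum(estimates) / len(estimates))    # about 4.3e14

One flipped exponent bit in one estimate out of ten thousand moves the
average from 1e9 to roughly 4e14, more than five orders of magnitude
off.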

Your original thought that occasional one-bit errors wouldn't matter
seems to rest on a naive assumption about frequency distributions.
But maybe your Monte Carlo calculations are of a type that wouldn't be
affected in quite this way.

Kragen

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The sages do not believe that making no mistakes is a blessing. They believe, 
rather, that the great virtue of man lies in his ability to correct his 
mistakes and continually make a new man of himself.  -- Wang Yang-Ming


