From kragen@dnaco.net Fri Sep 25 00:00:15 1998
Date: Fri, 25 Sep 1998 00:00:13 -0400 (EDT)
From: Kragen <kragen@dnaco.net>
X-Sender: kragen@pike
To: systalk@ml.org, tech@clug.org
Subject: distributed database
Message-ID: <Pine.GSO.3.96.980924234522.16764N-100000@pike>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Keywords:
X-UID: 2147
Status: O
X-Status: 

I'm a distribution freak.  I don't like the idea that *anything* has to
be centralized.

Here's what I'm on about at the moment: search engines are very
centralized.  I'd like to remedy this situation.

Trouble is, I don't know how to build a distributed index.  It's plain
enough how to build an index where all the index for a particular range
of keys lives on one machine or group of machines, and all the index
for another particular range of keys lives on another machine or group
of machines.  DNS does it, and does it well -- it's the most reliable
service on the Internet.  But DNS doesn't have to do joins.

The thing is, if I want to search for pages containing both "Kragen"
and "criminal", it's unlikely that the parts of the index for both of
these will reside on the same machines.  So I want to find a way to
combine the index entries for the 13038 occurrences of "Kragen" and the
1,823,325 occurrences of "criminal" and find the 88 pages in which they
both occur, and discover that none of them are about me.

I don't know of a reasonably-fast way of doing this.  The best method
I've been able to find so far is a method of walking down binary tries
described in Technical Report TR 94-18 from the University of New
Hampshire, which proves that the average execution time of this method
is something I won't understand without reading the whole paper.  It
seems apparent to me, though, that the amount of network traffic
required to find the intersection will likely be rather large.

Kragen

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The sages do not believe that making no mistakes is a blessing. They believe, 
rather, that the great virtue of man lies in his ability to correct his 
mistakes and continually make a new man of himself.  -- Wang Yang-Ming


