From kragen@dnaco.net Fri Sep 11 17:02:08 1998
Date: Fri, 11 Sep 1998 17:02:07 -0400 (EDT)
From: Kragen <kragen@dnaco.net>
To: "Bradley M. Kuhn" <bkuhn@ebb.org>
cc: clug-user@clug.org
Subject: Re: life depending on code (was Re: Story about a literary culture of Unix)
In-Reply-To: <19980911162852.D23540@ebb.org>
Message-ID: <Pine.SUN.3.96.980911164101.16247q-100000@picard.dnaco.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Keywords:
X-UID: 1867
Status: O
X-Status: 

On Fri, 11 Sep 1998, Bradley M. Kuhn wrote:
> Thus spoke Kragen:
> > Well, no, you could quite easily do string processing without any
> > arrays at all (as long as you had structs, or at least conses.)
> > 
> > You would certainly have to do string processing differently.
> 
> And slower.  :)

Not necessarily, no.  C strings are slow to concatenate, slow to
compare, slow to take the length of, slow to copy, slow to insert and
delete from the middle of, but fast to iterate through, fast to append
a character to, fast to alter particular characters in, fast to fetch
particular characters from, and fast to find random characters in.

Other possible implementations could probably be almost as fast to
iterate through, instantaneous to compare, relatively fast to
concatenate, instantaneous to take the length of, instantaneous to
copy, relatively fast to insert and delete from the middle of, almost
as fast to append a character to, almost as fast to alter particular
characters in, almost as fast to fetch particular characters from,
and/or relatively fast to find random characters in.

Counted strings are like C strings, but are faster to take the length
of.  Depending on the implementation, they can be much faster to
extract substrings from (if you don't need to modify the substrings).
These guys would not be usable in a strict pointer-arithmetic-free
environment, but you could hide them behind a curtain.  They also have
the major advantage that you can store null characters in them.

Mutable linked-list strings are like C strings, but are very fast to
insert and delete from, slower to fetch and alter, and much slower to
find random characters in.

Tree strings are a few times slower than C strings, but have fewer
unreasonable slowdowns.  They're fast to concatenate, reasonably fast
to iterate through, slow to compare, slow to take the length of, slow
to copy, relatively fast to insert and delete from the middle of,
relatively fast to append a character to, fast to fetch or alter alter
particular characters in, and relatively fast to find random characters
in.  They're the fastest array-free method when you need to find random
characters a lot.

Atoms are a layer on top of another string representation.  They make
comparison and copying instantaneous, and all mutation operations very
slow, because you have to copy the whole underlying string and then
find the new string in the string table.

Atoms are typically implemented in an array, but they wouldn't have to
be.

There are lots and lots of other variations.  Arrays are only an
optimization and simplification.

Kragen

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The sages do not believe that making no mistakes is a blessing. They believe, 
rather, that the great virtue of man lies in his ability to correct his 
mistakes and continually make a new man of himself.  -- Wang Yang-Ming


