I was just reading about Community Memory and wondering how little space you could get it to take up. The service allowed you to write messages on an ASR-33, do multi-keyword searches on the existing messages (and then refine them), and then display either brief results (the top line of each message in the result preceded by a number) or full results (the whole message) either for all messages or for a single message. A simple reimplementation of Community Memory as described ---------------------------------------------------------- The first question is how little space you need for the code itself. The actual messaging functionality should be very simple and not take much code. The top-level outline looks like this: /* in communitymemory.c: */ #include #include <> <> <> <> <> <> <> <> <
> The top-level `main()` just passes off control to the command-line interpreter, which never returns; but there’s a return statement to please the compiler. /* in main: */ int main() { cli(); return 0; } The command-line interpreter mostly has the job of handling errors. In order to get a good size estimate on how much code you need for the whole system, we’re not using `printf()` or anything like that; instead we implement everything from scratch, as simply as possible, even at the cost of inefficiency. Except `setjmp`, which you can’t implement in C. Note that we're also using `static` to free the compiler to do more optimization, and `inline` to suggest that it do so. /* in top-level command loop: */ <> <> static inline void cli() { int err; if ((err = setjmp(onfail)) != 0) { show_string("surprise: "); show_error(err); show_char('\n'); } This part above is kind of magical if you don’t know about `setjmp`. Basically the body of this `if` here is where control goes when there’s an error; it doesn’t actually run when `cli()` calls `setjmp()`, because `setjmp()` always returns 0 when it’s actually returning. So then we just have the normal endless command-line loop. for (;;) { show_string("\n>"); input_line(input_buf, sizeof(input_buf)); show_char('\n'); process_command(input_buf); } } `input_line()` is like `fgets()`. I’ll make a detour now into the rest of the error handling. For error handling, I just jump to an error handler up the stack with an integer error code. /* in error handling: */ enum error { TEXT_FULL = 1, MESSAGES_FULL, INVALID_COMMAND, INDEX_OUT_OF_RANGE, STUPID_BUG, }; static jmp_buf onfail; static void fail(enum error why) { longjmp(onfail, why); } So when someone calls `fail(TEXT_FULL)`, control goes back to `cli()` at the point where `setjmp()` is returning, but with the return value `TEXT_FULL` instead of 0. Any functions that were in the middle of executing, being called from `cli()`, simply vanish. It’s like exception handling in C++ or Java. Now, normally you’d have to worry about resources that were acquired and not released by those functions, or data structures that they left in inconsistent states. But this program doesn’t have any resources that can be acquired and then released, and it doesn’t have any fancy data structures, so keeping them consistent in the face of error handling is very simple. The error message strings should probably be kept in an array, but here they are in an if-else cascade instead. I’m keeping these fairly terse with the conceit that the imaginary user may be using an 11-character-per-second ASR-33, but not as terse as `ed`(1), whose `?` universal error message is not welcoming to newcomers. /* in show_error: */ static inline void show_error(int err) { if (err == TEXT_FULL) { show_string("out of space for text"); } else if (err == MESSAGES_FULL) { show_string("can't handle more messages"); } else if (err == INVALID_COMMAND) { show_string("can't understand command"); } else if (err == INDEX_OUT_OF_RANGE) { show_string("there aren't that many search results"); } else if (err == STUPID_BUG) { show_string("programmer is an idiot"); } else { show_string("don't know what went wrong"); } } So, there’s the top level of the command loop, and the error handling. Here’s the fairly ugly command dispatch itself: /* in process_command: */ static inline void process_command(char *line) { if (substreq("find ", line)) { search_for_terms(line + 5, messages, nextmessage - messages); show_result_count(); } else if (substreq("and ", line)) { search_for_terms(line + 4, results, nresults); show_result_count(); } else if (substreq("brief", line)) { brief(); } else if (substreq("print ", line)) { show_result(input_num(line + 6)); } else if (substreq("print", line)) { int ii; for (ii = 1; ii <= nresults; ii++) show_result(ii); } else if (substreq("add", line)) { post_message(); } else { fail(INVALID_COMMAND); } } Those are all of the commands: `find`, `and` (which modifies your search), `brief`, `print`, and `add`. `substreq` tests whether a null-terminated string is the initial substring of `line` — not including the terminating null! The `search_for_terms` routine handles the full-text searching (and does a bit of command-line parsing of its own). The “+ 5”, etc., offsets are a little bit ugly, since they duplicate the contents of the text, but they’re simpler than writing a general command-line parsing library. We’ll get to the other string handling stuff in a bit. First, though, the meat of the user commands: /* in user commands: */ static void show_result_count() { show_num(nresults); show_string(" ITEMS FOUND\n"); } `show_num` outputs a decimal (unsigned) integer, like `printf("%u")`. static char input_buf[80]; static inline void post_message() { show_string("Type your message below. After the last line,\n"); show_string("type a period by itself, like this:\n"); show_string(">.\n"); new_message(); for (;;) { show_string(">"); input_line(input_buf, sizeof(input_buf)); if (input_buf[0] == '.' && input_buf[1] == '\n') break; add_to_message(input_buf); } show_string("Your message has been added.\n\n"); } This uses `new_message()` and `add_to_message()` as the interface to the message storage system. This implies that there’s some hidden state about the current message; also, since there's no `close_message()` call, it implies that the message storage doesn’t know that the message is complete, even after we return. So it must be able to return search results for messages that are still being input! This is adequate for our very simple, single-threaded storage system, but a smarter storage system would probably require a slightly larger interface. (And maybe more thought about error recovery.) This routine displays an entire message. The search results are stored (as `char*`s) in the static global array `results`, with their count in `nresults`. The UI routines have the responsibility of translating between the human 1-based indices and the machine 0-based indices. static inline void show_result(int which) { if (which < 1) fail(INVALID_COMMAND); if (which > nresults) fail(INDEX_OUT_OF_RANGE); show_char('#'); show_num(which); show_string(":\n\n"); show_string(results[which-1]); } `INVALID_COMMAND` is here for a bad reason: `process_command` calls `input_num` to convert the user input into a number, but on empty string, `input_num` simply returns 0. In that case, the `INDEX_OUT_OF_RANGE` error message is unhelpful. `brief` is connected to `show_result`, since it displays a table of contents from which the user might type a number: static inline void brief() { int ii; for (ii = 0; ii < nresults; ii++) { show_char('#'); show_num(ii+1); show_string(": "); show_first_line(results[ii]); } } That covers the user commands other than the searching. Here is the tower of output routines that connects the above to the actual output of the program. /* in output routines: */ static inline void show_char(char c) { write(1, &c, 1); } `show_char` (inefficiently!) calls `write(2)` with a single byte. static void show_num(int n) { if (n / 10) show_num(n / 10); show_char('0' + n % 10); } `show_num` is the standard recursive binary print routine. static inline void show_first_line(char *message) { while (*message && *message != '\n') show_char(*message++); show_char('\n'); } `show_first_line` appends a newline to its output whether or not there was one in its input. static inline void show_string(char *string) { while (*string) show_char(*string++); } And that’s it. 14 lines of code is the entire output subsystem. The corresponding input subsystem is a little bit larger. A really realistic one would be larger still. `input_num` doesn’t actually read input; it just translates sequences of bytes into numbers: /* in input routines: */ static inline int input_num(char *input) { int rv = 0; while (*input >= '0' && *input <= '9') rv = rv * 10 + *input++ - '0'; return rv; } The actual input routines are `input_char` (analogous to `show_char`) and `input_line`. static inline int input_char() { char c; read(0, &c, 1); return c; } In a real bare-metal system, `input_line` would handle line editing (at least backspace, say, and probably arrow keys) rather than leaving that up to an operating system. It would probably be quite a bit larger. This `input_line` is fairly simple. static void input_line(char *buf, int size) { char c = '\0'; if (size < 1) fail(STUPID_BUG); for (;;) { if (c == '\n' || size == 1) { *buf++ = '\0'; return; } c = input_char(); *buf++ = c; size--; } } Note that no provision is made here for end-of-file on input. So that’s the whole front end of the system: all the input and output, down to the level of single bytes. The rest of the system just contains some simple string handling, a totally stupid brute-force full-text search, and a totally stupid message store. The string handling all has to do with case-insensitive string comparison. This is a little silly, since the original system appears to have been all-upper-case, but it’s necessary for modern use. /* in string handling: */ static inline int whitespace(char c) { return (c == ' ' || c == '\n'); } I’m ignoring tabs for now. The traditional name for that function is `isspace`. static inline char lower(char c) { if (c >= 'A' && c <= 'Z') return 'a' + c - 'A'; return c; } The traditional name for that function is `tolower`. There are a couple of different kinds of string comparisons we want to do. One is the simple fixed-position substring comparison done in `process_command`, and the other is finding a substring anywhere in a message. It turns out they can share a single inner-loop routine, which returns the address where the first mismatching character occurs: static char *mismatch(char *a, char *a_end, char *b) { while (a != a_end && *b && lower(*a++) == lower(*b++)); return b; } If it made it to the end of `a`, then we know that all of `a` was found at that position in `b`. This is the logic behind `substreq`: static inline int substreq(char *a, char *b) { char *a_end = a; while (*a_end++); return mismatch(a, a_end, b) == b + (a_end - a); } (It needs to find its way to the end of the input search string in order to do this, unfortunately.) The substring *search*, however, keeps advancing through the haystack until it discovers it is hitting the end: /* Does the null-terminated haystack contain [needle,needle_end)? */ /* (case-insensitively) */ static inline int contains(char *needle, char *needle_end, char *haystack) { for (;;) { char *hp = mismatch(needle, needle_end, haystack); if (hp == haystack + (needle_end - needle)) return 1; if (!*hp) return 0; haystack++; } } The `needle_end` parameter is used to indicate the end of the needle so that we don’t have to copy it out of the command line or drop nulls in the middle of the command line. In the few years after Community Memory, some faster string search algorithms were discovered, including the Boyer-Moore algorithm, any variant of which would almost certainly be several times faster here. It was discovered in 1977. Messages are stored one after the other with null bytes between them. There’s an array of pointers to the beginnings of messages. Originally I thought this was for efficient access but it turned out to be sort of a waste. /* in message storage: */ #define end_of(array) ((array) + sizeof(array)/sizeof((array)[0])) enum { MAX_MESSAGES = 512 }; static char text[32768]; static char *textptr = text; static char *messages[MAX_MESSAGES]; static char **nextmessage = messages; static inline void add_char_to_message(char c) { if (textptr == end_of(text)) fail(TEXT_FULL); *textptr++ = c; } static inline void add_to_message(char *line) { while (*line) add_char_to_message(*line++); } static inline void new_message() { add_char_to_message('\0'); if (nextmessage == end_of(messages)) { fail(MESSAGES_FULL); } *nextmessage++ = textptr; } There’s a bug here: it’s possible to fail at adding the trailing null to the last message, and then you get an unterminated string. Finally, here’s the text search. The `search_for_term` function runs over an array of message pointers, starting at `start`; the message pointers pointing to messages containing the search term get copied into a second array, `results`. `start` and `results` may be the same or they may be different. `n` indicates the number of items in `start` to examine. The function returns a pointer past the end of the results that it has written so that the caller can tell how many there were. /* in brute-force full-text search: */ static inline char **search_for_term(char *term, char *term_end, char **start, int n, char **results) { for (; n; start++, n--) if (contains(term, term_end, *start)) *results++ = *start; return results; } Finally, we have `search_for_terms`, which loops over the terms given in the command line, each time constructing a possibly smaller result set. This is the function that populates `results` and `nresults`. static char *results[MAX_MESSAGES]; static int nresults; /* Puts results of space-separated multi-term search into results[]. */ /* Input is n messages. */ static inline void search_for_terms(char *terms, char **input, int n) { while (*terms) { char **results_end; char *term_end; while (whitespace(*terms)) terms++; term_end = terms; while (*term_end && !whitespace(*term_end)) term_end++; results_end = search_for_term(terms, term_end, input, n, results); n = nresults = results_end - results; terms = term_end; input = results; } } So, that’s it. This is not really all that optimized for code size. It took me about two hours to write and, I hope, thoroughly debug. As far as I can tell, it reproduces all of the documented functionality of the XDS-940-based Community Memory bulletin board installed in Berkeley and San Francisco in 1972–4. For details, see and and and , which last I haven’t been able to get a copy of yet. There are tantalizing clues in some papers about boolean queries and “attaching keywords”, but I haven’t been able to find any details. Compiling this with `diet gcc -Os communitymemory.c` and then `strip`ping the result yields: in the resulting executable: ELF executable size in bytes : 2852 bytes of executable code : 1958 instructions : 706 bytes of read-only data : 364 bytes of initialized data : 16 bytes allocated at startup for messages: 37036 Notes on the original Community Memory hardware ----------------------------------------------- The original system ran on an XDS 940, aka SDS 940, a 24-bit CPU with 16 (although normally 32) to 64 kilowords of core memory. Core memory, like a hard disk, doesn’t lose data at power off, so the lack of any code for saving to disk in the above might be reasonable. The lack of code to pack and unpack characters from 24-bit words might not be, and of course in order to bring up the second terminal, the program had to do non-blocking I/O and have separate per-terminal input buffers and other user session state. The SDS 930, and thus presumably the 940, took 3.5 microseconds to add two integers; the 940 had a complete read/write cycle time of 1.75 microseconds. The <1000 instructions of this program would have occupied probably less than 1000 words of the 32768 to 65536 words available on the machine. Further evaluation and analysis of the toy reimplementation ----------------------------------------------------------- GCC inlined stuff enough that there were only nine functions left over from the above code: `main`, `cli`, `show_result`, `show_result_count`, `show_num`, `show_string`, `input_line`, `substreq`, and `search_for_terms`, 1400 of those bytes of executable code. (All of these except `main` and `cli` are called from more than one place.) The other 558 bytes of code was dietlibc stuff. This should be a fairly close approximation to what you’d need for such a system in real life. It depends only on the library functions `setjmp` and `longjmp` and the system calls `read` and `write`, each for a single byte. `setjmp` and `longjmp` are a few instructions of assembly each; `read` and `write` of a byte can require anything from a single instruction to an entire operating system. A brief cachegrind test where I created three messages (from the three paragraphs above) and did five one-word searches over them involved running 221 244 instructions and accessing data 115 634 times: 72 570 reads and 43 064 writes. This amounts to roughly 40 000 instructions, 15000 reads, and 9000 writes per search. If the instructions were similar in power to those on the XDS 940 and similar in run-time to its add instruction, then these searches would have taken about 140 milliseconds each: 50 milliseconds per message. Clearly this is not an extremely efficient approach, but not out of the question due to inefficiency either. in the test input: add The original system ran on an XDS 940, aka SDS 940, a 24-bit CPU with 16 (although normally 32) to 64 kilowords of core memory. Core memory, like a hard disk, doesn’t lose data at power off, so the lack of any code for saving to disk in the above might be reasonable. The lack of code to pack and unpack characters from 24-bit words might not be, and of course in order to bring up the second terminal, the program had to do non-blocking I/O and have separate per-terminal input buffers and other user session state. The SDS 930, and thus presumably the 940, took 3.5 microseconds to add two integers; the 940 had a complete read/write cycle time of 1.75 microseconds. . find system brief add This should be a fairly close approximation to what you’d need for such a system in real life. It depends only on the library functions `setjmp` and `longjmp` and the system calls `read` and `write`, each for a single byte. `setjmp` and `longjmp` are a few instructions of assembly each; `read` and `write` of a byte can require anything from a single instruction to an entire operating system. . find brief GCC inlined stuff enough that there were only nine functions left over from the above code: `main`, `cli`, `show_result`, `show_result_count`, `show_num`, `show_string`, `input_line`, `substreq`, and `search_for_terms`, 1400 of those bytes of executable code. The rest was libc stuff. . add GCC inlined stuff enough that there were only nine functions left over from the above code: `main`, `cli`, `show_result`, `show_result_count`, `show_num`, `show_string`, `input_line`, `substreq`, and `search_for_terms`, 1400 of those bytes of executable code. The rest was libc stuff. . find stuff find close find ran brief print quit (I added a “quit” command for profiling purposes.) I reran the test input without inlining (for profiling purposes) and got about twice the number of instructions, 440 000. A little analysis with kcachegrind shows that lower() got called 5590 times, executing 50 375 instructions; its caller, mismatch(), got called 2606 times (thus examining about 2.15 characters on average) and executing 131 000 cumulative instructions. Next down the list were contains() (27901 instructions on 17 calls; thus about 1700 instructions per call, which is reasonable for brute-force searching a few hundred characters) and show_char() (57942 instructions, including write(), for 2195 calls). The read() and write() system calls together took over 47000 instructions, 20% of the original total. Speculation on other approaches ------------------------------- It’s clear here that brute-force searching would have been a problem in 1972 once you got beyond a few postings: with a database of four postings, the time for a search would be noticeable, and with ten, the pause would start to get annoying. A system with constant posting from a few terminals over the course of three years would become unusable within hours. Now, the above code could certainly be run on a modern microcontroller, although there are tricky questions of what to do with the input and output text. (Tap them in Morse code? Speech synthesis — and recognition?) And the microcontroller could run at 200 MIPS instead of 0.3 MIPS, which lets you carry the brute-force approach quite a bit further. In fact, if you had an expiry system for old messages, you could probably use brute-force search the whole time. But clearly some kind of indexing is absolutely necessary to run on 1972 mainframes instead of 2010 microcontrollers, at least with any kind of acceptable performance. Additionally, if you could compress the text, you could keep more of it in memory. Some experiments I've done earlier suggest that a huffman dictionary containing the 10 000 most common English words of more than one letter (rather than merely putting bytes in your dictionary) can reduce unpunctuated English to 2.31 bits per character. A smaller dictionary of some 2000 words would give you 2.88 bits per character; it would itself require some 4000 pointers to represent (although those pointers only need 11 bits each) plus the actual text of the words (although that could be handled with huffman coding for the letters at some 4.2 bits per letter). In total this would be about a hundred kilobits, or 4200 kilowords on the SDS-940. This level of compression would allow you to keep almost three times as much text in core. It is also suitable for keyword searching, and would dramatically speed up I/O to larger storage. A bloom filter (discovered in 1970!) could permit even faster searching of the data in core or even on disk, at a quite reasonable space cost. Suppose, though, you weren’t constrained by core. The 940's Theory of Operations manual lists the following: “Mass Random Storage: 524,288 to 8,388,608 character rapid-access devices; 8.3 to 63.1 million character mass memory disc files.” The “rapid-access devices” were what are commonly referred to as drums, and the characters in question are six bits. The actual drums themselves were only 524288 or 1048576 characters in size. A “RAD” of larger size was constituted by ganging together multiple drums on a single controller. How much typing is 8 388 608 characters? If you type 40 words per minute (and you’re unlikely to be so fast in 1972) that’s 40×5 characters uncompressed. If there are four terminals in use 12 hours a day over three years, that’s about 3.2 million minutes, so the users can type 630 million bytes in that time. So even the biggest disk available for the machine at the time is low by an order of magnitude. Even with dictionary compression you’d still be short on space by a factor of 3. And that’s before you spend space on an index! The drum sectors are 64 words (256 characters) with 35ms maximum rotational latency. An entire track (“band”) of 4096 characters could be transferred in 35ms, which is 700 kilobits per second. So the average bandwidth-latency product of the drum was half that, or 2048 characters. If you wanted the whole “band”, though, the latency was only a single sector, so 500μs or so. On the other hand, a mebicharacter of dictionary-compressed postings would have been about 2.2 million characters uncompressed; that’s 11000 minutes of constant 40wpm typing, or 183 hours. That’s a week. Inverted indices, when compressed nicely, typically consume a bit less space than the original text, compressed. What they actually did ---------------------- I imagine that instead of doing full-text search, they only searched on keywords that were specifically attached to postings. That’s simple and pretty trivial to implement. If you wanted to make this program useful ----------------------------------------- You’d probably want to decide what hardware you were going to run it on and what you’d need to do to make it not lose its memory. Networking it might be a more useful thing, though.