Books Kragen Recommends on Programming

My taxi driver asked me what books he should read to learn how to program computers. I wrote down a long list and gave it to him while talking about each book; it occurred to me that I should put something like it into a web page for other people to use.

Alice in Wonderland/Through the Looking Glass by Lewis Carroll, of course, occupies the place of honor at the head of the procession. Alan Perlis said, ``The best book on programming for the layman is Alice in Wonderland, but that's just because it's the best book on anything for the layman.'' This book is available cheaply in lots of places, including for free from Project Gutenberg <ftp://ibiblio.org/pub/docs/books/gutenberg/etext97/alice30h.htm> or <ftp://ibiblio.org/pub/docs/books/gutenberg/etext91/alice30.txt>; see also <ftp://ibiblio.org/pub/docs/books/gutenberg/etext91/lglass18.txt>. It introduces the reader to many mathematical concepts that are central to programming, as well as a peculiar mindset. Folks who read Essentials of Programming Languages (see below) will find that the lengthy discussion therein about the distinction between the name of a variable, the binding of a variable, and the value of its binding is echoed by the White Knight's musing about his poem's name, for example.

Martin Gardner's The Annotated Alice <http://isbn.nu/0393048470> is an extensively explained version of the above book, including explanations of the mathematics and various vaguely related things.

Douglas R. Hofstadter's Gödel, Escher, Bach <http://isbn.nu/0465026567> is more closely related to computer programming. It explains, first by example and metaphor, then explicitly, and finally concretely with exercises, a variety of concepts central to understanding computers: self-reference and self-reproduction, recursion, computability, mathematical logic, paradox, Lewis Carroll, infinite regress, the necessity of imperative logic, isomorphisms between systems (particularly formal systems and the real world, and the significance of imperfect isomorphisms), the form-content dichotomy, the figure-ground dichotomy, Gödel's incompleteness theorem, encoding and decoding, the location of meaning, semantics-preserving transformations, Zen, wholism versus reductionism, the use-mention dichotomy, the Church-Turing Thesis, and finally, free will and consciousness.

Structure and Interpretation of Computer Programs <http://isbn.nu/0262011530> or <http://mitpress.mit.edu/sicp/> (also, full text on the web at <http://sicp.arsdigita.org/>), by Harold Abelson, Gerald Jay Sussman, and Julie Sussman, changed the way I thought about programming forever. It cuts through all the dross and garbage of libraries, syntax, and clever algorithms to discuss the central issues of writing programs: controlling complexity, dealing with time, and different strategies for doing each of these: applicative-order functional programming, stateful objects, constraint-propagation networks, multithreaded programs, lazy evaluation, and nondeterministic search (including logic programming languages). Among other things, and in no particular order, this book covers the (abstract, but sufficiently concrete to be executable) design of a CPU, an interpreter, a compiler for the CPU, recursion, higher-order functions, abstract interfaces as a modularity technique (and thus polymorphic abstract data types), performance, symbolic computation, synchronization for multithreaded programs, and garbage collection.

More to the point, it teaches how to break problems into pieces in a very pragmatic and non-doctrinaire way. This book is an excellent antidote to much of the rigid doctrine that passes for ``computer science'' in many schools, particularly the more vocationally-oriented.

Before I read this book, I thought that being a good programmer meant having lots of clever algorithms in your bag. The book explains a couple of clever algorithms, and explains when and why it's important to use the right one, but puts them in a much larger context.

Peter Norvig <http://www.norvig.com/> writes, on amazon.com:

Those who hate SICP think it doesn't deliver enough tips and tricks for the amount of time it takes to read. But if you're like me, you're not looking for one more trick, rather you're looking for a way of synthesizing what you already know, and building a rich framework onto which you can add new learning over a career. That's what SICP has done for me. I read a draft version of the book around 1982 and it changed the way I think about my profession. If you're a thoughtful computer scientist (or want to be one), it will change your life too.

If you're going to do this book, which is likely to take 400-800 hours, you should have a Scheme interpreter. I recommend DrScheme <http://www.cs.rice.edu/CS/PLT/packages/drscheme/>.

(Counterpoint: another reviewer on amazon.com writes:

This text is a half-baked attempt at algroithms and data structures, and of course, since it's tailored to scheme, I never found much use for these concepts later. In fact, once you start programming in a real language like C, you'll have to relearn all the concepts and see how they really work in the proper programming setting. On top of that, the writers make absolutely no attempt to be clear.

)

It does seem that beginning programmers can't really appreciate SICP on the first reading; when they reread it as expert programmers, they almost universally say it's the best book they've ever read. Accordingly, the Amazon reviews are split roughly half and half between one star and five stars.

Friedman and Wand's Essentials of Programming Languages <http://isbn.nu/0262061457> <http://www.cs.indiana.edu/eip/eopl.html> explores, in detail, different aspects of programming languages --- with runnable interpreters including each feature discussed. It discusses, among other things, the lambda calculus, normal versus applicative order, different kinds of type systems (persuasively making the case that the most important differences between languages are in their type systems), different kinds of parameter passing, various features of syntax, different kinds of memory management, different kinds of scoping and package systems, different kinds of coercion and polymorphism, how to compile using continuations, etc. Jeff Bone recommended this book to me, for which I am eternally indebted to him. Unfortunately, my copy is on loan to my friend Joe, and so this entry must remain incomplete.

This book also benefits from having a Scheme interpreter handy.

Knuth's The Art of Computer Programming <http://www-cs-faculty.stanford.edu/~knuth/taocp.html> covers the low-level aspects of programming in exhaustive detail --- the numerous clever algorithms for computing one function or another, the low-level ways things are implemented, etc. --- that books like SICP take for granted or cover shallowly. Volume 1 <http://isbn.nu/0201896834> is on fundamental algorithms --- basic data structures. Volume 2 <http://isbn.nu/0201896842> is on ``seminumerical algorithms'' --- that is, methods for numerical computation. Volume 3 <http://isbn.nu/0201896850> is on sorting and searching. All of the programs are in assembly language for a fictitious computer called MIX, which looks rather archaic these days (with its complex addressing modes, non-recursive function calls, and word-oriented memory), although they are being converted to a new RISC machine called MMIX for the next edition.

Although Knuth has been working on TAOCP for 35 years (with a short 11-year break in the middle to revolutionize computerized typesetting), volumes 4 through 7 are still incomplete.

In a fundamental way, Knuth's work has created a science of algorithms; that is what these books cover. They won't teach you powerful ways of structuring your programs to make them simpler. They won't give you new insights about the epistemological significance of computer programming. If you have a good standard library, most of the most important algorithms discussed in TAOCP are already at your fingertips. Nevertheless, if you want to understand what's going on at the level of the metal, you need to study these books.

Robert Sedgewick's Algorithms books --- Algorithms <http://isbn.nu/0201066734> (out of print), Algorithms in Modula-3 (out of print) <http://isbn.nu/0201533510>, Algorithms in C <http://isbn.nu/0201314525>, and Algorithms in C++ <http://isbn.nu/0201350882> --- cover much of the most important material in TAOCP, plus a variety of related things. Algorithms in C was the book that taught me there was more to knowing how to program than having the library reference manual memorized. It also has the advantage of having delightfully simple, readable, and runnable code examples in C; the other books are in C++ and Pascal. They all cover more or less the same material.

Steve McConnell's Code Complete <http://isbn.nu/1556154844> <http://www.construx.com/stevemcc/cc.htm> is very concrete and practical; it covers things like how to find bugs in your programs, how to optimize them, how to choose names for your variables, why you should indent your code, how to unit-test your code, how to choose control and data structures, when to break a big routine into smaller ones and vice versa, and all manner of practical stuff. This is another book that changed the way I thought about programming.

It doesn't contain runnable code samples, because the book is not about what code does. It's about what a person writing code does and how they can do it better.

Brian Kernighan and Rob Pike's The Practice of Programming <http://cm.bell-labs.com/cm/cs/tpop/> <http://isbn.nu/020161586X> is similar to Code Complete, but is much shorter and more humble and professional; and it comes from folks with impeccable credentials. They discuss, among other things, how to optimize, when to optimize, how to choose your implementation language, how to choose names for your variables, how and why to write portable code, and notation.

There's a place for a book that explains how to find and fix common problems that totally stymie beginners --- things like incompatible header files, missing libraries, broken or missing Makefiles, mismatched braces, uninitialized variables, and memory faults. I haven't seen one; Code Complete comes closest, but it's a long way from what I'm talking about.

Robert M. Pirsig's Zen and the Art of Motorcycle Maintenance <http://isbn.nu/0553277472> is a long and insightful exposition on the nature of human experience, the experience of creation, and the oneness with one's experience that characterizes excellent work.

Lots of people seem to like Design Patterns <http://www.hillside.net/patterns/DPBook/DPBook.html> <http://isbn.nu/0201633612>, by Gamma, Helm, Johnson, and Vlissides (the Gang of Four). I haven't read it, so I don't know.

Peter Norvig's Paradigms of Artificial Intelligence Programming, <http://isbn.nu/1558601910>, <http://www.norvig.com/paip.html>, sounds interesting. I haven't read it either.

Leo Brodie's Thinking FORTH was out of print for a long time, but FIG published a second edition in 1994: <http://isbn.nu/0935533001>. It is a really excellent book; it covers different aspects of problem-solving and software design and implementation, specifically in FORTH, but the principles taught are not by any means restricted to FORTH.

Edward Tufte's Visual Display of Quantitative Information <http://isbn.nu/096139210X> is a classic, and relevant to computer programming for several reasons: first, because computer programming sometimes involves analyzing quantitative data, and much of the book is about doing that more effectively; second, because many computer programs are visible, and their visible faces need to be designed according to the principles in this book; and third, because the concepts of respect for the reader, simplicity of design, avoidance of needless distractions, and so forth are applicable much more broadly than the visual design context in which this book presents them so beautifully and persuasively.

Along similar lines, you should read Strunk and White's The Elements of Style: <http://isbn.nu/020530902X>

If you're going to program, you're probably going to do it in some language, and it probably won't be Scheme; this means you will need a good book on that language. I'm afraid I don't know what books to recommend; although I know a few programming languages, I tend to learn new languages from their reference manuals, and therefore don't have the opportunity to evaluate other books on them.

Generic tip: books from IDG, SAMS, Ziff-Davis, Sybex, Que, and similar publishers --- the six-inch-thick Teach Yourself to be an Unleashed Dummy in 21 minutes line especially --- tend to be crap. Written by unqualified authors, full of errors and typos, often including multi-page segments of text copied from somewhere else and not written by the supposed author at all, hurriedly produced, shoddily bound, bulked up with extra-thick paper, gratuitous screenshots, and gratuitous code samples, simply in order to occupy more shelf space and attract attention with their fluorescent colors, these poorly-organized travesties screw their buyers, then go out of date in months. There are occasional exceptions. See Philip Greenspun's ``The Book Behind The Book Behind The Book'' for details on how this happens.

Books from New Riders, Wrox, and McGraw-Hill, and especially O'Reilly, Prentice Hall, MIT Press, Morgan Kauffman, and Addison Wesley (now Addison Wesley Longman) are produced in a structurally different environment that tends to produce books of acceptable or good quality. This is true of the Microsoft Press books I've seen, too.