From kragen@dnaco.net Mon Sep 21 18:37:09 1998 -0400 Date: Mon, 21 Sep 1998 18:37:07 -0400 (EDT) From: Kragen To: bsittler@nmt.edu Subject: distributed cache scalable web Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Keywords: X-UID: 2060 Status: O X-Status: Mon, 21 Sep 1998 18:37:07 -0400 (EDT)he objective ------------- To design a scalable system for mass information distribution. Definitions of Terms -------------------- The system consists of many independent machines, known as 'nodes', that cooperate to distribute the information. Scalability is defined as having the amount of work required of any node grow sublinearly with network size, so that it can reasonably cope with 10^10 users (as opposed to the 3 * 10^8 or less today). The information system distributes things called "notes", which correspond to web "pages" or Usenet "articles". There are people that use the system; they want to read notes and publish new notes. Any person can read any note. Most people read more notes than they publish by a factor of 10^2 to 10^4. Notes do not change after they are published. Usenet and the Web, and Why they Suck ------------------------------------- Usenet (or rather, the subset of Usenet fitting this description) does not scale well. Since the number of notes grows roughly linearly with the number of people using the system, and since each node is required to have a copy of every note (in the subset in which any person can read any note), the amount of work per node grows linearly with the network size -- more exactly, with the number and loquacity of the writers. Expiring notes after a week to a month helps a lot, but not enough. As part of this problem, Usenet's traffic is growing to the point where it's more economical for many smaller ISPs to use remote NNTP servers, since most articles get read, on average, less than once at that particular ISP. The Web does not scale well for a different reason. Since the number of readers for any note grows roughly linearly with the number of people using the system, and since every person who wants to read a note must request it from its Web server of origin, and since each such request requires a constant amount of work, the amount of work per Web server grows linearly with the network size -- more exactly, with the number and appetite of the readers. Shared caches help a lot, but not enough. The Basic Design ---------------- Nodes are interconnected, as in Usenet, by the mutual agreement of their owners. Interconnected nodes, and only interconnected nodes, exchange notes. Nodes are interconnected redundantly, also as in Usenet. The graph of the network has many cycles. People have a particular node, or a few nodes, that they use for all their reading and posting. This relationship is established by mutual agreement between these "users" and the owner of the node. Home Nodes ---------- Each note has one or more "home nodes", who commit to always keeping a copy of it, or at least keeping a copy of it as long as the note's poster wants to continue to publish it. How People Read Notes --------------------- Users of a note's home node can read the note directly, simply by asking the node to show them the note. Users of nodes connected to a home node of a particular note can ask their node to show them the note. Their node will determine the home nodes of the note and ask one of them to give it the note. The home node will give the user's node the note. The user's node will remember the note for a while, just like a web caching proxy. If another user of the same node wants to read the note soon after, they ask their node, and their node gives them the remembered copy. Multi-level requests can proceed in the same way. If note X has home node A, A is connected to B, B is connected to C, and a user of C wants to read note X, then node C can ask node B for the note, node B an ask node A for a copy of the note, node A can give node B a copy of the note, node B can give node C a copy of the note, and both nodes B and C can remember the note for a while. Routing ------- The question of how node C figures out how to ask node B (rather than, say, node D) for note X is rather a tough question. The simplest way is for every node to maintain an up-to-date, accurate copy of the topology of the entire network. A network serving 10^10 people is likely to have 10^7 nodes or more. Remembering the names of so many nodes is likely to occupy 50-100 megabytes; remembering all their connections is likely to occupy another eight bytes or so per connection, for another 800 megabytes. This is likely to be an impractically-large data set to walk through, looking for paths, at least in the near future. Other methods of routing are possible. One possibility is to include routes in note IDs. Another possibility is to clump large groups of nodes into "domains" with a few well-known interfaces to the outside world; the topology inside the domain is not of interest to the outside world, and the topology in the outside world is not of interest to those inside the domain. A final possibility is to advertise routes from certain well-known places. Cross-note Links ---------------- How can people learn about my note in the first place? Each note will have a unique note-id, from which any node can determine the note's home nodes, and route a request to one of them for the note. Threads, different versions of documents, and other such things can be done this way, although I'm not sure how. They would seem to require mutable notes. Social and Economic Incentives ------------------------------ The nonscalability of the Web gives posters a perverse incentive to write unpopular Web pages, rather than popular ones, if they are not being paid for each reader. Someone who puts up a popular but unprofitable page will incur great costs as millions of people fetch their Web page, and may even be unable to deliver the page to most of these people. Usenet, on the other hand, tends to encourage people to write articles that are profitable for them, but unprofitable for Usenet as a whole -- because every Usenet news server has to bear the cost of them, even if nobody wants to read them. Usenet has evolved social mechanisms to discourage this. The Web has evolved social mechanisms that help cut down on the popularity-requires-profits problem. It would be nice if the incentive structure of the network corrected these problems at a local level. My proposed network does indeed solve these problems this way. If I put a popular note up on my home node, I can put a pretty tight upper bound on how often it is requested, and thus on how much work my node will have to do as a result. My node has some small number of users -- perhaps a few hundred -- and each will probably only request the note a maximum of a few times a day. My node has some small number of neighbors, nodes with which it is interconnected. Each of them might request the note; they will not request it again until at least the amount of time they cache it. If it turns out to be a very popular note, they might continue to cache it for a long time without making any further requests of my node. This reduces the popularity-requires-profits problem. If it doesn't reduce the problem far enough, there are several measures the owner of my node can take (other than asking me to move my note elsewhere): - If there's a user who's requesting the note repeatedly, they can be asked to stop. If they don't stop, they can be denied access to the node. (Since notes are unchanging, no benefit accrues to users who request notes repeatedly, except that they don't have to store it on their own disk.) - If there's a neighboring node that requests the note repeatedly, they can be asked to cache it longer. If they refuse, the interconnection can be broken. The other problem -- the spam problem -- is different. Suppose I post something on my node that's very profitable, but imposes heavy costs on people in between me and the rest of the world. (Say, an advertisement for a porno service.) Here are the potential scenarios. - If nobody wants to look at it, nobody is inconvenienced, because copies of it never end up anywhere else besides my node. - My neighbors will not suffer much extra network load because of it, because of the solutions to the other problem. However, it will take up their disk space. If I do this enough to impose significant costs on them, they can charge me money for the privilege of interconnecting with them. - My neighbors' neighbors are in the same situation. They are free to charge my neighbors money, or to put pressure on them to disconnect from me. - The nodes near the people who are interested in reading my notes can ask those people to go somewhere else, to pay extra for the privilege of reading expensive notes, or simply refuse to serve up my notes. These solutions are not as satisfactory as the solutions to the Web problem, but they are far better than Usenet's solutions. They are all point-to-point solutions, requiring only that people cooperate with people they already have a trust relationship with, and if they can't, severing the relationship is a solution in every case. Security -------- This doesn't deal with routing security. Someone who can inject undetectably false routing information can obviously, at least, cause a denial of service; worse, they may be able to forge notes, etc. I'm not going to deal with that at the moment, because I don't understand how routing will work. Protection against forging and alteration of notes could be as follows. Someone who wants to make sure that they're not being given a falsified copy of a note can request the note through several independent, or partly-independent, paths. This is obviously not a defense against an omnipotent attacker, but it requires many nodes to collude in falsifying the note -- and while it's often possible to falsify the note without corrupting enough nodes to make it possible to find a route around the corruption, it isn't possible to determine the minimal set of nodes to corrupt to produce a false answer to a particular query in advance, or even at the time -- you don't know what other routes the note has been requested through. Much of this requesting can take the format of requesting just a secure hash of the note. Another possibility is to include a secure hash of the note in the note-ID. This is obviously dependent on the security of the channels of distribution of the note-ID. Another possibility would be to sign notes with public-key cryptography. This method is dependent on the security of the underlying cryptosystem and key-distribution infrastructure. At the moment, the latter would have to be either ad-hoc or centralized and expensive. Kragen -- Kragen Sitaker The sages do not believe that making no mistakes is a blessing. They believe, rather, that the great virtue of man lies in his ability to correct his mistakes and continually make a new man of himself. -- Wang Yang-Ming