/* I was noodling around about CoAP and ZeroMQ and Nanomsg and
 * FlatBuffers and what it would look like if you had a messaging
 * system that was really designed for speed.  I came to the tentative
 * conclusion that my laptop could probably handle about three million
 * request/response pairs per second between processes without any
 * batching, or maybe 80 million per second with batching, or 40
 * million if they have to pass through a message broker.
 *
 * So I thought maybe I’d try to write a thing that did something like
 * that and see how fast it went.
 *
 * Here we have a Y communication topology, with a “reverse proxy” or
 * “message broker” sitting in front of two “backend services” and
 * passing requests to them:
 *
 *                         ___ adding service
 *                        /
 * Requester ---- Broker <
 *                        \___ multiplying service
 *
 * The requester generates pairs of numbers and randomly chooses
 * whether to add or multiply them, packaging them up into requests
 * with a request-ID and sending all of these requests to the broker.
 * The broker examines each request to see if it’s an addition request
 * or a multiplication request, then passes it on to the appropriate
 * service.  The services perform the requested additions or
 * multiplications and send back the results to the requester, who
 * checks the results to see if they are correct.
 *
 * The messages have a fixed-format 8-byte header, similar to the
 * fixed-format 4-byte header used in CoAP.
 * For now, it contains these fields:
 *
 * |-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|
 * |        0        |  type  | length |      transaction identifier       |
 * |-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|-+-+-+-+|
 *
 * Here, the 8-bit `type` is 0 for a request or 1 for a response; the
 * 8-bit `length` is the number of 8-byte words that follow the
 * header, from 0 to 255; and the transaction identifier is a 4-byte
 * nonce, like the CoAP token or a UDP source port number, allocated
 * by the requester to correlate responses with requests.  The first
 * two bytes are 16 bits of 0.
 *
 * I’m still lacking a really clear theory about how routing and
 * naming are supposed to work, but I’m assuming I can come up with an
 * answer that doesn’t affect performance too badly.  For now, I’m
 * going to figure that the first 64-bit word in a request or response
 * body is the identity of the requester, so the broker can simply
 * leave it intact, and the origin server can copy it from the request
 * to the response along with the header (other than the type).
 *
 * The current amd64 code for the routine below to route a message is
 * as follows:

    0000 53          pushq  %rbx
    0001 488B1F      movq   (%rdi), %rbx      # load message header
    0004 31C0        xorl   %eax, %eax        # prepare 0 return value in case it’s needed
    0006 48C1EB20    shrq   $32, %rbx         # extract length field
    000a 83C301      addl   $1, %ebx          # increment it
    000d 0FB6CB      movzbl %bl, %ecx         # zero-extend its low 8 bits
    0010 4839D1      cmpq   %rdx, %rcx        # check to ensure we have space
    0013 760B        jbe    .L7               # if so, jump to the copy; otherwise fall through and return 0
    0015 5B          popq   %rbx
    0016 C3          ret
    0017 660F1F84    .p2align 4,,10
         00000000
         00
                 .L7:
    0020 4889F0      movq   %rsi, %rax        # save destination for memcpy call
    0023 488D14CD    leaq   0(,%rcx,8), %rdx  # multiply word length by 8 for byte count
         00000000
    002b 4889FE      movq   %rdi, %rsi        # pass source to memcpy call
    002e 4889C7      movq   %rax, %rdi        # pass destination to memcpy call
    0031 E8000000    call   memcpy
         00
    0036 0FB6C3      movzbl %bl, %eax         # return number of words copied
    0039 5B          popq   %rbx
    003a C3          ret

 *
 * This is 16 instructions on the usual path and so probably about 5
 * CPU cycles (2 ns), plus whatever memcpy itself costs.
 *
 * One possible approach for routing is to make recursively wrapped
 * messages, with some series of headers.  On their way through the
 * system, they accumulate return-address headers (somewhere else I
 * guess?).  This seems like it might impede things like path
 * translation, caching, and permission checking.
 *
 */

#include <stdint.h>
#include <string.h>

typedef uint64_t word;

/* Extract the 8-bit length field (bits 32 to 39) from a header word. */
static inline uint8_t msg_length(word header)
{
  return header >> 32 & 0xff;
}

/* Copies one message (header plus body) into the destination buffer,
 * whose free space is given in 8-byte words.  Returns the number of
 * 8-byte words copied, or 0 if the message does not fit. */
int route_message(word *message, word *dest_buf_ptr, size_t dest_buf_remaining)
{
  /* Widen before adding 1 for the header word: in a uint8_t, as in
   * the listing above, a length field of 255 would wrap to 0 instead
   * of giving 256 words. */
  size_t len = (size_t)msg_length(message[0]) + 1;
  if (len > dest_buf_remaining) {
    return 0;
  }
  memcpy(dest_buf_ptr, message, sizeof(message[0]) * len);
  return (int)len;
}
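
/* A minimal sketch of packing and unpacking the 8-byte header
 * described in the comment at the top of this file.  It assumes the
 * field diagram reads most-significant byte first within the 64-bit
 * word, which is consistent with msg_length() taking the length from
 * bits 32 to 39.  Every name beginning with example_ or EXAMPLE_ is
 * hypothetical and not part of the original code.  The last helper
 * illustrates the idea that an origin server can copy the request
 * header into the response and change only the type. */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two type values given in the header notes. */
enum { EXAMPLE_TYPE_REQUEST = 0, EXAMPLE_TYPE_RESPONSE = 1 };

/* Pack the fields into one header word.  Assumed layout, from the most
 * significant bits down: 16 zero bits, 8-bit type, 8-bit length,
 * 32-bit transaction identifier. */
static inline uint64_t example_make_header(uint8_t type, uint8_t length,
                                           uint32_t txid)
{
  return (uint64_t)type << 40 | (uint64_t)length << 32 | txid;
}

static inline uint8_t example_header_type(uint64_t header)
{
  return header >> 40 & 0xff;
}

/* Same extraction as msg_length(). */
static inline uint8_t example_header_length(uint64_t header)
{
  return header >> 32 & 0xff;
}

static inline uint32_t example_header_txid(uint64_t header)
{
  return header & 0xffffffffu;
}

/* Derive a response header from a request header: everything is
 * copied except the type field, which is set to "response". */
static inline uint64_t example_response_header(uint64_t request_header)
{
  const uint64_t type_mask = (uint64_t)0xff << 40;
  return (request_header & ~type_mask)
         | (uint64_t)EXAMPLE_TYPE_RESPONSE << 40;
}

/* Round-trip check: fields written by example_make_header() come back
 * unchanged, and the response keeps the length and transaction id. */
static inline void example_header_selfcheck(void)
{
  uint64_t req = example_make_header(EXAMPLE_TYPE_REQUEST, 3, 0xdeadbeefu);
  uint64_t resp = example_response_header(req);
  assert(example_header_type(req) == EXAMPLE_TYPE_REQUEST);
  assert(example_header_length(req) == 3);
  assert(example_header_txid(req) == 0xdeadbeefu);
  assert(example_header_type(resp) == EXAMPLE_TYPE_RESPONSE);
  assert(example_header_length(resp) == example_header_length(req));
  assert(example_header_txid(resp) == example_header_txid(req));
}
```

/* The helpers work on header values rather than on bytes in memory,
 * so they leave open the question of which byte order the header has
 * on the wire. */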