#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Compute proportional-font print size of a text.

The laser printer at my new workplace is 600dpi in both directions.
It prints on US Letter (216×279mm) or similar paper, such as A4.

Simple multiplication yields a capacity of 33.6 megabits per page, or
about 4 megabytes.

At 600dpi, a 4×6 pixel character cell like the one I use in
<http://canonical.org/~kragen/sw/dofonts-1k.html> gives you an 80×66
page of 13.5 mm × 16.8 mm.  (Janne Kujala designed the font.) If you
can successfully control every pixel, the result should be clearly
readable with a magnifying glass.  (If we consider 5-point text as the
lower limit of comfortable readability, and these 6-pixel-tall
characters are 1/100 inch, you need 7× magnification to make the text
comfortably readable.)
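
The page-capacity, page-size and magnification figures above are plain
unit conversions.  Here is a minimal sketch that reproduces them. It
assumes 25.4 mm per inch and 72 points per inch (the usual desktop
publishing point):

```python
MM_PER_INCH = 25.4
DPI = 600

def mm_to_pixels(mm, dpi=DPI):
    # Printer dots along a length given in millimetres.
    return mm / MM_PER_INCH * dpi

def pixels_to_mm(pixels, dpi=DPI):
    # Millimetres covered by a run of printer dots.
    return pixels / dpi * MM_PER_INCH

# Raw capacity of one side of a 216x279 mm page, one bit per dot.
page_bits = mm_to_pixels(216) * mm_to_pixels(279)
print(round(page_bits / 1e6, 1), "megabits,", round(page_bits / 8e6, 1), "megabytes")

# Physical size of an 80x66 page of 4x6-pixel character cells.
print(round(pixels_to_mm(80 * 4), 1), "mm by", round(pixels_to_mm(66 * 6), 1), "mm")

# Magnification that brings 6-pixel (1/100 inch) characters up to 5 points.
char_height_in = 6 / DPI
five_points_in = 5 / 72
print(round(five_points_in / char_height_in, 1), "times magnification")
```

The last line prints 6.9, which rounds to the 7× quoted above.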

Further calculation suggests that the page will hold an array of 16
such reduced pages horizontally (216 ÷ 13.5) and 16.6 of them
vertically (279 ÷ 16.8).  In practice, since a printer cannot print to
the very edge of the sheet, this is probably 15 horizontally and 16
vertically.  That gives 240 pages per side, 480 pages on the two sides
of the paper, or 31680 80-column lines.  This is on the order of a
megabyte and a half of text, assuming an average of about 50 bytes per
line.  You could print the King James Bible on four sheets of paper.
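
The tiling arithmetic can be checked the same way.  This sketch
assumes the 216×279mm sheet, the 13.5 × 16.8 mm reduced page, the
conservative 15 × 16 grid and the 50-bytes-per-line average from the
text.  `sheets_needed` is a hypothetical helper added only for
illustration:

```python
import math

PAGE_W_MM, PAGE_H_MM = 216, 279      # paper size used above
MINI_W_MM, MINI_H_MM = 13.5, 16.8    # one 80x66 page of 4x6-pixel cells at 600 dpi

# Reduced pages that fit across and down, ignoring margins.
print(round(PAGE_W_MM / MINI_W_MM, 1), round(PAGE_H_MM / MINI_H_MM, 1))

ACROSS, DOWN = 15, 16                # the more conservative grid from the text
pages_per_sheet = ACROSS * DOWN * 2  # printed on both sides
lines_per_sheet = pages_per_sheet * 66
bytes_per_sheet = lines_per_sheet * 50   # assumes ~50 bytes per line on average
print(pages_per_sheet, lines_per_sheet, bytes_per_sheet)

def sheets_needed(total_lines):
    # Whole sheets needed for a text of total_lines lines, none over 80 columns.
    return math.ceil(total_lines / lines_per_sheet)
```

For example, a hypothetical 100 000-line text gives
`sheets_needed(100000) == 4`.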

In this form, the Rosetta Disk's 13000-page archive would require only
about 28 sheets of paper (13000 ÷ 480 ≈ 27.1), a thin booklet.  Their
metal disk is probably more durable than the paper, but the paper
version can be printed for about US$20-100, rather than the several
thousand dollars the disk costs.

But perhaps we can do better!  That's only about a third of the total
data capacity of the page.  Can a proportional font let us use less
than 4 pixels horizontally, on average?
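
"On average" here means weighted by how often each character occurs in
the text, not averaged over the glyph table.  Here is a minimal sketch
of that measurement.  It uses a hypothetical width table in which only
the space, the period and lowercase i are narrow (2 pixels), and every
other character keeps the full 4:

```python
def average_width(text, widths, default=4):
    # Mean pixel width per character, weighted by how often each character
    # appears in text; characters missing from widths get the default width.
    total = sum(widths.get(ch, default) for ch in text)
    return total / len(text)

narrow = {' ': 2, '.': 2, 'i': 2}   # hypothetical widths, for illustration only
verse = "In the beginning God created the heaven and the earth."

print(sum(narrow.get(ch, 4) for ch in verse), len(verse))   # 192 54
print(round(average_width(verse, narrow), 2))               # 3.56
```

Because spaces are so common in running text, narrowing just three
glyphs already brings this sentence from 216 pixels to 192.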

After inspection, I think that the 4×6 pixel font, modified slightly,
would need the following widths for its 96 glyphs (character codes 32
to 127, sixteen per row), each including one pixel of space on its
right:
"""

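# Glyph widths in pixels for character codes 32 to 127, sixteen per row.
# list() so the table can be measured and reused (map() is lazy on Python 3).
proportions_1 = list(map(int, """
2 2 5 4 4 4 4 2 3 3 4 4 3 3 2 4
4 3 4 4 4 4 4 4 4 4 2 3 4 3 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 3 4 3 4 3
3 4 4 4 4 4 4 4 4 2 3 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 2 4 5 0
""".split()))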

"""
So we can definitely use less than 4 pixels on average.  We could
probably even give an extra pixel or two of width to M, N, W, m, w,
and maybe #, Z, and z, and get an improvement in readability:
"""

proportions_2 = list(map(int, """
2 2 6 4 4 4 4 2 3 3 4 4 3 3 2 4
4 3 4 4 4 4 4 4 4 4 2 3 4 3 4 4
4 4 4 4 4 4 4 4 4 4 4 4 4 6 5 4
4 4 4 4 4 4 4 6 4 4 6 3 4 3 4 3
3 4 4 4 4 4 4 4 4 2 3 4 4 6 4 4
4 4 4 4 4 4 4 6 4 4 5 4 2 4 5 0
""".split()))

"""
If you're really shooting for density instead of readability, you
could make up some other glyph to represent paragraph breaks or line
breaks or whatever, probably only 3 pixels wide.
"""

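def total_pixel_width(inputfile, proportions, newline_width):
    """Return the summed glyph width, in pixels, of every character read.

    inputfile is any iterable of text lines, such as an open file.
    proportions holds the widths of the 96 glyphs for character codes
    32 to 127; newline gets newline_width, and every other character
    (other control codes, non-ASCII) counts as zero width.
    """
    proportions = list(proportions)   # also accepts a one-shot map() iterator
    assert len(proportions) == 96
    widths = [0] * 32 + proportions   # indexed by character code, 0 to 127
    widths[ord('\n')] = newline_width

    total_width = 0
    for line in inputfile:
        for char in line:
            code = ord(char)
            if code < len(widths):    # non-ASCII characters take no space
                total_width += widths[code]
    return total_width

def main():
    import io

    def kjv_width(proportions, newline_width):
        # Measure Project Gutenberg's KJV (bible-pg10.txt in the current
        # directory), reopening it for each run and closing it afterwards.
        # io.open decodes to Unicode the same way on Python 2 and 3.
        with io.open('bible-pg10.txt', encoding='utf-8', errors='replace') as kjv:
            return total_pixel_width(kjv, proportions, newline_width)

    print("Fixed-width 4×6 with a 4-pixel newline:")
    print(kjv_width([4] * 96, 4))
    print("With some characters narrower:")
    print(kjv_width(proportions_1, 3))
    print("With some narrower and some wider:")
    print(kjv_width(proportions_2, 3))

if __name__ == '__main__':
    main()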

"""
Output:
Fixed-width 4×6 with a 4-pixel newline:
17407376
With some characters narrower:
15187418
With some narrower and some wider:
15486129
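
The percentages and byte counts in the paragraph that follows can be
recomputed from these three totals.  This sketch assumes every glyph is
6 pixels tall and that one side of a page holds the 33.6 megabits
estimated at the top:

```python
import math

FIXED = 17407376     # fixed-width 4x6, 4-pixel newline
NARROWER = 15187418  # proportions_1, 3-pixel newline
MIXED = 15486129     # proportions_2, 3-pixel newline

def saving(width):
    # Fraction of horizontal space saved relative to the fixed-width font.
    return 1 - width / FIXED

print(round(100 * saving(MIXED)), round(100 * saving(NARROWER)))  # 11 13

GLYPH_HEIGHT = 6     # pixels; every glyph is 6 rows tall
PAGE_BITS = 33.6e6   # one side of a page, from the estimate at the top

pixels = MIXED * GLYPH_HEIGHT
print(pixels, math.ceil(pixels / 8))   # 92916774 11614597
print(round(pixels / PAGE_BITS, 2))    # 2.77 page sides, before tiling losses
```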

So yes, the proportional font saves 11% of the space, or 13% in the
version where none of the glyphs are wider.  You’d need 15.5 million
pixels horizontally to represent the Project Gutenberg KJV this way
with the wider variant, or 92 916 774 total pixels at 6 pixels tall:
92.9 megabits, or 11 614 597 bytes’ worth.  That is about 2.8 times
the 33.6 megabits one side of a page holds, so you could print the
entire KJV on three sheets of paper using a regular laser printer.
And with some human effort you can do better: about 0.3 million of
those 15.5 million pixels (roughly 2%) are used just to represent
newlines, and most of those are line breaks that keep the lines from
getting too wide, not paragraph separators.
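
The first step of that human effort is telling the two kinds of
newline apart.  Here is a minimal sketch.  It assumes a heuristic that
the measurement above does not use: a newline between two non-blank
lines is a wrap, and one next to a blank line separates paragraphs or
verses.  The sample is a hard-wrapped excerpt of Genesis 1:1-2, laid
out for illustration rather than copied from the Project Gutenberg
file:

```python
def classify_breaks(lines):
    # lines: the text's lines without their newline characters, e.g. from
    # str.splitlines().  Returns (wrap_breaks, paragraph_breaks); a final
    # trailing newline, if any, is not counted.
    wrap_breaks = paragraph_breaks = 0
    for current, following in zip(lines, lines[1:]):
        if current.strip() and following.strip():
            wrap_breaks += 1       # text simply continues on the next line
        else:
            paragraph_breaks += 1  # a blank line is involved
    return wrap_breaks, paragraph_breaks

sample = [
    "1:1 In the beginning God created the heaven and the earth.",
    "",
    "1:2 And the earth was without form, and void; and darkness was",
    "upon the face of the deep. And the Spirit of God moved upon the",
    "face of the waters.",
]
print(classify_breaks(sample))  # (2, 2)
```

Only the wrap breaks are candidates for joining into longer lines; the
paragraph breaks still need some glyph of their own.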

"""
