Evangelism
----------

The 'arrayfrombuffer' package supports Numeric Python arrays whose
contents are stored in buffer objects, including memory-mapped files.
This has the following advantages:

- loading your array from a file is easy --- a module import and a
  single function call --- and doesn't use excessive amounts of
  memory.
- loading your array is quick; it doesn't need to be copied from one
  part of memory to another in order to be loaded.
- your array gets demand-loaded; parts you aren't using don't need to
  be in memory or in swap.
- under memory pressure, your array doesn't use up swap, and parts of
  it you haven't modified can be evicted from RAM without the need for
  a disk write.
- your arrays can be bigger than your physical memory, up to the
  limitations imposed by your virtual memory address space and your
  OS; admittedly, this used to be a bigger deal when our PCs had 64
  mebibytes of RAM, 2 gibibytes of address space, and 4 gibibytes of
  disk space.  But on my laptop, which has 128 MiB of RAM and 256 MiB
  of swap, I added two 512-mebibyte arrays to produce a third one with
  the statement "Numeric.add(a, b, c)".  It took eleven minutes,
  though.
- when you modify your array, only the parts you modify get written
  back out to disk.

In theory, you could also use this package for things like arrays in
memory shared between programs or arrays in distributed shared memory.
I haven't tried using it for those things, though.

Someone built a package called "Vmaps" that does something similar:
http://snafu.freedom.org/Vmaps/.  It was released 2002-01-22.

Usage
-----

Using it is very easy.  To create the array file on disk:

    open('tmp.foo', 'wb').write(somenumericarray.tostring())

To load it back in as a flattened (one-dimensional) array of type 'l',
which is the default:

    import maparray
    myarray, mymmapobj, myfile = maparray.maparray('tmp.foo')

Now myarray is a perfectly ordinary Numeric array whose data just
happens to be stored in the file 'tmp.foo'.

If you want a different data type, you can specify it, either
positionally or by keyword:

    myarray, mymmapobj, myfile = maparray.maparray('tmp.foo', 'f')
    myarray, mymmapobj, myfile = maparray.maparray('tmp.foo', typecode='f')

You can specify a shape as well, either positionally after the
typecode or by keyword on its own (keeping the default typecode):

    myarray, mymmapobj, myfile = maparray.maparray('tmp.foo', 'f', (-1, 24))
    myarray, mymmapobj, myfile = maparray.maparray('tmp.foo', shape=(-1, 24))

If you make changes via myarray and you want them reflected in the
file before you delete all references to the mmap object and the
array, do this:

    mymmapobj.flush()
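
The effect of flush() can be illustrated without Numeric at all.  The
following is only a sketch using the standard 'mmap' and 'array'
modules of a current Python, not this package's own code; the helper
names write_longs, double_in_place and read_longs are made up for the
example.  It writes raw 'l' items to a file, modifies them through a
mapping, flushes, and reads the file back with ordinary I/O:

```python
import array
import mmap


def write_longs(path, values):
    """Write the values to path as raw native 'l' items (no header)."""
    with open(path, 'wb') as f:
        f.write(array.array('l', values).tobytes())


def double_in_place(path):
    """Map the file read-write, double every 'l' item, and flush."""
    with open(path, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), 0)   # length 0 maps the whole file
        raw = memoryview(mm)            # byte view of the mapping
        view = raw.cast('l')            # same bytes seen as native longs
        for i in range(len(view)):
            view[i] = view[i] * 2       # the write lands in the mapped pages
        view.release()                  # drop buffer exports so that
        raw.release()                   # mm.close() is allowed
        mm.flush()                      # push modified pages out to the file
        mm.close()


def read_longs(path):
    """Read the file back with ordinary I/O, bypassing the mapping."""
    result = array.array('l')
    with open(path, 'rb') as f:
        result.frombytes(f.read())
    return result.tolist()
```

The memoryview here only stands in for the Numeric array; the point
is that the data lives in the file's pages, so no separate "save"
step exists beyond flush().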

Wishlist
--------

It might be nice to have some of the following features:

- knowing whether or not it works on Microsoft Windows
- read-only access; this has a couple of problems to solve:

  - the mmap module on Microsoft Windows doesn't support read-only
    access
  - the Numeric module doesn't support read-only arrays, as far as I
    can tell, so the way you'd find out you were trying to write to a
    read-only mapping would be by a segmentation fault.

- using part of a buffer instead of the whole thing (or mmapping part
  of a file instead of the whole thing)
- not crashing if you close() the mmap object and then access myarray;
  unfortunately, this is very difficult, and probably the easiest way
  to do it is to provide a version of the mmap module that doesn't
  have a close() method.
- support for explicit lengths so you can create arrays this way too;
  on Unix, you can use ftruncate() to set the length of the file, and
  on any OS, you can write out that many zero bytes.
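
Two of the items above (explicit lengths, and mapping only part of a
file) can be approximated today with nothing but the standard library.
This is a sketch under assumptions, not a feature of this package:
create_zeroed and map_window are hypothetical helper names, and it
relies on the 'offset' argument that the standard 'mmap' module
accepts in current Python versions.

```python
import mmap


def create_zeroed(path, nbytes):
    """Create (or overwrite) path as a file exactly nbytes long.

    truncate() extends the empty file; on most systems the new bytes
    read back as zeros.
    """
    with open(path, 'wb') as f:
        f.truncate(nbytes)


def map_window(fileobj, offset, length):
    """Map only `length` bytes of an open file, starting at `offset`.

    offset must be a multiple of mmap.ALLOCATIONGRANULARITY, and the
    file must be open for reading and writing (for example 'r+b').
    """
    return mmap.mmap(fileobj.fileno(), length, offset=offset)
```

A caller would create the file once with create_zeroed(), reopen it
in 'r+b' mode, and pass the open file to map_window(); the returned
mmap object can then be handed to whatever builds the array.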
