- Testing requirements after changes:

  - Test that all functions return either native integers or bigints.
    Functions that return raw MPU::GMP results will return strings, which
    isn't right.

  - Valgrind, coverage

  - compile with -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fsigned-char



- Add test to check maxbits in compiled library vs. Perl

- Figure out documentation solution for PP.pm

- Is the current PP.pm setup the way we want to do it?

- Move .c / .h files into a separate directory.
  The version module does it in a painful way.  Something simpler to be had?

- Finish test suite for bignum.  Work on making it faster.

- Test all routines for numbers on a word-size boundary, and for ranges that
  cross one.

- An assembler version of mulmod for i386 would be _really_ helpful for
  all the non-x86-64 Intel machines.
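
  For reference, a minimal sketch of the kind of portable fallback an
  assembler version would replace: a shift-and-add mulmod for 64-bit operands
  that needs no 128-bit intermediate.  The names addmod64 and mulmod64 are
  hypothetical, not the library's, and this is not necessarily how the
  current code does it.

  ```c
  #include <stdint.h>

  /* Hypothetical helper: (a + b) mod n without overflow.
   * Requires a < n and b < n. */
  static uint64_t addmod64(uint64_t a, uint64_t b, uint64_t n) {
    return (a >= n - b) ? a - (n - b) : a + b;
  }

  /* Hypothetical portable (a * b) mod n using shift-and-add.
   * Requires n > 0.  Takes up to 64 iterations, each with one or two
   * modular additions, which is why a hardware 64x64->128 multiply
   * followed by a divide is so much faster. */
  static uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t n) {
    uint64_t r = 0;
    a %= n;
    b %= n;
    while (b > 0) {
      if (b & 1) r = addmod64(r, a, n);  /* add current multiple of a */
      a = addmod64(a, a, n);             /* double a modulo n */
      b >>= 1;
    }
    return r;
  }
  ```

  Since every intermediate stays below n, this works for any 64-bit modulus,
  at the cost of a loop per multiplication.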

- More efficient Mertens.  The current version has poor growth.

- It may be possible to have a more efficient ranged totient.  We're using
  the sieve up to n/2, which is better than most people seem to use, but I'm
  not completely convinced we can't do better.
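
  One possible direction, as a sketch only: keep a "remaining cofactor" array
  alongside the totients, so that only divisors up to sqrt(hi) are walked and
  any leftover cofactor is handled as a single large prime.  This is a
  different technique from the n/2 sieve mentioned above, and totient_range is
  a hypothetical name.

  ```c
  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical sketch: fill phi[i - lo] = euler_phi(i) for lo <= i <= hi.
   * Requires 1 <= lo <= hi, with hi assumed below 2^63 so that the
   * multiple-stepping below cannot wrap.
   * Returns 0 on success, -1 on allocation failure. */
  static int totient_range(uint64_t lo, uint64_t hi, uint64_t *phi) {
    uint64_t n = hi - lo + 1, i, p, m;
    uint64_t *rem = malloc(n * sizeof *rem);

    if (rem == NULL) return -1;
    for (i = 0; i < n; i++) {
      phi[i] = lo + i;   /* will be scaled by (1 - 1/p) for each prime p */
      rem[i] = lo + i;   /* part of the number not yet factored */
    }
    /* p runs over all integers, not just primes.  A composite p never
     * divides rem[] because its prime factors were already removed, so
     * the result is still correct; a real version would iterate primes. */
    for (p = 2; p <= hi / p; p++) {
      m = ((lo + p - 1) / p) * p;          /* first multiple of p >= lo */
      for (; m <= hi; m += p) {
        i = m - lo;
        if (rem[i] % p == 0) {
          phi[i] -= phi[i] / p;
          while (rem[i] % p == 0) rem[i] /= p;
        }
      }
    }
    /* Whatever is left is 1 or a single prime larger than sqrt(hi). */
    for (i = 0; i < n; i++)
      if (rem[i] > 1) phi[i] -= phi[i] / rem[i];

    free(rem);
    return 0;
  }
  ```

  The extra array doubles the memory per entry, so whether this beats the
  current approach would need measuring.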

- Big features:
   - LMO prime count
   - QS factoring

- The segment sieve should itself use a segment for its primes.
  Today the worst case needs a sieve up to sqrt(2^64) = 2^32, about 140MB.
  Segmenting would bring that under 1MB.
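
  To make the problem concrete, here is a minimal sketch of a plain segmented
  sieve.  Note the base array: it is built in one unsegmented pass up to
  sqrt(hi), which is exactly the part this item proposes to segment as well.
  The name count_primes_segmented and the SEG_SIZE value are hypothetical,
  the sketch uses one byte per number rather than a compact encoding, and hi
  is assumed to be below 2^32 to keep the example small.

  ```c
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  #define SEG_SIZE 32768u   /* hypothetical segment length, in numbers */

  /* Hypothetical sketch: count primes in [lo, hi], assuming hi < 2^32.
   * Returns the count, or (uint64_t)-1 on allocation failure. */
  static uint64_t count_primes_segmented(uint64_t lo, uint64_t hi) {
    uint64_t count = 0, root = 0, p, m, start, end, i;
    unsigned char *base, *seg;

    if (hi < 2 || lo > hi) return 0;
    if (lo < 2) lo = 2;
    while ((root + 1) * (root + 1) <= hi) root++;   /* floor(sqrt(hi)) */

    base = malloc((size_t)root + 1);
    seg = malloc(SEG_SIZE);
    if (base == NULL || seg == NULL) {
      free(base);
      free(seg);
      return (uint64_t)-1;
    }

    /* Unsegmented base sieve: base[p] != 0 means p is prime, p <= root. */
    memset(base, 1, (size_t)root + 1);
    base[0] = 0;
    base[1] = 0;
    for (p = 2; p * p <= root; p++)
      if (base[p])
        for (m = p * p; m <= root; m += p) base[m] = 0;

    /* Sieve [lo, hi] one segment at a time using the base primes. */
    for (start = lo; ; start = end + 1) {
      end = (hi - start < SEG_SIZE - 1) ? hi : start + SEG_SIZE - 1;
      memset(seg, 1, (size_t)(end - start + 1));
      for (p = 2; p <= root && p * p <= end; p++) {
        if (!base[p]) continue;
        m = (start + p - 1) / p * p;   /* first multiple of p >= start */
        if (m < p * p) m = p * p;      /* never cross off p itself */
        for (; m <= end; m += p) seg[m - start] = 0;
      }
      for (i = 0; i <= end - start; i++) count += seg[i];
      if (end == hi) break;
    }

    free(base);
    free(seg);
    return count;
  }
  ```

  The working segment stays at SEG_SIZE bytes regardless of the range, while
  base grows with sqrt(hi); producing the base primes in segments too would
  bound both.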

- Figure out a way to make the internal FOR_EACH_PRIME macros use a segmented
  sieve.

- Rewrite 23-primality-proofs.t for new format (keep some of the old tests?).

- Use Montgomery routines in more places: Factoring.

- Put euler_phi and moebius directly in XS.
    (1) optional second argument.  Easily handled, and not hard to do in
        generic sub call.
    (2) generic sub returns an array.  This is the sticking point.

- Factoring in PP code is really wasteful -- we're calling _isprime7 before
  we've done enough trial division, and later we're calling it on known
  composites.  Note how the XS code splits the factor code into the public
  API (small factors, isprime, then call main code) and main code (just the
  algorithm).  The PP code isn't doing that, which means we're doing lots of
  extra primality checks, which aren't cheap in PP.
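
  The split described above can be sketched in C as follows.  This is an
  illustration of the structure only, restricted to 32-bit inputs, and is not
  the actual XS code: factor32, rho_factor32, is_prime32 and the small-prime
  list are all hypothetical.  The point is that each cofactor gets exactly one
  primality test, and the main algorithm only ever sees known composites.

  ```c
  #include <stddef.h>
  #include <stdint.h>

  static uint32_t powmod32(uint32_t b, uint32_t e, uint32_t n) {
    uint64_t r = 1, x = b % n;
    while (e) {
      if (e & 1) r = r * x % n;
      x = x * x % n;
      e >>= 1;
    }
    return (uint32_t)r;
  }

  /* Miller-Rabin with bases 2, 7, 61, deterministic for 32-bit inputs. */
  static int is_prime32(uint32_t n) {
    static const uint32_t bases[3] = {2, 7, 61};
    uint32_t d, s = 0, x, r;
    int i;
    if (n < 2) return 0;
    if (n < 4) return 1;
    if (n % 2 == 0) return 0;
    for (d = n - 1; d % 2 == 0; d /= 2) s++;
    for (i = 0; i < 3; i++) {
      if (bases[i] % n == 0) continue;
      x = powmod32(bases[i], d, n);
      if (x == 1 || x == n - 1) continue;
      for (r = 1; r < s; r++) {
        x = (uint32_t)((uint64_t)x * x % n);
        if (x == n - 1) break;
      }
      if (r == s) return 0;   /* no witness reached n-1: composite */
    }
    return 1;
  }

  static uint32_t gcd32(uint32_t a, uint32_t b) {
    while (b) { uint32_t t = a % b; a = b; b = t; }
    return a;
  }

  /* Main code: just the algorithm.  The caller guarantees n is composite,
   * so no primality test happens here.  Pollard rho with a few constants,
   * then plain trial division as a guaranteed fallback. */
  static uint32_t rho_factor32(uint32_t n) {
    uint32_t c, x, y, g, d;
    for (c = 1; c <= 16; c++) {
      x = y = 2;
      g = 1;
      while (g == 1) {
        x = (uint32_t)(((uint64_t)x * x + c) % n);
        y = (uint32_t)(((uint64_t)y * y + c) % n);
        y = (uint32_t)(((uint64_t)y * y + c) % n);
        g = gcd32(x > y ? x - y : y - x, n);
      }
      if (g != n) return g;
    }
    for (d = 2; ; d++)        /* terminates because n is composite */
      if (n % d == 0) return d;
  }

  /* Public API: small factors, one primality test per cofactor, then the
   * main code.  Writes prime factors (in no particular order) to out,
   * which needs room for 32 entries, and returns how many were found. */
  static int factor32(uint32_t n, uint32_t *out) {
    static const uint32_t small[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
                                     31, 37, 41, 43, 47};
    uint32_t stack[32], m, f;
    int nout = 0, nstack = 0;
    size_t i;
    if (n < 2) return 0;
    for (i = 0; i < sizeof small / sizeof small[0]; i++)
      while (n % small[i] == 0) { out[nout++] = small[i]; n /= small[i]; }
    if (n > 1) stack[nstack++] = n;
    while (nstack > 0) {
      m = stack[--nstack];
      if (is_prime32(m)) { out[nout++] = m; continue; }
      f = rho_factor32(m);    /* m is a known composite here */
      stack[nstack++] = f;
      stack[nstack++] = m / f;
    }
    return nout;
  }
  ```

  Structuring the PP code the same way would move the trial division ahead of
  the first primality call and remove the repeated checks on composites.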

- I believe we can make the Lehmer/LMO small primes uint32_t, which will
  give some more memory reduction and a little speed.
