- Remove dprintfs, replace it with functions to dump the frame state
- context_set_waveletcoeffs can be rewritten as code or split up
- Properly store the amount of codeblocks per subband instead of using
  a fixed maximum of 7 subbands
- Global motion estimation has not been implemented yet, the reference
  implementations do not seem to support this yet.
- Implement the low-delay syntax
- Implement support for interlacing
- Some wavelets that are not used in practise have not been
  implemented yet.
- Think better about what does into DiracContext and what does into
  source_parameters, sequence_parameters, etc.
- Use an array (size 2) to store parameters instead of using luma_ and
  chroma_ prefixes
- For zero_neighbourhood, check if border checks can be avoided
- Update to the Dirac standard in CVS

Optimizations:

- Determine what's more efficient: memset all coefficients to 0, or
  just zero out the coefficients that need to be zero'ed out
- Use in place operations in IDWT, if that is possible
- Write IDWT in assembler
- Perhaps write arithmetic decoding in assembler
- Perhaps think about multithreading?
- Use multiples of 16 pixels for width of refframes
- Do not use GetBitContext/PutBitContext for aritmetic (de)coding
- Instead of using `struct dirac_blockmotion', it might be faster to
  save the values separately.
- Avoid coeff_posx/posy whenever possible
- Interleave IDWT/DWT stages
- Pre-calculate the spation weighting matrices for non-common cases
- Perhaps merge some multiplications (like frame weight) with the
  spatial weighting matrix to avoid some multiplications
