RELEASE OF THE LZRW1-A ALGORITHM ================================ Author : Ross Williams. Date : 25-Jun-1991. Abstract -------- This file announces the release of my LZRW1-A algorithm. The LZRW1-A algorithm is a direct descendant of the LZRW1 algorithm, bettering it a little in both speed and compression. The only reason for using LZRW1 now that LZRW1-A exists is backwards compatibility (LZRW1 and LZRW1-A cannot decompress each other's compressed files). This document specifies the difference between LZRW1 and LZRW1-A and gives details of the LZRW1-A algorithm's implementation in C and 68000 assembly language. The LZRW1 Algorithm ------------------- In April 1991, I published my LZRW1 algorithm by presenting it at the data compression conference, "An Extremely Fast Ziv-Lempel Data Compression Algorithm", IEEE Data Compression Conference, Snowbird, Utah, 8-11 April 1991. and by advertising it on the internet: FTP Archive Access: Machine : sirius.itd.adelaide.edu.au [IP=129.127.40.3] Directory : ~pub/misc Files : lzrw1.tex - My conference paper describing the algorithm. lzrw1.c - An implementation in C. lzrw1.68000 - An implementation in 68000 assembly language. The LZRW1 algorithm gives not-good compression but is fast, requires only 16K of memory and is patent free. Suggested Improvements to LZRW1 ------------------------------- After publication of LZRW1, at least the following people noted that the 4-bit copy length range field was not being used to the full. Dean Long dlong@oak.ucsc.edu Kurt Haenen ghgaea8@cc1.kuleuven.ac.be (I received some other suggested improvements to the algorithm but most of them were ideas that I had tried and discarded during the algorithm's development (e.g. deep hash tables, different coding schemes). The following table shows that LZRW1 was wasting two elements of its four bit length space. A simple revision changed the range from [3,16] to [3,18]. Field Value : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 -------------------------------------------------------------- LZRW1 : X X 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Revision : 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 The missing values were intentionally designed into the algorithm because of misguided concerns for time efficiency. The original version of the algorithm was in 68000 machine code and I was optimizing like crazy. At some stage I decided that I could save a subtraction instruction in the compressor and an addition instruction in the decompressor by shifting the range by only one (the decrement and jump instruction on the 68000 stops at -1, not 0). This design decision was reflected in the C version with which I wanted to be compatible (so as not to confuse the meaning of the name "LZRW1"). Things changed when Dean Long not only pointed out the deficiency (of which I was aware), but how removing it could actually speed up the code. When I had a look, I saw that he was right and that I had missed an optimization. I then decided to create a new algorithm containing the extended length range and the optimizations. The result is the LZRW1-A algorithm which I have implemented in C and 68000 machine code. LZRW1-A Implementations ----------------------- Implementations of LZRW1-A in C and in 68000 assembly language can be found in the following archive (files will appear within a day or two of 25-Jun-1991): FTP Archive Access: Machine : sirius.itd.adelaide.edu.au [IP=129.127.40.3] Directory : ~pub/misc Files : lzrw1-a.txt - Release notes on the file. lzrw1-a.c - An implementation in C. lzrw1-a.68000 - An implementation in 68000 assembly language. lzrw_headers.h - Useful header files. 68000 Assembly Language Version of LZRW1-A ------------------------------------------ The 68000 code for LZRW1-A was obtained by making the following changes to the 68000 code for LZRW1: Compressor: 1. The value of GROUPRAW was changed from 256 to 288. 2. Two extra COMPARE_BYTE macro calls were inserted in the list of calls. 3. Two extra DO_COPY macro calls were inserted in the list of calls. 4. A constant of two was subtracted from the second parameter of all calls to DO_COPY. Decompressor: 5. The code: COPYLOOP: move.b (A_T1)+, (A_DST)+ ;Perform the copy operation. dbra D_COPLEN, COPYLOOP was replaced by the more unrolled version: move.b (A_T1)+, (A_DST)+ ;Copy first byte of copy item. move.b (A_T1)+, (A_DST)+ ;Copy second byte of copy item. COPYLOOP: move.b (A_T1)+, (A_DST)+ ;Perform the copy operation. dbra D_COPLEN, COPYLOOP This last modification was Dean Long's idea, and made it efficient to use the shifted range. Thanks Dean --- your modification improved the decompression speed by about 10%. 6. Some minor cosmetic changes such as updating comments were made. When compared to the 68000 version of LZRW1, the 68000 version of LZRW1-A gives about 0.5% absolute better compression, compresses at the same speed, and decompresses 10% faster. The following table gives the results of applying the 68000 code to the corpus WITH A CONTEXT BLOCK SIZE OF 16K. These results correspond to Figure-8 of my conference paper on LZRW1. I did not bother with instruction counting this time. ---- +--------------------+---------------+---------+ | File K /16K | CmpLen %Rem | K/s | +--------------------+---------------+---------+ | bib 109 7 | 67600 60.8 | 43 162 | | book1 751 47 | 534086 69.5 | 40 148 | | book2 597 38 | 368621 60.3 | 45 157 | | geo 100 7 | 87618 85.6 | 31 146 | | news 368 24 | 236432 62.7 | 41 164 | | obj1 21 2 | 13240 61.6 | 41 173 | | obj2 241 16 | 127885 51.8 | 50 171 | | paper1 52 4 | 31520 59.3 | 45 159 | | paper2 80 6 | 51130 62.2 | 44 153 | | pic 501 32 | 126828 24.7 | 89 202 | | progc 39 3 | 21932 55.4 | 48 164 | | progl 70 5 | 31474 43.9 | 58 173 | | progp 48 4 | 21536 43.6 | 59 173 | | trans 91 6 | 44158 47.1 | 54 176 | +--------------------+---------------+---------+ | Average 219 14 | 56.3 | 49 166 | +--------------------+---------------+---------+ +--------------------+---------------+---------+ | zeros 78 5 | 9508 11.9 |144 219 | | noise 78 5 | 80030 100.0 | 23 1178 | +--------------------+---------------+---------+ This table gives the performance of a hand-optimized 68000 assembly language implementation of LZRW1-A running on an 8MHz, 24-bit bus, Macintosh-SE computer. The input files were divided into 16K blocks which were compressed independently. The %Rem column gives the compression as a percentage remaining.. The K/Sec column gives the compression and decompression speeds in kilobytes (1024) per second. Decompression speeds are given relative to OUTPUT (uncompressed) bytes. The speeds are for execution only and do not include IO. All results were rounded to the given accuracy. ---- The following table gives the performance of the 68000 version of LZRW1-A when applied to the corpus using context block lengths of 256K. The longer blocks improves speed and compression a little. Details: The run is on a Mac-SE (8MHz 68000 with 24-bit bus). The average values are the average of exactly the values printed in the column above. The times are for compression only and do not include any IO. BENCHMARKERS: THIS TABLE HOLDS THE "OFFICIAL" RESULTS FOR LZRW1-A. +----------------------------------------------------------------+ | DATA COMPRESSION TEST | | ===================== | | Time of run : Tue 25-Jun-1991 08:21PM | | Timing accuracy : One part in 100 | | Context length : 262144 bytes (= 256.0000K) | | Test suite : Calgary Corpus Suite | | Files in suite : 14 | | Algorithm : LZRW1-A (68000 assembly language) | +----------------------------------------------------------------+ | File Name Length CxB ComLen %Remn Bits Com K/s Dec K/s | | ---------- ------ --- ------ ----- ---- ------- ------- | | rpus:Bib.D 111261 1 65801 59.1 4.73 44.35 162.17 | | us:Book1.D 768771 3 522797 68.0 5.44 40.58 147.21 | | us:Book2.D 610856 3 359715 58.9 4.71 45.74 156.85 | | rpus:Geo.D 102400 1 86432 84.4 6.75 31.91 144.93 | | pus:News.D 377109 2 230298 61.1 4.89 42.57 163.84 | | pus:Obj1.D 21504 1 13189 61.3 4.91 41.14 173.39 | | pus:Obj2.D 246814 1 124666 50.5 4.04 50.61 171.89 | | s:Paper1.D 53161 1 30626 57.6 4.61 46.35 158.72 | | s:Paper2.D 82199 1 50048 60.9 4.87 44.84 152.90 | | rpus:Pic.D 513216 2 125718 24.5 1.96 89.43 202.04 | | us:Progc.D 39611 1 21494 54.3 4.34 48.52 164.22 | | us:Progl.D 71646 1 30634 42.8 3.42 59.97 173.47 | | us:Progp.D 49379 1 20935 42.4 3.39 60.49 173.60 | | us:Trans.D 93695 1 42484 45.3 3.63 55.36 177.09 | +----------------------------------------------------------------+ | Average 224401 1 123202 55.1 4.41 50.13 165.88 | +----------------------------------------------------------------+ C Version of LZRW1-A -------------------- The C version of LZRW1-A was obtained by making the following changes to the C version of LZRW1: Compressor: 1. The overrun specification in the header comments was changed from 256 (=16x16) to 288 (=16x18). 2. The value of ITEMMAX was changed from 16 to 18. 3. The unrolled matching loop was unrolled a bit more. Two more "PS ||" were appended to the list of them. 4. The term (len-1) was replaced by (len-3) in the line "*p_dst...". Decompressor: 5. "len=1+.." was changed to "len=3+.." At this stage I had a working version of LZRW1-A whose code looked very like the code for LZRW1. However, I continued and: 6. Performed extensive optimizations on both the compressor and decompressor. The same code structure is present, but a lot has changed. The C version runs about two to three times more slowly than the assembler version. The results are given below. The output lengths vary by a few bytes from the 256K run of the 68000 version; this is almost certainly a result of the non-determinism of the algorithm. Details: The run is on a Mac-SE (8MHz 68000 with 24-bit bus). The average values are the average of exactly the values printed in the column above. The times are for compression only and do not include any IO. The compiler used was THINK C V4.0. +----------------------------------------------------------------+ | DATA COMPRESSION TEST | | ===================== | | Time of run : Tue 25-Jun-1991 08:59PM | | Timing accuracy : One part in 100 | | Context length : 262144 bytes (= 256.0000K) | | Test suite : Calgary Corpus Suite | | Files in suite : 14 | | Algorithm : LZRW1-A (C version) | +----------------------------------------------------------------+ | File Name Length CxB ComLen %Remn Bits Com K/s Dec K/s | | ---------- ------ --- ------ ----- ---- ------- ------- | | rpus:Bib.D 111261 1 65813 59.2 4.73 14.95 67.67 | | us:Book1.D 768771 3 522800 68.0 5.44 13.68 62.96 | | us:Book2.D 610856 3 359715 58.9 4.71 15.40 66.59 | | rpus:Geo.D 102400 1 86432 84.4 6.75 10.81 59.55 | | pus:News.D 377109 2 230298 61.1 4.89 14.33 67.52 | | pus:Obj1.D 21504 1 13189 61.3 4.91 13.70 69.36 | | pus:Obj2.D 246814 1 124666 50.5 4.04 17.07 71.36 | | s:Paper1.D 53161 1 30626 57.6 4.61 15.65 67.09 | | s:Paper2.D 82199 1 50048 60.9 4.87 15.12 65.34 | | rpus:Pic.D 513216 2 125718 24.5 1.96 30.01 84.19 | | us:Progc.D 39611 1 21494 54.3 4.34 16.29 69.28 | | us:Progl.D 71646 1 30634 42.8 3.42 20.18 73.43 | | us:Progp.D 49379 1 20944 42.4 3.39 20.23 73.56 | | us:Trans.D 93695 1 42501 45.4 3.63 18.61 73.53 | +----------------------------------------------------------------+ | Average 224401 1 123205 55.1 4.41 16.86 69.39 | +----------------------------------------------------------------+ Nomenclature ------------ I am keen to retain a sharp definitions for the algorithms that I produce. The term "LZRW1" should be used only to refer to the original algorithm published in my data compression conference paper. This term does not refer to a generic family of algorithms but rather the specific algorithm described in the paper. The term "LZRNW1" I have used accidentally in the past in referring to "LZRW1". It should never be used but if you see it used, it means the same as "LZRW1". The term "LZRW1-A" refers to the particular algorithm described above which is based on the "LZRW1" algorithm. ----