IMPORTANT NOTE: The LZRW5 document was posted on 23-Jul-1991. That
document appears at the end of this file. At the time I thought LZRW5
was a new idea. However, I was wrong, as is shown by the following
email, which I have included here before the original document.

======================================================================
Path: spam!ross
From: ross@spam.ua.oz.au (Ross Williams)
Newsgroups: comp.compression
Subject: LZRW5 is NOT original.
Summary: LZRW5 is NOT original.
Keywords: lzrw5 data compression algorithm originality yabba yabbawhap
Message-ID: <967@spam.ua.oz>
Date: 28 Jul 91 16:21:46 GMT
Sender: ross@spam.ua.oz
Followup-To: comp.compression
Distribution: world
Organization: Statistics, Pure & Applied Mathematics, University of Adelaide
Lines: 26

LZRW5 is NOT Original
=====================
Earlier I posted my LZRW5 as an original algorithm. However, I have
recently discovered that it has appeared before. The same idea of
keeping an array of pointers to phrases in an LZW algorithm was used
by Dan Bernstein in his yabbawhap compression package, which can be
found in comp.sources.unix vol 24. At least one site where you can get
this is UUNET (in Virginia) in
uunet.uu.net:comp.sources.unix/volume24/yabbawhap.

Dan developed the yabba program about one year ago and posted it to
alt.comp.compression in March 1991.

Boo hoo,
Ross Williams
ross@spam.ua.oz.au

PS: How embarrassing - I actually created alt.comp.compression! Dan
posted yabba after I thought I had destroyed alt.comp.compression and
while I was busy creating comp.compression! I nearly looked at yabba
recently too, but decided to spend the time getting LZRW4 and LZRW5
out! I have not yet sighted the yabba source code, being busy at
present with non-compression stuff.
======================================================================
Path: spam!sirius.ucs.adelaide.edu.au!yoyo.aarnet.edu.au!munnari.oz.au!samsung!uakari.primate.wisc.edu!sdd.hp.com!wupost!emory!ox.com!yale.edu!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.compression
Subject: Re: LZRW5: A Turbocharged LZW-like Decompressor.
Message-ID: <28577.Jul2519.12.1791@kramden.acf.nyu.edu>
Date: 25 Jul 91 19:12:17 GMT
References: <961@spam.ua.oz>
Organization: IR
Lines: 22

While I'm sure we all appreciate Ross's extensive contributions to the
field of compression, I have to point out that his ``turbocharging''
is exactly the method stated in my introduction to Y coding (draft 4b,
3/19/91), in section 2 (``Implementing LZW''):

: --- Decoding
:
: LZW decoding is even easier than encoding: the dictionary does not have
: to support searching. The easiest (and generally fastest) method is to
: keep I in memory as it is reconstructed, and to keep track of which
: substring of I a given dictionary number corresponds to. To add pc to
: the dictionary means to add the pair (pointer to start of pc within I,
: length of pc) at the right spot in an array indexed by dictionary
: numbers. There are methods which take slightly less memory than this,
: but they are slower.

I applied this to LZW last year, and indeed it makes a very fast,
simple decompressor. My ``unwhap'' decompressor for AP, as included in
my yabbawhap package, comp.sources.unix volume 24, also applies the
same technique.

---Dan

======================================================================
Path: spam!sirius.ucs.adelaide.edu.au!yoyo.aarnet.edu.au!munnari.oz.au!samsung!mips!swrinde!zaphod.mps.ohio-state.edu!think.com!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstnd
From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)
Newsgroups: comp.compression
Subject: Re: LZRW5 is NOT original.
Message-ID: <18214.Jul2902.59.2991@kramden.acf.nyu.edu>
Date: 29 Jul 91 02:59:29 GMT
References: <967@spam.ua.oz>
Organization: IR
Lines: 42

In article <967@spam.ua.oz> ross@spam.ua.oz.au (Ross Williams) writes:
> Earlier I posted my LZRW5 as an original algorithm. However, I have
> recently discovered that it has appeared before. The same idea of
> keeping an array of pointers to phrases in an LZW algorithm was used
> by Dan Bernstein in his yabbawhap compression package which can be
> found in comp.sources.unix vol 24.

I used it in unwhap, and it's the main reason unwhap runs as fast as
uncompress---AP does three to five times more dictionary manipulations
than LZW. However, I haven't figured out how to apply it to Y coding.
(I have a very promising lead, though, so don't be surprised if and
when undabba is faster than unyabba... more news soon...)

> Dan developed the yabba program about one year ago and posted it to
> alt.comp.compression in March 1991.

Actually, I posted my introduction to Y coding to alt.comp.compression
on March 6, 1991, a few days before the group was officially removed.
I included a slightly revised draft with yabbawhap, which I posted to
comp.sources.unix, not alt.comp.compression.

I discovered Y coding on December 26, 1990. A year ago (July 28, 1990,
according to my mail records) I discovered AP coding, which I called
``B coding'' in private discussions until a month later, when Brad
Templeton pointed out Storer's prior discovery of the method.

To get back to the topic at hand: my implementations of AP during
August 1990 all used the same technique as in Ross's recent LZRW5. I
don't know if I was the first to come upon this idea.

While I'm on a historical accuracy kick: Don't trust anything you read
in the May 1991 Computer Language article on data compression.
I quote: ``One of the most popular data-compression algorithms is the
sliding-window dictionary (SWD) variation on the Lempel-Ziv-Welch
(LZW) algorithm, developed around 1984-85.'' First of all, LZW was
developed in 1983 (both by Welch and by Miller-Wegman), and published
in 1984, so ``developed around 1984-85'' is silly. Second, I haven't
gone through the code in that article in detail, but there is no such
thing as ``the sliding-window dictionary variation on LZW.'' The
article appears to describe one of the LZ77 variants. The article goes
on to use ``SWD'' as the name of the method; this is nonstandard at
best (though one can say the same of almost all of Storer's
terminology).

---Dan

======================================================================

----

LZRW5: A TURBOCHARGED LZW-LIKE DECOMPRESSOR
===========================================
Author : Ross Williams
Date   : 23-Jul-1991.

This is a public domain document and may be copied freely so long as
it is not modified (though text formatting transformations are
permitted).

Abstract
--------
A technique is described for constructing an LZW-like algorithm that
yields the same compression as LZW but whose decompressor runs much
faster (93K/s for a 68000 assembler implementation on an 8MHz
Mac-SE20). The implementation of the compression algorithm is open
ended (although a modified version of LZW could be used).
Unfortunately, it is possible that the technique may be covered by
third-party patents.

Distribution Algorithms
-----------------------
Over the last month or so, one or two people have posted requests for
what I call "distribution algorithms": data compression algorithms in
which the decompressor's speed and memory consumption are much more
important than the compressor's. Such algorithms are useful when
distributing data, where the data will be compressed once by the
distributor, but decompressed thousands or perhaps millions of times
by users.
The distributor can afford to use a hugely powerful computer to
perform the compression, but the decompression must be performed on
each user's computer using minimum resources.

While thinking about this, I happened to apply the phrase-table idea
of my LZRW2 algorithm to the LZW algorithm, producing an LZW-like
algorithm that provides the same compression performance as LZW but
yields very much faster decompression, possibly at a cost in memory
consumption and speed of the compressor.

Description of the LZRW5 Idea
-----------------------------
The trick is for the compressor and decompressor to maintain a
"phrase table" of roughly the most recent 2^n phrases parsed (where n
is the codeword width). Apart from a large memory output buffer, this
is the ONLY data structure of the decompressor. The compressor is
likely to need other data structures. Each phrase table entry consists
of a pointer to the phrase and the length of the phrase. An index
rotates through this table indicating the next position to overwrite.
Pointers point to the input buffer in the compressor and the output
buffer in the decompressor.

To process each phrase, the decompressor receives a phrase number from
the compressor. It then looks up the pointer and the length of the
phrase and copies the phrase to the output. It then overwrites the
next position in the phrase table with a pointer to the phrase just
coded, but puts length+1 in the table. The compressor does all this
too, but keeps an additional table that enables it to select the
longest match from the strings in the phrase table.

Here is a rough outline of the decompressor code. A complete 68000 C
and assembly language version appears in Appendix A.

unsigned cycle;          -- Index of next slot to write.
unsigned word *p_src;    -- Pointer to input block.
unsigned byte *p_dst;    -- Pointer to output block.
char *pointer[65536];    -- Stores pointers to phrases.
unsigned length[65536];  -- Stores lengths of phrases.
p_src=pointer to start of input block.
p_dst=pointer to start of output block.
cycle=256;
forall ch in 0..255, pointer[ch]=&alphabet[ch]; -- alphabet is a static
                                                -- array with alphabet[ch]==ch.
forall ch in 0..255, length[ch]=1;
while (p_src!=end of input block) loop
   unsigned phrasenum,len;
   phrasenum=*p_src++;          -- Pick up next code word.
   p=pointer[phrasenum];        -- Get pointer and length of phrase to copy.
   len=length[phrasenum];
   pointer[cycle]=p_dst;        -- Add a new phrase one byte longer.
   length[cycle]=len+1;
   cycle++;                     -- Update cycle counter.
   if (cycle==2^n) cycle=256;   -- Note: 0..255 reserved for alphabet.
   while (len--) *p_dst++=*p++; -- Copy phrase to output.
end loop

This decompressor code has the full power of LZW while giving roughly
the speed of the fast LZ77 decompressors (e.g. the A1 or LZRW1
algorithms). For 65536 phrases, it will use 320K, plus space for the
output buffer. The algorithm is adaptive, so reducing the table size
will probably not grossly impact compression (you may or may not want
to pack the resultant narrower codewords).

This decompressor avoids the nasty problem LZW has with adding a
phrase and then using it immediately (and having to look up the first
character etc.). The problem is solved in an LZ77 kind of way: using
pointers to the output gives automatic overlapping!

This decompressor is also adaptive in that it overwrites old phrases
with new ones. Unix compress has to do its adaptation explicitly.

Any number of compressor implementations could operate in conjunction
with the decompressor above. The compressor can use expensive data
structures to find the best match. It would also be very easy and
efficient to modify the decompressor to (say) add TWO strings to the
dictionary instead of one (e.g. strings of length 1 and 3 characters
longer, or this phrase and the next) (many other similar variations
are possible). There is no reason why a conventional LZW hash table
compressor could not be modified to drive the output through a phrase
table.
This would yield roughly the same compression speed.

The code below is an early draft of the LZRW5 algorithm. The
compressor uses binary trees and is rather pathetic (it runs at about
1K/s), as I have been concentrating on the decompressor. The 68000
assembly language version of the decompressor runs at 93K/sec on my
Mac-SE20 (43K/sec in C). This (the 93K/sec) is only 2/3 the speed of
LZRW1 (which yields worse compression), but a lot faster than Unix
compress.

So far I have been unable to ascertain whether this idea is covered by
any patents. If anyone can help here, I would be most grateful.

Enjoy,
Ross Williams
ross@spam.ua.oz.au

Appendix A: Half-Baked Code for LZRW5
-------------------------------------
The following code is public domain and is provided "as is". As with
most experimental code, it is poorly designed, structured, coded, and
commented.

WARNING: It is possible that this code may be subject to patent
protection by third parties. As such, it should not be included in
production software or executed for any but experimental purposes
without prior legal advice.

WARNING: This code will not work properly for more than 65536 phrases.
You can be sure of not reaching this limit by feeding in only input
files of length at most 65536. The program does not contain an
assertion for this condition.

NOTE: This code is supplied more for exposition than execution.

/******************************************************************************/
/*                                                                            */
/*                                  LZRW5.C                                   */
/*                                                                            */
/******************************************************************************/
/*                                                                            */
/* Author  : Ross Williams.                                                   */
/* Date    : 17-Jul-1991.                                                     */
/* Release : 1.                                                               */
/*                                                                            */
/******************************************************************************/

/* INCLUDE FILES                                                              */
/* =============                                                              */
#include "port.h"       /* Defines symbols for the non portable stuff.       */
#include "compress.h"   /* Defines single exported function "compress".      */
#include "fast_copy.h"  /* Fast memory copy routine.                         */
#include <stdio.h>      /* Needed for printf.                                */
#include <stdlib.h>     /* Needed for exit.                                  */
#include "assert.h"

/******************************************************************************/

/* When I first started to get concerned about the portability of my C code,  */
/* I switched over to using only macro defined types UBYTE, UWORD, ULONG and  */
/* one or two others. While these are useful for most purposes, they impair   */
/* efficiency as, if I have a variable whose range will be [0,1000], I will   */
/* declare it as a UWORD. This will translate into (say) "short int" and      */
/* hence may be less efficient than just an "int" which represents the        */
/* natural size of the machine. Before releasing LZRW3-A, I realized this     */
/* mistake. Unfortunately, I can't access the ftp archive with my portability */
/* header in it in time for this algorithm's release and so I am including an */
/* extra definition. The definition UCARD stands for an unsigned (cardinal)   */
/* type that can hold values in the range [0,32767]. This is within the ANSI  */
/* range of a standard int or unsigned. No assumption about overflow of this  */
/* type is made in the code (i.e. all usages are within range and I do not    */
/* use the value -1 to detect the end of loops).                              */
/* You can use either "unsigned" or just "int" here depending on which is     */
/* more efficient in your environment (both the same probably).               */
#define UCARD unsigned

struct node_t
  {
   UBYTE *p_phrase;   /* 4 bytes. Pointer to phrase represented by the node. */
   UBYTE len_phrase;  /* 1 byte.  Length of the phrase.                      */
/* UBYTE same;           1 byte.  Length of prefix shared by parent node.    */
/* UWORD parent;         2 bytes. Phrase number of parent phrase.            */
   UWORD left;        /* 2 bytes. Phrase number of left  child phrase.       */
   UWORD right;       /* 2 bytes. Phrase number of right child phrase.       */
                      /* -------                                             */
                      /* 12 bytes is the ideal (no padding) structure length.*/
  };

#define MAX_NODES         (1L<<16)
#define MAX_PHRASE_LENGTH (255L)
#define ALPHABET_SIZE     (256L)

#define TEST_ARRAY_ITEMS 4
typedef struct node_t test_node_array[TEST_ARRAY_ITEMS];

#define U(X)            ((ULONG) X)
#define SIZE_P_BYTE     (U(sizeof(UBYTE *)))
#define ALIGNMENT_FUDGE (U(64L))

/* Calculate the memory required by the compressor. */
#define MEM_NODE_TABLE (MAX_NODES*(sizeof(test_node_array)/TEST_ARRAY_ITEMS))
#define MEM_ALPHA_TABLE (256L)
#define MEM_COMPRESS (MEM_NODE_TABLE  + \
                      MEM_ALPHA_TABLE + \
                      ALIGNMENT_FUDGE)

/* Calculate the memory required by the decompressor. */
#define MEM_PHRASE_TABLE (MAX_NODES*SIZE_P_BYTE)
#define MEM_LENGTH_TABLE (MAX_NODES*1L)
#define MEM_DECOMPRESS (MEM_PHRASE_TABLE + \
                        MEM_LENGTH_TABLE + \
                        MEM_ALPHA_TABLE  + \
                        ALIGNMENT_FUDGE)

#define MEM_REQ (MEM_COMPRESS>MEM_DECOMPRESS?MEM_COMPRESS:MEM_DECOMPRESS)

static struct compress_identity identity =
{
 U(0x4A619944),           /* Algorithm identification number. */
 MEM_REQ,                 /* Working memory (bytes) required. */
 "LZRW5",                 /* Name of algorithm.               */
 "1.0",                   /* Version number of algorithm.     */
 "17-Jul-1991",           /* Date of algorithm.               */
 "Public Domain",         /* Copyright notice.                */
 "Ross N. Williams",      /* Author of algorithm.             */
 "Renaissance Software",  /* Affiliation of author.           */
 "Public Domain"          /* Vendor of algorithm.             */
};

LOCAL int  compare_strings(UBYTE *,UCARD,UBYTE *,UCARD);
LOCAL void compress_compress  (UBYTE *,UBYTE *,ULONG,UBYTE *,ULONG *);
LOCAL void compress_decompress(UBYTE *,UBYTE *,ULONG,UBYTE *,ULONG *);

/******************************************************************************/

/* This function is the only function exported by this module.                */
/* Depending on its first parameter, the function can be requested to         */
/* compress a block of memory, decompress a block of memory, or to identify   */
/* itself. For more information, see the specification file "compress.h".     */
EXPORT void compress(action,wrk_mem,src_adr,src_len,dst_adr,p_dst_len)
UWORD  action;       /* Action to be performed.
                                                                              */
UBYTE *wrk_mem;      /* Address of working memory we can use.                 */
UBYTE *src_adr;      /* Address of input data.                                */
ULONG  src_len;      /* Length of input data.                                 */
UBYTE *dst_adr;      /* Address to put output data.                           */
ULONG *p_dst_len;    /* Address of longword for length of output data.        */
{
 switch (action)
   {
    case COMPRESS_ACTION_IDENTITY:
       *p_dst_len=(ULONG) &identity;
       break;
    case COMPRESS_ACTION_COMPRESS:
       compress_compress(wrk_mem,src_adr,src_len,dst_adr,p_dst_len);
       break;
    case COMPRESS_ACTION_DECOMPRESS:
       compress_decompress(wrk_mem,src_adr,src_len,dst_adr,p_dst_len);
       break;
   }
}

/******************************************************************************/

/* The following #define defines the length of the copy flag that appears at  */
/* the start of the compressed file. The value of four bytes was chosen       */
/* because the fast_copy routine on my Macintosh runs faster if the source    */
/* and destination blocks are relatively longword aligned.                    */
/* The actual flag data appears in the first byte. The rest are zeroed so as  */
/* to normalize the compressed representation (i.e. make it deterministic).   */
#define FLAG_BYTES 4

/* The following #defines define the meaning of the values of the copy        */
/* flag at the start of the compressed file.                                  */
#define FLAG_COMPRESS 0 /* Signals that output was result of compression.    */
#define FLAG_COPY     1 /* Signals that output was simply copied over.       */

/* The 68000 microprocessor (on which this algorithm was originally           */
/* developed) is fussy about non-aligned arrays of words. To avoid these      */
/* problems the following macro can be used to "waste" from 0 to 3 bytes so   */
/* as to align the argument pointer.                                          */
#define ULONG_ALIGN_UP(X) ((((ULONG)X)+3)&~3)

/******************************************************************************/

LOCAL int compare_strings(p_s1,len1,p_s2,len2)
/* Compares the two strings lexicographically and returns:                    */
/* -1 if s1<s2, 0 if s1=s2, 1 if s1>s2.                                       */
register UBYTE *p_s1,*p_s2; /* Pointers to first byte of each string.         */
UCARD len1,len2;            /* Number of bytes in each string.                */
{
 register UCARD minlen = len1<len2 ? len1 : len2;
 while (minlen--)
   {
    if (*p_s1<*p_s2)     return -1;
    if (*p_s1++>*p_s2++) return  1;
   }
 if (len1<len2) return -1;
 if (len1>len2) return  1;
 return 0;
}

LOCAL void compress_compress
           (p_wrk_mem,p_src_first,src_len,p_dst_first,p_dst_len)
/* Input  : Hand over the required amount of working memory in p_wrk_mem.     */
/* Input  : Specify input block using p_src_first and src_len.                */
/* Input  : Point p_dst_first to the start of the output zone (OZ).           */
/* Input  : Point p_dst_len to a ULONG to receive the output length.          */
/* Input  : Input block and output zone must not overlap.                     */
/* Output : Length of output block written to *p_dst_len.                     */
/* Output : Output block in Mem[p_dst_first..p_dst_first+*p_dst_len-1]. May   */
/* Output : write in OZ=Mem[p_dst_first..p_dst_first+src_len+MAX_CMP_GROUP-1].*/
/* Output : Upon completion guaranteed *p_dst_len<=src_len+FLAG_BYTES.        */
UBYTE *p_wrk_mem;
UBYTE *p_src_first;
ULONG  src_len;
UBYTE *p_dst_first;
ULONG *p_dst_len;
{
 /* p_src and p_dst step through the source and destination blocks.           */
 UBYTE *p_src = p_src_first;
 UBYTE *p_dst = p_dst_first;

 /* The following variables are never modified and are used in the            */
 /* calculations that determine when the main loop terminates.                */
 UBYTE *p_src_post    = p_src_first+src_len;
 UBYTE *p_dst_post    = p_dst_first+src_len;
 UBYTE *p_src_max_256 = p_src_first+src_len-256;

 struct node_t *node;
 UBYTE **pointer;
 UBYTE  *length;
 UBYTE  *alphabet;
 ULONG   next;
 UCARD   i;

 node     = (struct node_t *) ULONG_ALIGN_UP(p_wrk_mem);
 alphabet = (UBYTE *) &node[MAX_NODES];

 /* Load the node table with nodes for the alphabet. */
 for (i=0;i<ALPHABET_SIZE;i++)
   {
    alphabet[i]=i;
    node[i].p_phrase=&alphabet[i];
    node[i].len_phrase=1;
    node[i].left=0;
    node[i].right=0;
   }
 next=ALPHABET_SIZE;

 /* Write the leading copy flag, zeroing the padding bytes. */
 *p_dst++=FLAG_COMPRESS;
 for (i=1;i<FLAG_BYTES;i++)
    *p_dst++=0;

 /* Main processing loop. */
 while (p_src<p_src_max_256)
   {
    UWORD  current,bestnode;
    UCARD  bestlen;
    int    cmp;
    UBYTE *p_ziv;

    if (p_dst+2>p_dst_post) goto overrun;
    p_ziv=p_src;

    /* Run down the tree matching against the Ziv. */
    current=*p_ziv;
    bestnode=current;
    bestlen=1;
    while (TRUE)
      {
       UWORD t;
       struct node_t *pn=&node[current];
       cmp=compare_strings(p_ziv,bestlen+1,pn->p_phrase,pn->len_phrase);
       if      (cmp==-1) {if ((t=pn->left )==0) break; else current=t;}
       else if (cmp== 1) {if ((t=pn->right)==0) break; else current=t;}
       else {bestnode=current; if (++bestlen==MAX_PHRASE_LENGTH) break;}
      }
    /* Assert: bestnode is the longest match with the Ziv.                    */
    /* Assert: current is "append" leaf for the Ziv. Direction in cmp.        */

    /* Check that node contains a string that matches Ziv. */
    /* DEBUG for (i=0;i<...                                */

    /* Step over the matched phrase and output its phrase number. */
    p_src+=node[bestnode].len_phrase;
    *p_dst++=bestnode>>8;   /* Big endian order for 68000. */
    *p_dst++=bestnode&0xFF;

    /* Delete the oldest phrase from the tree.       */
    /* <<>>                                          */
    /* Note: must ignore nodes with a length of zero. */

    /* Insert a new phrase into the tree. */
    if (node[current].len_phrase==MAX_PHRASE_LENGTH)
       node[next].len_phrase=0;
    else
      {
       /* Create a new leaf node with the string in it. */
       struct node_t *pn=&node[next];
       pn->p_phrase=p_ziv;
       pn->len_phrase=node[bestnode].len_phrase+1;
       /* Opt: Use a wrap for this later! */
       pn->left=0;
       pn->right=0;
       /* Connect it up to the append node. */
       if (cmp==-1)
          node[current].left=next;
       else
          node[current].right=next;
      }
    if (++next == MAX_NODES)
       {printf("Ran out of nodes.\n");exit(1);}
   } /* End main processing loop. */

 /* At this point there are 256 or fewer bytes to go. These are most easily   */
 /* coded as one-byte strings.                                                */
 while (p_src!=p_src_post)
   {
    *p_dst++=0;
    *p_dst++=*p_src++;
   }

 /* Write the length of the output block to the dst_len parameter and return. */
 *p_dst_len=p_dst-p_dst_first;
 return;

 /* Jump here as soon as an overrun is detected. An overrun is defined to     */
 /* have occurred if p_dst>p_dst_first+src_len. That is, the moment the       */
 /* length of the output written so far exceeds the length of the input       */
 /* block. The algorithm checks for overruns at least at the end of each      */
 /* group which means that the maximum overrun is MAX_CMP_GROUP bytes.
                                                                              */
 /* Once an overrun occurs, the only thing to do is to set the copy flag and  */
 /* copy the input over.                                                      */
 overrun:
 *p_dst_first=FLAG_COPY;
 fast_copy(p_src_first,p_dst_first+FLAG_BYTES,src_len);
 *p_dst_len=src_len+FLAG_BYTES;
}

/******************************************************************************/

LOCAL void compress_decompress
           (p_wrk_mem,p_src_first,src_len,p_dst_first,p_dst_len)
/* Input  : Hand over the required amount of working memory in p_wrk_mem.     */
/* Input  : Specify input block using p_src_first and src_len.                */
/* Input  : Point p_dst_first to the start of the output zone.                */
/* Input  : Point p_dst_len to a ULONG to receive the output length.          */
/* Input  : Input block and output zone must not overlap. User knows          */
/* Input  : upperbound on output block length from earlier compression.       */
/* Input  : In any case, maximum expansion possible is nine times.            */
/* Output : Length of output block written to *p_dst_len.                     */
/* Output : Output block in Mem[p_dst_first..p_dst_first+*p_dst_len-1].       */
/* Output : Writes only in Mem[p_dst_first..p_dst_first+*p_dst_len-1].        */
UBYTE *p_wrk_mem;
UBYTE *p_src_first;
ULONG  src_len;
UBYTE *p_dst_first;
ULONG *p_dst_len;
{
 UCARD execute_c_code = FALSE;

 /* Byte pointers p_src and p_dst scan through the input and output blocks.   */
 /* FOLLOWING TWO WERE REGISTER FOR C CODE.                                   */
 UBYTE *p_src = p_src_first+FLAG_BYTES;
 UBYTE *p_dst = p_dst_first;

 /* The following variable is never modified and is used to control           */
 /* the main loop.                                                            */
 UBYTE *p_src_post = p_src_first+src_len;

 UBYTE **pointer;
 UBYTE  *length;
 UBYTE  *alphabet;
 UBYTE **p_pointer,**p_post_pointer;
 UBYTE  *p_length;

 /* Check the leading copy flag to see if the compressor chose to use a copy  */
 /* operation instead of a compression operation. If a copy operation was     */
 /* used, then all we need to do is copy the data over, set the output length */
 /* and return.                                                               */
 if (*p_src_first==FLAG_COPY)
   {
    fast_copy(p_src_first+FLAG_BYTES,p_dst_first,src_len-FLAG_BYTES);
    *p_dst_len=src_len-FLAG_BYTES;
    return;
   }

 pointer  = (UBYTE **) ULONG_ALIGN_UP(p_wrk_mem);
 alphabet = (UBYTE *) &pointer[MAX_NODES];
 length   = (UBYTE *) &alphabet[ALPHABET_SIZE];

 ass( ((p_src_post-p_src)&1)==0,
      "lzrw5_decompress: Input block is of odd length.");
 ass( (((ULONG)p_src)&1)==0,
      "lzrw5_decompress: Input block is odd aligned.");

 printf("DECOMPRESSOR CALLED\n");

 {
  UCARD i;
  UBYTE **p_init_pointer    = pointer;
  UBYTE  *p_init_alphabet   = alphabet;
  UBYTE  *p_init_length     = length;
  UBYTE   length_init_value = execute_c_code ? 0 : 255;
  for (i=0;i<ALPHABET_SIZE;i++)
    {
     *p_init_alphabet++ = i;
     *p_init_pointer++  = &alphabet[i];
     *p_init_length++   = length_init_value;
    }
 }

 p_pointer      = &pointer[ALPHABET_SIZE];
 p_length       = &length [ALPHABET_SIZE];
 p_post_pointer = &pointer[MAX_NODES];

 while (p_src!=p_src_post)
   {
    ULONG items_to_go   = (p_src_post-p_src)>>1;
    ULONG entries_to_go = p_post_pointer-p_pointer;
    ULONG unroll;  /* WAS REGISTER FOR C CODE. */
    if (entries_to_go==0)
      {
       p_pointer=&pointer[ALPHABET_SIZE];
       p_length =&length [ALPHABET_SIZE];
       entries_to_go=p_post_pointer-p_pointer;
      }
    unroll = items_to_go<entries_to_go ? items_to_go : entries_to_go;
    ass(unroll>0,"unroll==0.");
    if (execute_c_code)
       while (unroll--)
         {
          register UBYTE *p;
          register UCARD  len;
          register UWORD  phrase;
          phrase =*p_src++<<8;
          phrase|=*p_src++;
          p=pointer[phrase];
          len=length[phrase];
          *p_pointer++=p_dst;
          *p_length++=len+1;
          do {*p_dst++=*p++;} while (len--);
         }
    else
       asm 68000
         {
;------------------------------------------------------------------------
;REGISTER MAP
;============
#define A_SRC      a0 ;Points to the next input byte.
#define A_DST      a1 ;Points to the next output byte.
#define A_POINTER  a2 ;Points to the start of the pointer array.
#define A_LENGTH   a3 ;Points to the start of the length array.
#define A_PPOINTER a4 ;Points to next pointer to be overwritten.
#define A_PLENGTH  a5 ;Points to next length to be overwritten.
#define A_P        a6 ;Runs through source phrase during copy.
#define A_LINKAGE  a6 ;<--a6 is also the linkage register. Careful!!!
#define A_STACK    a7 ;Stack pointer. Don't touch!

#define D_UNROLL   d0 ;Counts down the unrolled loop
#define D_LEN      d1 ;Holds the length of the copy.
#define D_PHRASE   d2 ;Number of phrase chosen by sender.
#define D_D3       d3 ;Unused.
#define D_D4       d4 ;Unused.
#define D_D5       d5 ;Unused.
#define D_D6       d6 ;Unused.
#define D_D7       d7 ;Unused.

;Lightspeed C doesn't mind us using a0-a1 and d0-d2, but I'm being
;careful here (if only to ease porting to other compilers).
#define REGISTERS_TO_SAVE a0-a6/d0-d7
;------------------------------------------------------------------------
          movem.l REGISTERS_TO_SAVE,-(A_STACK)
          move.l  p_src,    A_SRC      ;These need to be saved at the end.
          move.l  p_dst,    A_DST
          move.l  p_pointer,A_PPOINTER
          move.l  p_length, A_PLENGTH
          move.l  pointer,  A_POINTER  ;These ones don't need to be saved
          move.l  length,   A_LENGTH   ;at the end.
          move.l  unroll,   D_UNROLL
          move.l  A_LINKAGE,-(A_STACK)

;Note: The clr.l D_PHRASE instruction can be moved out of the loop
;      if MAX_NODES is not more than 2^15.
;Note: The clr.l D_LEN could be moved out of the loop if we
;      were to store the length array as an array of longwords.
#define DECOMPRESS_PHRASE(LABEL)                                          \
   clr.l  D_PHRASE                      ;Clear out top bytes.             \
   clr.l  D_LEN                         ;Clear byte 1                     \
   move.w (A_SRC)+,D_PHRASE             ;phrase =*p_src++<<8;             \
                                        ;phrase|=*p_src++;                \
   move.b (0,A_LENGTH,D_PHRASE.L),D_LEN ;len=length[phrase];              \
   lsl.l  #2,D_PHRASE                   ;p=pointer[phrase];               \
   move.l (0,A_POINTER,D_PHRASE.L),A_P                                    \
   move.l A_DST,(A_PPOINTER)+           ;*p_pointer++=p_dst;              \
   add.b  #1,D_LEN                      ;*p_length++=len+1;               \
   move.b D_LEN,(A_PLENGTH)+                                              \
LABEL:                                  ;do                               \
   move.b (A_P)+,(A_DST)+               ;  {*p_dst++=*p++;}               \
   dbra   D_LEN,LABEL                   ;while (len--); LAST LINE

@unrolled_phrase_loop:
          sub.l #16, D_UNROLL
          bmi   @end_unrolled_phrase_loop
          DECOMPRESS_PHRASE(@copy_01)
          DECOMPRESS_PHRASE(@copy_02)
          DECOMPRESS_PHRASE(@copy_03)
          DECOMPRESS_PHRASE(@copy_04)
          DECOMPRESS_PHRASE(@copy_05)
          DECOMPRESS_PHRASE(@copy_06)
          DECOMPRESS_PHRASE(@copy_07)
          DECOMPRESS_PHRASE(@copy_08)
          DECOMPRESS_PHRASE(@copy_09)
          DECOMPRESS_PHRASE(@copy_10)
          DECOMPRESS_PHRASE(@copy_11)
          DECOMPRESS_PHRASE(@copy_12)
          DECOMPRESS_PHRASE(@copy_13)
          DECOMPRESS_PHRASE(@copy_14)
          DECOMPRESS_PHRASE(@copy_15)
          DECOMPRESS_PHRASE(@copy_16)
          bra   @unrolled_phrase_loop
@end_unrolled_phrase_loop:
          add.l #16, D_UNROLL
@rolled_phrase_loop:
          sub.l #1, D_UNROLL
          bmi   @end_rolled_phrase_loop
          DECOMPRESS_PHRASE(@just_the_one)
          bra   @rolled_phrase_loop
@end_rolled_phrase_loop:
          move.l (A_STACK)+, A_LINKAGE
          move.l A_SRC,      p_src
          move.l A_DST,      p_dst
          move.l A_PPOINTER, p_pointer
          move.l A_PLENGTH,  p_length
          movem.l (A_STACK)+,REGISTERS_TO_SAVE
;------------------------------------------------------------------------
         } /* End asm 68000 */
   } /* End outer while loop. */

 /* Write the length of the decompressed data before returning. */
 *p_dst_len=p_dst-p_dst_first;

 printf("EXITING DECOMPRESSOR.\n");
}

/******************************************************************************/
/*                               End of LZRW5.C                               */
/******************************************************************************/
----