VOGONS


First post, by Harry Potter

Rank: Member

Hi! I'm working on several compression techniques for several systems and am having a performance issue: the LZ77 scan code is too slow. 🙁 On 8-bit computers it is written in assembler. A 16-bit version uses some assembler, and I need to convert the rest of it to assembler as well. I applied a suggestion to optimize the 8-bit version's inner loop, and it helped, but only slightly. I was also told about using hash tables, but I don't fully understand them. 🙁 Can somebody here explain them better for me? I have another way to optimize LZ77: an 8 KB array of bits, where each bit specifies whether an associated word has been seen or not. It helped slightly. How else can I optimize an LZ77 technique?

BTW, I have a text-compression scheme for 8-bit systems called printtok. It lets you compress strings using tokens and RLE of spaces. It shrinks the text of a text adventure I'm creating by about 25%. Should I port it to DOS?
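[Editor's note: the printtok format itself isn't shown in the thread. The following is a minimal sketch of the general idea described above, assuming a made-up encoding in which high-bit bytes index a token table and one reserved code introduces a run of spaces; the real printtok format may differ.]

#include <stdio.h>

#define RUN_CODE 0x7F                 /* next byte = number of spaces */

static const char *tokens[128] = { "the ", "you ", "ing ", /* ... */ };

static void detok(const unsigned char *in, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = in[i++];
        if (c == RUN_CODE) {          /* RLE of spaces */
            unsigned k = in[i++];
            while (k--) putchar(' ');
        } else if (c & 0x80) {        /* token reference */
            const char *t = tokens[c & 0x7F];
            if (t) fputs(t, stdout);
        } else {
            putchar(c);               /* ordinary character */
        }
    }
}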

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 2 of 14, by Harry Potter

Rank: Member

Very. An 880k file takes a few minutes to compress on a semi-modern Win11/64 laptop.

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 3 of 14, by GemCookie

Rank: Newbie

Define "semi-modern"; this could mean anything from an Athlon 64 X2 to a Coffee Lake i7.

Asus Maximus Extreme (X38) | Core 2 Quad Q9550 | GTX 750 Ti | 8 GiB DDR3 | 120 GB SSD + 640 GB HDD | Sound Blaster X-Fi Titanium | WinXP64, 7, 11
Fujitsu D1215 board | P3 866 | Riva TNT2 M64 | 256 MiB PC133 CL2 | 120 GB HDD | WfW 3.11, Win95, NT, 2k, XP

Reply 4 of 14, by Ringding

Rank: Member

What’s the purpose of this nit-picking? Every desktop CPU that runs Win11 is likely within the same order of magnitude, performance-wise.

A few minutes for less than 1 MB is embarrassingly slow, I agree 😉. I just tried gzip on a 4 MB file, and it took 0.1 seconds. I would have a look at how it or other open-source compressors (zlib, lzop, zstd, lzip, ...) handle their window search.

Reply 5 of 14, by Harry Potter

Rank: Member

I've attached the code that compresses using LZ77. It is for Digital Mars C and Win32. I believe my laptop is less than four years old. BTW, I believe part of the slowdown is due to printing the current position in the file as it is compressed.

Attachments

  • BR.C (14 KiB, 50 downloads, public domain)

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 6 of 14, by bakemono

Rank: Oldbie

How big is your search space for pattern matching? IIRC, things like PKZip only search a 32 KB window for duplicate strings (since Deflate includes an LZ77-like stage).

I believe the 'hash tables' refers to an algorithm where you do a single pass over the data, calculating a rolling checksum of strings of fixed length (e.g. 4 bytes), and build a table of offsets, where each table entry corresponds to a checksum/hash and contains the offset where the matching string was found.

So when a string comes into the compressor and you need to know whether it duplicates previous data, you hash its first 4 bytes and use that as an index into the table to get the offset. This is much faster than a brute-force search through the previous data. At that point you can verify whether the data actually matches (i.e. it is not a hash collision) and whether the match is longer than the 4-byte minimum. Or maybe the string isn't a duplicate at all, in which case the table entry would hold some pre-initialized 'empty' value.
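[Editor's note: a minimal C sketch of that idea, keeping only the most recent offset per hash bucket (real compressors usually chain several candidates, and a window-distance check like BR.C's 60,000-byte limit is omitted for brevity). All names, sizes, and the hash function are illustrative assumptions, not taken from BR.C.]

#include <stdint.h>
#include <string.h>

#define HASH_BITS 15
#define HASH_SIZE (1 << HASH_BITS)

static uint32_t head[HASH_SIZE];      /* last offset seen for each hash */

/* Call once before compressing: 0xFF bytes mark every entry "empty". */
static void init_table(void) { memset(head, 0xFF, sizeof head); }

static uint32_t hash4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, 4);                 /* next four input bytes */
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Length of the match for src+pos against earlier data; 0 if none. */
static size_t find_match(const uint8_t *src, size_t pos, size_t end,
                         size_t *match_pos)
{
    size_t len = 0;
    uint32_t h, cand;

    if (pos + 4 > end)                /* not enough bytes left to hash */
        return 0;
    h = hash4(src + pos);
    cand = head[h];
    if (cand != 0xFFFFFFFFu && cand < pos &&
        memcmp(src + cand, src + pos, 4) == 0) {
        len = 4;                      /* real match, not a collision */
        while (pos + len < end && src[cand + len] == src[pos + len])
            len++;                    /* extend past the 4-byte minimum */
        *match_pos = cand;
    }
    head[h] = (uint32_t)pos;          /* remember this position */
    return len;
}

[This replaces the brute-force window scan entirely: one table lookup and one verification per position instead of comparing against every prior offset.]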

again another retro game on itch: https://90soft90.itch.io/shmup-salad

Reply 7 of 14, by Harry Potter

Rank: Member

So, if a match is found, start scanning from there, and if only the first two or three bytes match, continue scanning from that point, right? Also, if no match is found, don't compress with LZ77, right?

BTW, I'm using a 60,000-byte window for pattern-matching.

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 8 of 14, by Harry Potter

Rank: Member

I'm just about ready for the optimization. Are there any other recommendations to speed up LZ77 block-searching? I have a method that helps a little but needs an 8 KB array of bits, where each bit records whether an associated word has already been seen (a sketch follows below). Before every LZ77 scan, I first check the array to see whether the current word has been seen; if not, I skip the scan and just return no match. For every byte compressed, I set the bit in the array corresponding to the current word. What do you think?
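[Editor's note: in C, that filter might look like the following minimal sketch, using one bit per possible 16-bit word value (8192 bytes × 8 = 65536 bits). Names are illustrative.]

#include <stdint.h>
#include <string.h>

static uint8_t seen[8192];            /* one bit per 16-bit word value */

static int word_seen(const uint8_t *p)
{
    uint16_t w = (uint16_t)(p[0] | (p[1] << 8));
    return (seen[w >> 3] >> (w & 7)) & 1;
}

static void mark_word(const uint8_t *p)
{
    uint16_t w = (uint16_t)(p[0] | (p[1] << 8));
    seen[w >> 3] |= (uint8_t)(1 << (w & 7));
}

/* Usage: if (!word_seen(src + pos)) emit a literal and skip the scan;
   call mark_word(src + pos) for every byte compressed, and clear the
   array with memset(seen, 0, sizeof seen) before each file. */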

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 9 of 14, by bakemono

Rank: Oldbie

By 'word' you mean 16 bits, right? It sounds like it would speed up the compression. The only thing missing from this approach is any kind of hint on where to scan.

Do you scan beginning with the oldest (lowest address) data? Maybe it would be faster to search in the other direction. (Particularly on CPUs with an L1 cache, since more recent data is less likely to have already been evicted from the cache.)
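[Editor's note: an illustrative brute-force scan that tries the newest candidates first, per the suggestion above. Names and parameters are made up; a hash-table finder would replace this loop entirely.]

#include <stddef.h>
#include <stdint.h>

static size_t scan_backward(const uint8_t *src, size_t pos, size_t end,
                            size_t window, size_t min_len,
                            size_t *match_pos)
{
    size_t best = 0;
    size_t start = (pos > window) ? pos - window : 0;
    size_t i = pos;

    while (i-- > start) {             /* newest candidate first */
        size_t len = 0;
        while (pos + len < end && src[i + len] == src[pos + len])
            len++;
        if (len >= min_len && len > best) {
            best = len;
            *match_pos = i;           /* newest match of each length wins */
        }
    }
    return best;
}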

again another retro game on itch: https://90soft90.itch.io/shmup-salad

Reply 10 of 14, by Harry Potter

Rank: Member

By 'word' I do mean 16 bits, and I do scan from the most recent byte first. The last time I tried this technique, it only helped a little. I'm going to apply this now on an 8-bit version. 😀

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 11 of 14, by Harry Potter

Rank: Member

My idea didn't work. 🙁 And it cost me 8 KB of RAM on an 8-bit system, which is significant. Worse yet, it hurt the compression ratio. 🙁

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 12 of 14, by Harry Potter

Rank: Member

I was trying a version of LZW, and at first the compression ratio was very poor. 🙁 I've gained a lot of ground with it since, but the ratio is still poor. 🙁 I attached two code snippets of the technique:

Attachments

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 13 of 14, by Harry Potter

Rank: Member

I forgot to mention: I'm still trying to digest the docs on hash tables. While I'm figuring them out, is there anything else I can do to optimize LZ77?

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 14 of 14, by Harry Potter

Rank: Member

These files are a little dated. I've attached more recent copies.

Attachments

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community