Ever since I found out what a Huffman tree was, I've found data compression to be a fascinating subject. I've tried to apply the concepts to language ("Plan B" style -- for history of it, Google
Lojban "plan b"), election methods (vector quantization of voting districts), and graphics (Google
"hextar curve" or visit this
Fractint L-system page).
Unfortunately, to quote the famous philosopher Barbie, "Math is hard." I'm fine with arithmetic, geometry, algebra, even basic calculus, but things like Abelian algebra, Hilbert space, and bra-ket notation make my hair fall out, and the math used in data compression can be a real pain without some kind soul to explain it.
Even so, I like the ideas involved. Another field I like is fractals -- I love things combining symmetry and uniqueness . For awhile, "fractal compression" held my interest, until I realized it was not any better (and often worse) than vector quantization. Wavelet compression is also quite interesting (once again, love the concepts, have a hard time with the math).
Then I saw some people doing interesting things with the Hilbert curve. Basically, a Hilbert curve is a way to map data in multiple dimensions onto a one dimensional string. It also tends to be compact -- points close together on the Hilbert curve are close together in space (though the converse is not always true). It is a plane-filling curve in two dimensions, and there is a three-dimensional space-filling version as well -- see
Mathworld's Hilbert curve page. Still, there is something about the Hilbert curve I do not like. It isn't closed for one thing -- the start and endpoint are on opposite sides of the square. There
is however a variant known as the Moore curve where the start and endpoint are only one unit apart -- a single segment is sufficient to make it a closed curve. It can be seen here on this
plane-filling curves page. I'm not certain if there is a 3-D variant of the Moore curve as there is for the Hilbert curve, but it does seem quite likely.
Which leads me to the compression idea. There are two major ways to attack the problem: top-down, taking the complete picture and subdividing with some function, then taking the subpictures and subdividing them, continuing to the level of individual pixels; and bottom-up, taking individual pixels and combining them, and continue the combination algorithm until you complete the picture. In addition the algorithm can choose for greatest difference, or nearest neighbor (I'll mention some variants later).
Top-down, greatest difference:1. The Moore curve makes one loop of the entire picture, visiting each pixel once. If you imagine this as one big circle, with each pixel given a certain value, find the diameter that results in the greatest difference between combined values. For example, if the left side of a picture is black and the right side white, the greatest difference will be equivalent to a verticle line between them.
2. Take each subloop and join the endpoints, and find the diameter that results in the greatest difference in combined values. Repeat until you have subdivided the picture into pairs of pixels.
3. Run your compression algorithm on the final data sets (more on that later).
4. If you are interested in video compression, you can like of a video stream as a three dimensional block. A 3-D version of the Moore curve can be used, with the same subdividing and looping.
Bottom-up, least difference:1. Take the Moore curve of the image or video, and compare the absolute difference in pixel pairs and choose the smallest value. For example, if comparing the pixels A, B, C, and D, you would compare |A-B|+|C-D| and |A-D|+|B-C| and choose the smaller of the two.
2. Combine pairs of pixels the same way, and choose the combination with the smallest total.
3. Continue until you have two halves of the image or video, and run your compression algorithm on the data sets.
Either method should yield good results. Basically, the bigger the blocks the more we want there to be a difference between them (because like colors will preferentially be in each block), while the smaller the block the more we want them to be the same (to improve data compression). Which brings us to the final method.
Brute-Force, maximal compression: For a picture 1024 x 1024 pixels, you are looking at a little more than one million ways of partitioning the image. For a 1024 x 1024 x 7200 seconds x 30 FPS video, there are roughly a quarter of a trillion ways to partition the image. Short of some kind of quantum computing or dramatically faster processors, a brute-force method of video probably won't happen.
On the other hand, a test of different partitioning algorithms over a simpler subset would probably be a good idea. A 1024 x 1024 x 1024 cube -- equal to a 1024 x 1024 x 34 second movie -- can be partitioned with a Moore curve with roughly 1 billion possible partitioning methods (think a 30-bit number). If one were to calculate all the various possibilities as well as how much they compressed, and then compared them to the methods above, one could get a feel for the most efficient algorithm.
I'll go over this a bit more when the inspiration strikes (grin).