## Algorithms

Moderators: None (Apply to moderate this forum)
Number of posts: 786

This Forum Only

compressing a short string Posted by iap on 19 Nov 2004 at 5:40 AM
Is there a way to compress a formatted 20 character string, with limited number of possible characters (64 characters: a-z A-z 0-9 space and comma, to be exact)?
I think the LZH algorithm, or any other indexing algorithm, would not help us here, because the string is not long enough to have patterns.

Do you know a way?
Re: compressing a short string Posted by Jonathan on 19 Nov 2004 at 7:43 AM
: Is there a way to compress a formatted 20 character string, with limited
: number of possible characters (64 characters: a-z A-z 0-9 space and
: comma, to be exact)?
If you want to do it simply, then if you've got 64 possible characters, then you can obviously represent them in 6 bits, saving you 2 bits per character = total saving of 40 bits which is 5 bytes. So you can represent it in 15 bytes rather than 20 without even using a compression algorithm, as such.

: I think the LZH algorithm, or any other indexing algorithm, would not
: help us here, because the string is not long enough to have patterns.
:
Yeah, Lempel Ziv probably isn't the way if repetitions are unlikely. Have you considered something as simple as Huffman coding? If you can come up with a constant encoding it may work out OK - a variable encoding gives better compression, but then you need to store the encodings table, which will be relatively expensive even if done efficiently! If you can assume vowels are more common you can give them shorter codes.

Run length encoding? Don't know much about this to be honest. Burrows Wheeler could also be worth a look, but it's kinda complicated from what I've seen.

Jonathan

###
for(74,117,115,116){\$::a.=chr};((\$_.='qwertyui')&&
(tr/yuiqwert/her anot/))for(\$::b);for(\$::c){\$_.=\$^X;
/(p.{2}l)/;\$_=\$1}\$::b=~/(..)\$/;print("\$::a\$::b \$::c hack\$1.");

Re: compressing a short string Posted by iap on 20 Nov 2004 at 2:57 AM
: If you want to do it simply, then if you've got 64 possible characters, then you can obviously represent them in 6 bits, saving you 2 bits per character = total saving of 40 bits which is 5 bytes. So you can represent it in 15 bytes rather than 20 without even using a compression algorithm, as such.
:

I was thinking aboput this solution... But i think I want to explore the algorithmic approach. Just for the sake of learning...

: Yeah, Lempel Ziv probably isn't the way if repetitions are unlikely. Have you considered something as simple as Huffman coding? If you can come up with a constant encoding it may work out OK - a variable encoding gives better compression, but then you need to store the encodings table, which will be relatively expensive even if done efficiently! If you can assume vowels are more common you can give them shorter codes.
:
: Run length encoding? Don't know much about this to be honest. Burrows Wheeler could also be worth a look, but it's kinda complicated from what I've seen.
:
: Jonathan
:
: ###
: for(74,117,115,116){\$::a.=chr};((\$_.='qwertyui')&&
: (tr/yuiqwert/her anot/))for(\$::b);for(\$::c){\$_.=\$^X;
: /(p.{2}l)/;\$_=\$1}\$::b=~/(..)\$/;print("\$::a\$::b \$::c hack\$1.");

Can you explain briefly about the huffman code? I always thought it relays on patterns and indexes like LZH (acctually, LZH is Lemple Ziv Huffman), Wich we agreed, does not fits here...