Home of a simple Huffman-based encoder and decoder.
Home of a simple Huffman-based encoder and decoder.
This project was requested by Professor Diego V. Noble in my BCompSc university course.
Given the King James Bible, encoded in ASCII, run the encoder.
The encoder will create a new file, king_james_ENC
(no extension), with a one-line header containing the character decoding table and the encoded text.
The decoder reads this file and uses the decoding table to decode the contents of the encoded file, saving it in king_james.txt
.
I used a Priority Queue for the final version, but I also made a version that uses a Vector.
This was done to emphasize the difference in execution time between the two data structures.
Using a priority queue, we have close to $O(nlog(n))$ in complexity.
Using a vector, we have $O(n^2)$ complexity.
The program was executed 20 times, half of which used vectors and the other half used a priority queue to build the Huffman Tree.
It’s worth mentioning that the file I used (king_james.txt) is around 4MiB and 100224 (one hundred thousand two hundred twenty-four) lines, which resulted in fake results due to the small elapsed times (resulting in an also small standard deviation). The results may have been tainted by random function calls by the processor (e.g. background processes).
In order to get more accurate results, this program should be run using a bigger text file (e.g. the Wikipedia Front Page).
Execution no. | Elapsed time (in microseconds) |
---|---|
1 | 194 |
2 | 188 |
3 | 188 |
4 | 186 |
5 | 189 |
6 | 188 |
7 | 188 |
8 | 194 |
9 | 187 |
10 | 194 |
sum | 189.6 |
Execution no. | Elapsed time (in microseconds) |
---|---|
1 | 24 |
2 | 27 |
3 | 28 |
4 | 27 |
5 | 28 |
6 | 27 |
7 | 28 |
8 | 29 |
9 | 28 |
10 | 28 |
sum | 27.4 |
After 10 executions each:
Data Structure | Elapsed Time ($\mu$) |
---|---|
Vector | 189.6 microseconds (0.0001896 seconds) |
Priority Queue | 27.4 microseconds (0.0000274 seconds) |
Even with tainted results, it’s clear to see that using a priority queue is much faster than using a std::vector.