Huffman coding
Huffman coding is an algorithm for lossless encoding. It counts how often each symbol occurs in the input and sorts the symbols by frequency. The Huffman tree is then built from the bottom up: the two lowest-frequency entries are removed from the sorted list and joined under a new branch whose value is the sum of their frequencies. That sum is positioned above the two removed entries and replaces them in the sorted list. At every new branch the lower value goes to the left and the higher value goes to the right. When only one entry is left in the sorted list, the tree is complete. The code for a symbol is read along the path from the top of the tree down to that symbol: each step to the left adds a zero and each step to the right adds a one, so frequent symbols end up with shorter codes.
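The tree-building procedure described above can be sketched as follows. This is a minimal illustration, not the code in `hc.py`: the function name `build_codes`, the use of `heapq` as the sorted list, and the tie-breaking counter are all assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(text):
    """Return a {symbol: bitstring} table built with a Huffman tree."""
    freq = Counter(text)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry is (frequency, tiebreak, node). A node is either a
    # single symbol (leaf) or a (left, right) tuple (branch).
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f_lo, _, lo = heapq.heappop(heap)  # lowest frequency goes left
        f_hi, _, hi = heapq.heappop(heap)  # next lowest goes right
        # The sum replaces the two removed entries in the sorted list.
        heapq.heappush(heap, (f_lo + f_hi, next(tiebreak), (lo, hi)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left branch adds a zero
            walk(node[1], prefix + "1")  # right branch adds a one
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes = build_codes("aaaabbc")
```

For the input `"aaaabbc"`, the most frequent symbol `a` receives a one-bit code while `b` and `c` each receive two-bit codes.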
Huffman coding consists of two routines, one to encode files and one to decode them. Both are implemented in the source file.
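As a rough illustration of the two directions, the sketch below encodes and decodes a string with a small prefix-free code table. The names `encode` and `decode` and the hard-coded table are hypothetical and are not taken from `hc.py`. The sketch also works on strings of `"0"` and `"1"` characters rather than packed bytes, to keep it short.

```python
def encode(text, codes):
    """Concatenate the code of every symbol in text."""
    return "".join(codes[sym] for sym in text)


def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a code matches."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


# Hypothetical prefix-free table: no code is the start of another code.
codes = {"a": "1", "b": "01", "c": "00"}
bits = encode("abca", codes)
assert decode(bits, codes) == "abca"
```

Decoding works without separators because no code in the table is a prefix of another, which is the property a Huffman tree guarantees.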
Follow the instructions below to try out the Huffman coding:

1. Save the source file with a `.py` extension, ex: `hc.py`
2. Open a terminal in the `hc.py` folder
3. Start `python3`
4. Import the `hc.py` file by inputting `from hc import HuffmanCoding`
5. Create a `sample.txt` file, with some text, in the same folder
6. Set the `path` variable to the `sample.txt` file location, ex: `path = '/Users/*****/sample.txt'`
7. Initialize the compressor with that path, ex: `H = HuffmanCoding(path)`
8. Call the `compressor()` method to create a compressed binary file, ex: `H.compressor()`
9. Call the `decompressor()` method to get back the content of the `sample.txt` file, ex: `H.decompressor('/Users/**** /sample.bin')`
Size comparison before and after Huffman coding has been applied.
Sample File | Compressed File | Decompressed File |
---|---|---|
![]() | ![]() | ![]() |
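The same comparison can be made from Python. The helper below is a small sketch, not part of `hc.py`: the name `compression_ratio` is hypothetical, and it relies only on `os.path.getsize` from the standard library.

```python
import os


def compression_ratio(original_path, compressed_path):
    """Return compressed size divided by original size (smaller is better)."""
    original = os.path.getsize(original_path)
    compressed = os.path.getsize(compressed_path)
    if original == 0:
        raise ValueError("original file is empty")
    return compressed / original
```

Calling it with the paths of `sample.txt` and the generated `.bin` file gives a value below 1 when the compressed file is smaller than the original.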