项目作者: AshishYUO

项目描述 :
Huffman Compression Algorithm using basic C++
高级语言: C++
项目地址: git://github.com/AshishYUO/huffman-compression.git
创建时间: 2019-04-24T07:36:02Z
项目社区:https://github.com/AshishYUO/huffman-compression

开源协议:

下载


Huffman Compression

A minimal huffman compression program using C++ language.

Theory

Huffman compression is based on huffman coding technique. The huffman coding creates an optimal binary tree that is constructed based on frequency of an item/character in a file.

Let’s take an example of a file or string containing data like “AAABBCAAADDFFAAAADCCCDAADDDAAACGAAACACA”

  1. character frequency
  2. A 20
  3. B 2
  4. C 7
  5. D 7
  6. F 2
  7. G 1

The above following data will generate a binary tree starting from the characters having the lowest frequency, and construct till we use all the characters as follows:

  1. Pass 1:
  2. (A, 20) (C, 7) (D, 7) (B, 2) (F, 2) (G, 1) (A, 20) (C, 7) (D, 7) (**, 3) (B, 2)
  3. => / \
  4. / \
  5. (F, 2) (G, 1)
  6. Pass 2:
  7. (A, 20) (C, 7) (D, 7) (**, 3) (B, 2) (A, 20) (C, 7) (D, 7) (**, 5)
  8. / \ / \
  9. / \ / \
  10. (F, 2) (G, 1) => (**, 3) (B, 2)
  11. / \
  12. / \
  13. (F, 2) (G, 1)
  14. Pass 3:
  15. (A, 20) (C, 7) (D, 7) (**, 5) (A, 20) (C, 7) (**, 12)
  16. / \ / \
  17. / \ / \
  18. (**, 3) (B, 2) => (D, 7) (**, 5)
  19. / \ / \
  20. / \ / \
  21. (F, 2) (G, 1) (**, 3) (B, 2)
  22. / \
  23. / \
  24. (F, 2) (G, 1)
  25. Pass 4:
  26. (A, 20) (**, 12) (C, 7) (A, 20) (**, 19)
  27. / \ / \
  28. / \ / \
  29. (D, 7) (**, 5) (**, 12) (C, 7)
  30. / \ / \
  31. / \ => / \
  32. (**, 3) (B, 2) (D, 7) (**, 5)
  33. / \ / \
  34. / \ / \
  35. (F, 2) (G, 1) (**, 3) (B, 2)
  36. / \
  37. / \
  38. (F, 2) (G, 1)
  39. Pass 5 (Final), with huffman code:
  40. Left Branch denoting 0 and right 1
  41. (A, 20) (**, 19) (**, 39)
  42. / \ / \
  43. / \ (0)/ \(1)
  44. (**, 12) (C, 7) [0] <= (A, 20) (**, 19)
  45. / \ / \
  46. / \ (0)/ \(1)
  47. (D, 7) (**, 5) => (**, 12) (C, 7) => [11]
  48. / \ / \
  49. / \ (0)/ \(1)
  50. (**, 3) (B, 2) [100] <= (D, 7) (**, 5)
  51. / \ / \
  52. / \ / \(1)
  53. (F, 2) (G, 1) (**, 3) (B, 2) => [1011]
  54. / \
  55. (0)/ \(1)
  56. [10100] <= (F, 2) (G, 1) => [10101]
  57. Final Huffman Codes:
  58. character frequency Huffman Codes Actual Binary
  59. A 20 0 01000001
  60. B 2 1011 01000010
  61. C 7 11 01000011
  62. D 7 100 01000100
  63. F 2 10100 01000110
  64. G 1 10101 01000111
  65. __________________________________________________________________
  66. 39 75 312

Hence the resultant data is stored as binary written as ‘000101110110001001001010010100000010011111110000100100100000111010100011011000000‘, which has 75 bits
plus 5 digits appended to round off the remaining bits while storing in the file.

Executing program

  1. g++ huffman-compression.cpp
  2. [a.exe | ./a.out] -c|-dc [filename_to_be_compressed]
  3. (The order must be same: first: option to compress/decompress and then second: filename)

The file to be compressed will generate a file with extension ‘.abiz’, which is the compressed version of the original one.

Note:

  • Compressing files other than ASCII based text files (e.g., audio (.mp3), video (.mp4), pdfs, document (.doc/.docx), etc.) can have little or no effect on the resulting size.