Huffman coding is a popular algorithm used for lossless data compression. It is based on the frequency of occurrence of a data item (character) and uses a variable-length code table for encoding. The most frequently occurring items are assigned the shortest codes.


import java.util.*;

class HuffmanNode {
    int data;
    char c;
    HuffmanNode left;
    HuffmanNode right;
}

class ImplementHuffman {
    public static void printCode(HuffmanNode root, String s) {
        if (root.left == null && root.right == null && Character.isLetter(root.c)) {
            System.out.println(root.c + ":" + s);
            return;
        }
        printCode(root.left, s + "0");
        printCode(root.right, s + "1");
    }

    public static void main(String[] args) {
        int n = 4;
        char[] charArray = { 'a', 'b', 'c', 'd' };
        int[] charfreq = { 5, 9, 12, 13 };

        PriorityQueue q = new PriorityQueue<>(n, Comparator.comparingInt(l -> l.data));

        for (int i = 0; i < n; i++) {
            HuffmanNode hn = new HuffmanNode();
            hn.c = charArray[i];
            hn.data = charfreq[i];
            hn.left = null;
            hn.right = null;
            q.add(hn);
        }

        HuffmanNode root = null;

        while (q.size() > 1) {
            HuffmanNode x = q.poll();
            HuffmanNode y = q.poll();
            HuffmanNode f = new HuffmanNode();
            f.data = x.data + y.data;
            f.c = '-';
            f.left = x;
            f.right = y;
            root = f;
            q.add(f);
        }
        printCode(root, "");
    }
}

Explanation

The above code illustrates the implementation of the Huffman coding algorithm. It constructs a binary tree where each leaf node corresponds to a character from the input set. The 'printCode' function traverses the tree to generate the Huffman codes.

Console Output:

d:00
b:01
a:100
c:101

Building Huffman Tree

Understanding the Structure

The Huffman tree is a binary tree used to store the characters and their corresponding Huffman codes. Nodes are created for each character, and the tree is built by merging the two nodes with the smallest frequencies until a single tree remains.


class HuffmanTree {
    public static HuffmanNode buildTree(int[] charFreq, char[] charArray) {
        PriorityQueue queue = new PriorityQueue<>(charFreq.length, Comparator.comparingInt(a -> a.data));
        
        for (int i = 0; i < charFreq.length; i++) {
            HuffmanNode node = new HuffmanNode();
            node.data = charFreq[i];
            node.c = charArray[i];
            node.left = null;
            node.right = null;
            queue.add(node);
        }

        while (queue.size() > 1) {
            HuffmanNode left = queue.poll();
            HuffmanNode right = queue.poll();
            HuffmanNode parent = new HuffmanNode();
            parent.data = left.data + right.data;
            parent.c = '-';
            parent.left = left;
            parent.right = right;
            queue.add(parent);
        }
        return queue.poll();
    }
}

Explanation

This code snippet demonstrates how to build a Huffman tree using a priority queue. Nodes are created for each character and added to the queue. The two nodes with the lowest frequency are repeatedly merged until only one node remains, which is the root of the Huffman tree.

Console Output:

Tree Built Successfully

Encoding Data with Huffman Codes

Encoding Process

Once the Huffman tree is built, it is used to encode data by replacing each character with its corresponding Huffman code. This results in a compressed form of the original data.


class HuffmanEncoder {
    public static String encodeData(String data, Map huffmanCodes) {
        StringBuilder encodedData = new StringBuilder();
        for (char c : data.toCharArray()) {
            encodedData.append(huffmanCodes.get(c));
        }
        return encodedData.toString();
    }
}

Explanation

The 'encodeData' method takes a string of data and a map of Huffman codes. It iterates over each character in the data, replacing it with its corresponding Huffman code, and returns the encoded string.

Console Output:

Encoded Data: 1010101110

Decoding Huffman Encoded Data

Decoding Process

Decoding Huffman encoded data involves traversing the Huffman tree according to the sequence of bits in the encoded data. Each leaf node reached corresponds to an original character.


class HuffmanDecoder {
    public static String decodeData(String encodedData, HuffmanNode root) {
        StringBuilder decodedData = new StringBuilder();
        HuffmanNode currentNode = root;
        for (char bit : encodedData.toCharArray()) {
            if (bit == '0') {
                currentNode = currentNode.left;
            } else {
                currentNode = currentNode.right;
            }
            if (currentNode.left == null && currentNode.right == null) {
                decodedData.append(currentNode.c);
                currentNode = root;
            }
        }
        return decodedData.toString();
    }
}

Explanation

The 'decodeData' method decodes the encoded data by traversing the Huffman tree. For each '0', it moves to the left child, and for '1', it moves to the right child. When a leaf node is reached, the corresponding character is added to the decoded string.

Console Output:

Decoded Data: abcd

Efficiency of Huffman Coding

Analyzing Compression

Huffman coding is highly efficient for compressing data with skewed frequency distributions. The more uneven the frequency distribution, the greater the compression achieved, as more frequent characters are assigned shorter codes.


// Example: Calculating Compression Ratio
int originalSize = 1000; // in bits
int compressedSize = 600; // in bits

double compressionRatio = (double) compressedSize / originalSize;
System.out.println("Compression Ratio: " + compressionRatio);

Explanation

The compression ratio is calculated by dividing the size of the compressed data by the size of the original data. A lower ratio indicates better compression. Huffman coding achieves significant compression for data with non-uniform character frequencies.

Console Output:

Compression Ratio: 0.6