WikiGalaxy


Karp Rabin Fingerprinting: Introduction

What is Karp Rabin Fingerprinting?

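The Karp Rabin fingerprinting algorithm (also known as Rabin-Karp) is a hashing-based string-matching method: it finds the occurrences of a pattern inside a larger text. Its fingerprints are hash values, so two different strings can occasionally share one (a collision). For that reason a fingerprint match is treated as a candidate and confirmed by a direct character comparison. The hash parameters are often chosen at random so that collisions are unlikely for any fixed input, which is what makes the method probabilistic.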

How Does It Work?

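The algorithm converts the pattern, and each window of the text with the same length, into a number (a fingerprint) using a hash function, and compares fingerprints instead of strings. The key ingredient is a rolling hash: the fingerprint of the window starting at position i + 1 is computed from the fingerprint at position i in constant time, by removing the outgoing character and adding the incoming one. Scanning a text of length n for a pattern of length m therefore takes O(n + m) expected time plus O(m) for each candidate that has to be verified. In the worst case, when many windows match or collide, the running time degrades to O(nm).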
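One common way to define the fingerprint is a polynomial hash. This is an assumption here rather than something stated above: the base b and the modulus q (typically a large prime) are parameters chosen by the implementer. For a window of length m, let t_i denote the i-th text character and h_i the fingerprint of the window t_i through t_{i+m-1}. The fingerprint of a string s and the rolling update are then:

```latex
h(s_0 s_1 \cdots s_{m-1}) = \left( \sum_{j=0}^{m-1} s_j \, b^{\,m-1-j} \right) \bmod q

h_{i+1} = \left( \left( h_i - t_i \, b^{\,m-1} \right) b + t_{i+m} \right) \bmod q
```

Java's String.hashCode(), which the examples below use, has the same polynomial form with b = 31 and arithmetic modulo 2^32 (int overflow). However, those examples recompute it for each window instead of applying the update rule.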

Applications

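The same fingerprinting idea shows up in three areas. Plagiarism detection compares fingerprints of short text fragments, network security scans traffic for known byte patterns, and data deduplication compares fingerprints of stored content. In each case fingerprints are cheap to compute and compare, and that matters on large inputs.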

Example 1: Basic String Matching

Problem Statement

Find the substring "abc" in the text "abcpqrabcxyz".

Approach

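Compare the hash of the pattern with the hash of every text window of the same length, and confirm each hash match with equals(). This version calls String.hashCode() on a fresh substring for every window, so each step costs O(m) and the loop runs in O(nm) time. It demonstrates the hash-then-verify filter, but not the rolling update that gives Karp Rabin its speed.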


public class KarpRabin {
    public static void main(String[] args) {
        String text = "abcpqrabcxyz";
        String pattern = "abc";
        int patternHash = pattern.hashCode();
        // Slide a window of pattern.length() characters across the text
        for (int i = 0; i <= text.length() - pattern.length(); i++) {
            String substring = text.substring(i, i + pattern.length());
            // Cheap hash check first; equals() rules out hash collisions.
            // hashCode() is recomputed for every window here, not rolled.
            if (substring.hashCode() == patternHash && substring.equals(pattern)) {
                System.out.println("Pattern found at index " + i);
            }
        }
    }
}
    

Console Output:

Pattern found at index 0

Pattern found at index 6
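The loop above rehashes every window from scratch. Below is a sketch of the rolling version. It assumes a base of 256 and the prime modulus 1,000,000,007, both illustrative choices that do not come from this article, and the class and method names are hypothetical. Each step removes the outgoing character's contribution, shifts the remaining value by one position, and adds the incoming character. The scan therefore costs O(1) per position plus the cost of verifying candidates.

```java
import java.util.ArrayList;
import java.util.List;

public class RollingKarpRabin {
    // Assumed hash parameters (illustrative, not prescribed by the article)
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    /** Returns every index where pattern occurs in text, using a rolling polynomial hash. */
    public static List<Integer> search(String text, String pattern) {
        List<Integer> matches = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return matches;

        // highPow = BASE^(m-1) mod MOD, the weight of the character leaving the window
        long highPow = 1;
        for (int k = 0; k < m - 1; k++) highPow = (highPow * BASE) % MOD;

        // Fingerprints of the pattern and of the first window
        long pHash = 0, wHash = 0;
        for (int k = 0; k < m; k++) {
            pHash = (pHash * BASE + pattern.charAt(k)) % MOD;
            wHash = (wHash * BASE + text.charAt(k)) % MOD;
        }

        for (int i = 0; ; i++) {
            // A fingerprint match is only a candidate; regionMatches confirms it
            if (wHash == pHash && text.regionMatches(i, pattern, 0, m)) {
                matches.add(i);
            }
            if (i + m >= n) break;
            // Roll: drop text[i] (adding MOD keeps the value non-negative), then append text[i + m]
            wHash = (wHash - text.charAt(i) * highPow % MOD + MOD) % MOD;
            wHash = (wHash * BASE + text.charAt(i + m)) % MOD;
        }
        return matches;
    }

    public static void main(String[] args) {
        for (int i : search("abcpqrabcxyz", "abc")) {
            System.out.println("Pattern found at index " + i);
        }
    }
}
```

Verifying with String.regionMatches avoids allocating a substring for each window. With a fixed, publicly known modulus, an adversary can craft collisions. Choosing the base at random at startup makes that harder.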

Example 2: Detecting Plagiarism

Problem Statement

Detect if a document contains plagiarized content from another source.

Approach

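Hash the suspect passage and compare that hash with the hash of every equal-length window of the original document, confirming any hash match with a direct string comparison. As in Example 1, this version uses String.hashCode() per window rather than a rolling hash. It also assumes the copied passage is known in advance.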


public class PlagiarismDetection {
    public static void main(String[] args) {
        String original = "The quick brown fox jumps over the lazy dog";
        String suspect = "brown fox jumps";
        int suspectHash = suspect.hashCode();
        // Check every window of the original that has the suspect passage's length
        for (int i = 0; i <= original.length() - suspect.length(); i++) {
            String substring = original.substring(i, i + suspect.length());
            // Hash match first, then equals() to rule out a collision
            if (substring.hashCode() == suspectHash && substring.equals(suspect)) {
                System.out.println("Plagiarized content found at index " + i);
            }
        }
    }
}
    

Console Output:

Plagiarized content found at index 10
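The example above assumes the copied passage is already known. When it is not, a Karp Rabin based approach is to fingerprint every k-character fragment (k-gram) of both documents with a rolling hash and look for fingerprints that the two documents share. The sketch below does this under several assumptions: the names KGramOverlap and sharedKGrams are hypothetical, k = 8 is an illustrative fragment length, and the hash uses a base of 256 with the prime modulus 1,000,000,007. Real plagiarism tools also normalize text (case, whitespace, punctuation) and select a subset of fingerprints, and none of that is shown here.

```java
import java.util.*;

public class KGramOverlap {
    // Assumed hash parameters (illustrative)
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    /** Maps the rolling fingerprint of every length-k window of s to that window's start indices. */
    static Map<Long, List<Integer>> fingerprints(String s, int k) {
        Map<Long, List<Integer>> result = new HashMap<>();
        if (k <= 0 || k > s.length()) return result;
        long highPow = 1;
        for (int j = 0; j < k - 1; j++) highPow = (highPow * BASE) % MOD;
        long h = 0;
        for (int j = 0; j < k; j++) h = (h * BASE + s.charAt(j)) % MOD;
        for (int i = 0; ; i++) {
            result.computeIfAbsent(h, key -> new ArrayList<>()).add(i);
            if (i + k >= s.length()) break;
            // Roll the window forward by one character
            h = (h - s.charAt(i) * highPow % MOD + MOD) % MOD;
            h = (h * BASE + s.charAt(i + k)) % MOD;
        }
        return result;
    }

    /** Returns the sorted start indices of suspect k-grams that also occur in original. */
    public static List<Integer> sharedKGrams(String original, String suspect, int k) {
        Map<Long, List<Integer>> origPrints = fingerprints(original, k);
        List<Integer> shared = new ArrayList<>();
        for (Map.Entry<Long, List<Integer>> e : fingerprints(suspect, k).entrySet()) {
            List<Integer> candidates = origPrints.get(e.getKey());
            if (candidates == null) continue;
            for (int si : e.getValue()) {
                // Equal fingerprints are candidates; confirm the characters to rule out collisions
                for (int oi : candidates) {
                    if (suspect.regionMatches(si, original, oi, k)) {
                        shared.add(si);
                        break;
                    }
                }
            }
        }
        Collections.sort(shared);
        return shared;
    }

    public static void main(String[] args) {
        String original = "The quick brown fox jumps over the lazy dog";
        // Hypothetical suspect text that reuses part of the original
        String suspect = "A brown fox jumps high";
        int k = 8; // assumed fragment length; real tools tune this
        List<Integer> shared = sharedKGrams(original, suspect, k);
        int total = suspect.length() - k + 1;
        System.out.println(shared.size() + " of " + total + " " + k + "-character fragments also occur in the original");
    }
}
```

The share of suspect fragments found in the original gives a rough similarity score. Unlike the single-passage search, this comparison does not require knowing in advance which part was copied.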

Example 3: Network Security

Problem Statement

Identify malicious patterns in network packets.

Approach

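Compare the hash of a known malicious pattern with the hash of every equal-length window of the packet data. Confirm each hash match with equals(), so that a collision cannot raise a false alarm. This version checks a single pattern and recomputes the hash for each window, as in Example 1.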


public class NetworkSecurity {
    public static void main(String[] args) {
        String packetData = "GET /malicious/path HTTP/1.1";
        String maliciousPattern = "/malicious/path";
        int patternHash = maliciousPattern.hashCode();
        // Slide a window of the signature's length over the packet payload
        for (int i = 0; i <= packetData.length() - maliciousPattern.length(); i++) {
            String substring = packetData.substring(i, i + maliciousPattern.length());
            // Report only when both the hash and the actual characters match
            if (substring.hashCode() == patternHash && substring.equals(maliciousPattern)) {
                System.out.println("Malicious pattern detected at index " + i);
            }
        }
    }
}
    

Console Output:

Malicious pattern detected at index 4
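Intrusion detection usually checks many signatures at once. Karp Rabin extends naturally to that case when the patterns all have the same length: store each pattern's fingerprint in a hash map and look up every window's rolling fingerprint in it. The sketch below rests on several assumptions. It requires equal-length patterns and rejects any others. It also uses a hypothetical two-entry signature list and a base of 256 with the prime modulus 1,000,000,007. Signatures of different lengths would need one pass per distinct length, or a different algorithm such as Aho-Corasick.

```java
import java.util.*;

public class MultiPatternScan {
    // Assumed hash parameters (illustrative)
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    /** Polynomial fingerprint of s[from, from + len). */
    static long fingerprint(String s, int from, int len) {
        long h = 0;
        for (int j = from; j < from + len; j++) h = (h * BASE + s.charAt(j)) % MOD;
        return h;
    }

    /** Returns (index, pattern) for every occurrence; all patterns must share one length. */
    public static List<Map.Entry<Integer, String>> scan(String data, List<String> patterns) {
        List<Map.Entry<Integer, String>> hits = new ArrayList<>();
        if (patterns.isEmpty()) return hits;
        int m = patterns.get(0).length();
        // Group patterns by fingerprint (several patterns may collide)
        Map<Long, List<String>> byHash = new HashMap<>();
        for (String p : patterns) {
            if (p.length() != m) throw new IllegalArgumentException("patterns must have equal length");
            byHash.computeIfAbsent(fingerprint(p, 0, m), k -> new ArrayList<>()).add(p);
        }
        if (m == 0 || m > data.length()) return hits;

        long highPow = 1;
        for (int j = 0; j < m - 1; j++) highPow = (highPow * BASE) % MOD;
        long h = fingerprint(data, 0, m);
        for (int i = 0; ; i++) {
            List<String> candidates = byHash.get(h);
            if (candidates != null) {
                // Verify each candidate so a collision cannot raise a false alarm
                for (String p : candidates) {
                    if (data.regionMatches(i, p, 0, m)) hits.add(Map.entry(i, p));
                }
            }
            if (i + m >= data.length()) break;
            h = (h - data.charAt(i) * highPow % MOD + MOD) % MOD;
            h = (h * BASE + data.charAt(i + m)) % MOD;
        }
        return hits;
    }

    public static void main(String[] args) {
        String packetData = "GET /malicious/path HTTP/1.1";
        // Hypothetical signature list; both entries are 15 characters long
        List<String> signatures = List.of("/malicious/path", "/admin/shell.sh");
        for (Map.Entry<Integer, String> hit : scan(packetData, signatures)) {
            System.out.println("Malicious pattern detected at index " + hit.getKey() + ": " + hit.getValue());
        }
    }
}
```

Each window costs one hash-map lookup no matter how many signatures are loaded, which is the main appeal of this design over running the single-pattern loop once per signature.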

Example 4: Data Deduplication

Problem Statement

Identify duplicate files in a storage system.

Approach

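Hash the contents of each file and treat files with equal hashes as duplicate candidates. Because different contents can share a hash, confirm each candidate by comparing the contents. This approach fingerprints whole files once rather than sliding a window, so it uses the hashing idea without the rolling update.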


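import java.util.*;

public class DataDeduplication {
    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is deterministic
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "Hello World");
        files.put("file2.txt", "Hello World");
        files.put("file3.txt", "Goodbye World");

        // Bucket contents by hash; a hash match is only a candidate until the contents are compared
        Map<Integer, List<String>> seen = new HashMap<>();
        for (String content : files.values()) {
            List<String> bucket = seen.computeIfAbsent(content.hashCode(), h -> new ArrayList<>());
            if (bucket.contains(content)) {
                // Same hash and same content: a true duplicate
                System.out.println("Duplicate file detected with content: " + content);
            } else {
                // New content, or a hash collision with different content
                bucket.add(content);
            }
        }
    }
}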
    

Console Output:

Duplicate file detected with content: Hello World

Example 5: Substring Search Optimization

Problem Statement

Optimize substring search in large text files.

Approach

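Use hashing to filter candidate positions before comparing characters. The code below uses the same per-window String.hashCode() loop as Example 1, which allocates a substring and rehashes m characters at every position. For genuinely large files, the efficiency comes from the rolling update combined with reading the file as a stream.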


public class SubstringSearch {
    public static void main(String[] args) {
        String text = "This is a simple example for Karp Rabin substring search";
        String pattern = "simple";
        int patternHash = pattern.hashCode();
        // Each iteration allocates a substring and hashes it in O(m); no rolling update
        for (int i = 0; i <= text.length() - pattern.length(); i++) {
            String substring = text.substring(i, i + pattern.length());
            // Hash filter first, then a full comparison to confirm the match
            if (substring.hashCode() == patternHash && substring.equals(pattern)) {
                System.out.println("Pattern found at index " + i);
            }
        }
    }
}
    

Console Output:

Pattern found at index 10
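When the text is too large to hold in memory, the rolling hash can run directly over a stream, keeping only the last m characters in a ring buffer. The sketch below is illustrative. StreamingKarpRabin and search are hypothetical names, the base of 256 and the modulus 1,000,000,007 are assumed, and a StringReader stands in for a file reader. Indices count UTF-16 chars, matching String indexing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamingKarpRabin {
    // Assumed hash parameters (illustrative)
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    /** Scans a character stream once, keeping only the last pattern.length() chars in memory. */
    public static List<Long> search(Reader in, String pattern) throws IOException {
        List<Long> matches = new ArrayList<>();
        int m = pattern.length();
        if (m == 0) return matches;

        long highPow = 1, pHash = 0;
        for (int j = 0; j < m - 1; j++) highPow = (highPow * BASE) % MOD;
        for (int j = 0; j < m; j++) pHash = (pHash * BASE + pattern.charAt(j)) % MOD;

        char[] window = new char[m]; // ring buffer holding the current window
        long wHash = 0;
        long pos = 0;                // number of chars read so far
        int c;
        while ((c = in.read()) != -1) {
            int slot = (int) (pos % m);
            if (pos >= m) {
                // This slot holds the oldest char, which is leaving the window
                wHash = (wHash - window[slot] * highPow % MOD + MOD) % MOD;
            }
            window[slot] = (char) c;
            wHash = (wHash * BASE + c) % MOD;
            pos++;
            // After pos++, the oldest char of the full window sits at slot pos % m
            if (pos >= m && wHash == pHash && windowEquals(window, (int) (pos % m), pattern)) {
                matches.add(pos - m);
            }
        }
        return matches;
    }

    /** Compares the ring buffer, read from its oldest slot, with the pattern. */
    private static boolean windowEquals(char[] window, int start, String pattern) {
        for (int j = 0; j < window.length; j++) {
            if (window[(start + j) % window.length] != pattern.charAt(j)) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String text = "This is a simple example for Karp Rabin substring search";
        // A StringReader stands in for a reader over a large file
        for (long i : search(new StringReader(text), "simple")) {
            System.out.println("Pattern found at index " + i);
        }
    }
}
```

For a real file, wrap the input in a buffered reader (for example, one returned by Files.newBufferedReader), because this loop calls read() once per character. Memory use stays proportional to the pattern length regardless of the file size.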

Copyright © WikiGalaxy 2025