WikiGalaxy

Personalize

Longest Common Substring Problem

Introduction

The Longest Common Substring Problem is a classic problem in computer science and bioinformatics. It involves finding the longest substring that is common to two or more strings. This problem is crucial in various applications like DNA sequence analysis, text comparison, and data deduplication.

Example 1: Basic Concept

Description

Consider two strings, "ABABC" and "BABCA". The longest common substring here is "BABC", which has a length of 4.

Diagram


String 1: A B A B C
          | | | |
String 2:   B A B C A
      

Example 2: Dynamic Programming Approach

Description

Using dynamic programming, we can solve this problem efficiently. The idea is to build a 2D table where cell (i, j) contains the length of the longest common suffix of substrings ending at i and j.

Code Example


public class LCS {
    public static int longestCommonSubstring(String s1, String s2) {
        int[][] dp = new int[s1.length()+1][s2.length()+1];
        int maxLength = 0;
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i-1) == s2.charAt(j-1)) {
                    dp[i][j] = dp[i-1][j-1] + 1;
                    maxLength = Math.max(maxLength, dp[i][j]);
                }
            }
        }
        return maxLength;
    }
}
      

Example 3: Space Optimization

Description

To optimize space, we can use two arrays instead of a 2D table, reducing space complexity from O(n*m) to O(min(n,m)).

Code Example


public class LCSSpaceOptimized {
    public static int longestCommonSubstring(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        int maxLength = 0;
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i-1) == s2.charAt(j-1)) {
                    curr[j] = prev[j-1] + 1;
                    maxLength = Math.max(maxLength, curr[j]);
                } else {
                    curr[j] = 0;
                }
            }
            int[] temp = prev;
            prev = curr;
            curr = temp;
        }
        return maxLength;
    }
}
      

Example 4: Application in Bioinformatics

Description

In bioinformatics, the longest common substring problem helps in identifying similar sequences in DNA or protein sequences, which can be crucial for understanding genetic relationships.

Real-world Example


// DNA sequences comparison
String dna1 = "AGGTAB";
String dna2 = "GXTXAYB";
int result = LCS.longestCommonSubstring(dna1, dna2);
System.out.println("Length of Longest Common Substring: " + result);
      

Example 5: Handling Edge Cases

Description

Edge cases such as strings with no common substring or identical strings should be considered. The algorithm should return 0 for no common substring and the length of the string for identical strings.

Code Example


public class LCSEdgeCases {
    public static void main(String[] args) {
        String str1 = "ABC";
        String str2 = "XYZ";
        System.out.println("No common substring: " + LCS.longestCommonSubstring(str1, str2));

        String identical = "SAME";
        System.out.println("Identical strings: " + LCS.longestCommonSubstring(identical, identical));
    }
}
      
logo of wikigalaxy

Newsletter

Subscribe to our newsletter for weekly updates and promotions.

Privacy Policy

 • 

Terms of Service

Copyright © WikiGalaxy 2025