The Rabin–Karp algorithm or Karp–Rabin algorithm is a string searching algorithm that uses hashing to find any one of a set of pattern strings in a text.Its average and best case running time is O(n+m) in space O(p), but its worst-case time is O(nm) where n is the length of the text and m is the length of the pattern.
Algorithm implementation in java for string matching
void RabinfindPattern(String text,String pattern){
/*
q a prime number
p hash value for pattern
t hash value for text
d is the number of unique characters in input alphabet
*/
int d=128;
int q=100;
int n=text.length();
int m=pattern.length();
int t=0,p=0;
int h=1;
int i,j;
//hash value calculating function
for (i=0;i<m-1;i++)
h = (h*d)%q;
for (i=0;i<m;i++){
p = (d*p + pattern.charAt(i))%q;
t = (d*t + text.charAt(i))%q;
}
//search for the pattern
for(i=0;i<end-m;i++){
if(p==t){
//if the hash value matches match them character by character
for(j=0;j<m;j++)
if(text.charAt(j+i)!=pattern.charAt(j))
break;
if(j==m && i>=start)
System.out.println("Pattern match found at index "+i);
}
if(i<end-m){
t =(d*(t - text.charAt(i)*h) + text.charAt(i+m))%q;
if(t<0)
t=t+q;
}
}
}
While calculating hash value we are dividing it by a prime number in order to avoid collision.After dividing by prime number the chances of collision will be less, but still ther is a chance that the hash value can be same for two strings,so when we get a match we have to check it character by character to make sure that we got a proper match.
t =(d*(t - text.charAt(i)*h) + text.charAt(i+m))%q;
This is to recalculate the hash value for pattern,first by removing the left most character and then adding the new character from the text.