KMP algorithm - A string matching algorithm


题目

给定两个字符串str和match,长度分别为N和M。实现一个算法,如果字符串str中含有字串match,则返回match在str中的开始位置,不含有则返回-1。

str=“acbc”,match=“bc”。返回2。
str=“acbc”,match=“bcc”。返回-1。

如果match的长度大于str长度(M>N),str必然不会含有match,可直接返回-1。但如果N>=M,要求算法复杂度O(N)。


思路

kmp算法的核心是next数组,长度是要查找的字符串长度,表示的是:

next[i]: The length of the longest proper prefix in the (sub)pattern that matches a proper suffix in the same (sub)pattern.

next[i]是当前位置i以前的字符串

proper prefixes : All the characters in a string, with one or more cut off the end. “S”, “Sn”, “Sna”, and “Snap” are all the proper prefixes of “Snape”.
proper suffixes : All the characters in a string, with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id”, and “d” are all proper suffixes of “Hagrid”.

proper prefixes和proper suffixes都不包含自己本身

  • 如何计算next,O(m)
    cn跳的位置 pos要计算的每个位置
    • next[0] = -1
    • next[1] = 0
    • str[pos-1] == str[cn]-> next[pos] = cn;
    • str[pos-1] != str[cn] && cn!=0 -> cn = next[cn]
    • str[pos-1] != str[cn] && cn == 0 -> next[pos] = 0

如何利用 next

If a partial match of length partial_match_length is found and next[partial_match_length] > 1,
we may skip ahead partial_match_length - next[partial_match_length - 1] characters.

More:
KMP algoritm
KMP animation


代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
public int KMP(String string, String match){
char[] cstring = string.toCharArray();
char[] cmatch = match.toCharArray();
//生成next
int [] next = initNext(cmatch);
int i = 0, j = 0;
while(i<string.length() && j<match.length()){
if(cstring[i] == cmatch[j]){
i++;
j++;
}else if(next[j] == -1){
i++;
}else{
//下次从next[j]位置比较,就是跳过了j-next[j]个位置,此时i不变的
j = next[j];
}
}
if(j == cmatch.length){
//j遍历完整的match串,说明完全匹配
return i-j;
}else{
//没有遍历完整,无匹配
return -1;
}
}
public int[] initNext(char[] cmatch){
if(cmatch.length == 1){
return new int[]{-1};
}
//next与match等长
int[] next = new int[cmatch.length];
next[0] = -1;
next[1] = 0;
int pos = 2, cn = 0;
while(pos < next.length){
//next记录的是当前位置以前的状况,即不包括当前位置
if(cmatch[pos-1] == cmatch[cn]){
next[pos++] = ++cn;
}else {
if(cn != 0){
//pos没有自增,因为还要继续,直到cn=0为止
cn = next[cn];
}else{
next[pos++] = 0;
}
}
}
return next;
}