KMP algorithm - A string matching algorithm
题目
给定两个字符串str和match,长度分别为N和M。实现一个算法,如果字符串str中含有字串match,则返回match在str中的开始位置,不含有则返回-1。
str=“acbc”,match=“bc”。返回2。
str=“acbc”,match=“bcc”。返回-1。
如果match的长度大于str长度(M>N),str必然不会含有match,可直接返回-1。但如果N>=M,要求算法复杂度O(N)。
思路
kmp算法的核心是next数组,长度是要查找的字符串长度,表示的是:
next[i]: The length of the longest proper prefix in the (sub)pattern that matches a proper suffix in the same (sub)pattern.
next[i]是当前位置i以前的字符串
proper prefixes : All the characters in a string, with one or more cut off the end. “S”, “Sn”, “Sna”, and “Snap” are all the proper prefixes of “Snape”.
proper suffixes : All the characters in a string, with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id”, and “d” are all proper suffixes of “Hagrid”.
proper prefixes和proper suffixes都不包含自己本身
- 如何计算next,O(m)
cn跳的位置 pos要计算的每个位置- next[0] = -1
- next[1] = 0
- str[pos-1] == str[cn]-> next[pos] = cn;
- str[pos-1] != str[cn] && cn!=0 -> cn = next[cn]
- str[pos-1] != str[cn] && cn == 0 -> next[pos] = 0
如何利用 next
If a partial match of length partial_match_length is found and next[partial_match_length] > 1,
we may skip ahead partial_match_length - next[partial_match_length - 1] characters.
More:
KMP algoritm
KMP animation
代码
|
|