KMP algorithm - A string matching algorithm

题目

给定两个字符串str和match，长度分别为N和M。实现一个算法，如果字符串str中含有字串match，则返回match在str中的开始位置，不含有则返回-1。

str=“acbc”，match=“bc”。返回2。
str=“acbc”，match=“bcc”。返回-1。

如果match的长度大于str长度(M>N)，str必然不会含有match，可直接返回-1。但如果N>=M，要求算法复杂度O(N)。

思路

kmp算法的核心是next数组,长度是要查找的字符串长度，表示的是：

next[i]: The length of the longest proper prefix in the (sub)pattern that matches a proper suffix in the same (sub)pattern.

next[i]是当前位置i以前的字符串

proper prefixes : All the characters in a string, with one or more cut off the end. “S”, “Sn”, “Sna”, and “Snap” are all the proper prefixes of “Snape”.
proper suffixes : All the characters in a string, with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id”, and “d” are all proper suffixes of “Hagrid”.

proper prefixes和proper suffixes都不包含自己本身

如何计算next,O(m)
cn跳的位置 pos要计算的每个位置
- next[0] = -1
- next[1] = 0
- str[pos-1] == str[cn]-> next[pos] = cn;
- str[pos-1] != str[cn] && cn!=0 -> cn = next[cn]
- str[pos-1] != str[cn] && cn == 0 -> next[pos] = 0

如何利用 next

If a partial match of length partial_match_length is found and next[partial_match_length] > 1,
we may skip ahead partial_match_length - next[partial_match_length - 1] characters.

More:
KMP algoritm
KMP animation

代码

public int KMP(String string, String match){
	
	char[] cstring = string.toCharArray();
	char[] cmatch = match.toCharArray();
	//生成next
	int [] next = initNext(cmatch);
	
	int i = 0, j = 0;
	while(i<string.length() && j<match.length()){
		if(cstring[i] == cmatch[j]){
			i++;
			j++;
		}else if(next[j] == -1){
			i++;
		}else{
			//下次从next[j]位置比较，就是跳过了j-next[j]个位置，此时i不变的
			j = next[j];
		}
	
	}
	
	if(j == cmatch.length){
		//j遍历完整的match串,说明完全匹配
		return i-j;
	}else{
		//没有遍历完整，无匹配
		return -1;
	}
	
	
}
public int[] initNext(char[] cmatch){
	if(cmatch.length == 1){
		return new int[]{-1};
	}
	//next与match等长
	int[] next = new int[cmatch.length];
	next[0] = -1;
	next[1] = 0;
	int pos = 2, cn = 0;
	while(pos < next.length){
		//next记录的是当前位置以前的状况，即不包括当前位置
		if(cmatch[pos-1] == cmatch[cn]){
			next[pos++] = ++cn;
		}else {
			if(cn != 0){
				//pos没有自增，因为还要继续，直到cn=0为止
				cn = next[cn];
			}else{
				next[pos++] = 0;
			}
		}
	}
	
	return next;
}

5.1

2015-11-05
algorithm

5.1

KMP algorithm - A string matching algorithm

题目

思路

代码