I'm not sure if this is the correct answer, but anyway:
While constructing the hash value, we can check for a match in the set of string hashes. Aka, the current hash value. The hash function/code is usually implemented as a loop and inside that loop we can insert our quick look up.
Of course, we must pick m
to have the maximum string length from the set of strings.
Update: From Wikipedia,
[...]
for i from 1 to n-m+1
if hs ∈ hsubs
if s[i..i+m-1] = a substring with hash hs
return i
hs := hash(s[i+1..i+m]) // <---- calculating current hash
[...]
We calculate current hash in m
steps. On each step there is a temporary hash value that we can look up ( O(1) complexity ) in the set of hashes. All hashes will have the same size, ie 32 bit.
Update 2: an amortized (average) O(n) time complexity ?
Above I said that m
must have the maximum string length. It turns out that we can exploit the opposite.
With hashing for shifting substring search and a fixed m
size we can achieve O(n) complexity.
If we have variable length strings we can set m
to the minimum string length. Additionally, in the set of hashes we don't associate a hash with the whole string but with the first m-characters of it.
Now, while searching the text we check if the current hash is in the hash set and we examine the associated strings for a match.
This technique will increase the false alarms but on average it has O(n) time complexity.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…