Suppose we wish to search a linked list of length n, where each element contains a key k along with a hash value h(k). Each key is a long character string. How might we take advantage of the hash values when searching the list for an element with a given key?
首先计算出给定关键字的hash值. 对列表中的每个元素,先验证hash值对不对,再进行字符串的比较.
Suppose that a string of r characters is hashed into m slots by treating it as a radix-128 number and then using the division method. The number m is easily represented as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can we apply the division method to compute the hash value of the character string without using more than a constant number of words of storage outside the string itself?
One easy approach would adding the values of each character to get a sum and then take a modulo 128. Another way would be using n-degree equation with each character value acting as a co-efficient for the nth term. Example: S[0] * x^n + S[1] * x^(n-1)+ …. + S[n-1], for better result substitute x = 33 or 37 or 39. This is the famous Horner’s method for finding a hash value.
Consider a version of the division method in which h(k) = k mod m, where m = 2p - 1 and k is a character string interpreted in radix 2p. Show that if string x can be derived from string y by permuting its characters, then x and y hash to the same value. Give an example of an application in which this property would be undesirable in a hash function.
有一个很简单的数论知识.先举个例子
3 * 128^3 mod 127 = 3 * 128^2 * (128 mod 127) mod 127 = 3 * 128^2 mod 127 = 3 mod 127 = 3
无论怎么交换字符串的order,radix的影响都会消失. 因为2p^n mod 2p-1 === 1.
Consider a hash table of size m = 1000 and a corresponding hash function h(k) = ⌊m(k A mod 1)⌋ for  ![](http://latex.codecogs.com/gif.latex? A = \frac{\sqrt{5}-1}{2}) . Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped.
key | value |
---|---|
61 | 700 |
62 | 318 |
63 | 936 |
64 | 554 |
65 | 172 |
Define a family of hash functions from a finite set U to a finite set B to be ϵ-universal if for all pairs of distinct elements k and l in U,
![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h(l)\} \le \epsilon)
where the probability is over the choice of the hash function h drawn at random from the family . Show that an ϵ-universal family of hash functions must have
![](http://latex.codecogs.com/gif.latex?\\epsilon \ge \frac{1}{|B|} - \frac{1}{|U|})
设 i = h(k), 且 h(x) = i 的解有 si 个, (类比散列表即是槽 i 的链表长度为 si), 则 h(k) = i 且 h(l) = i 的概率为 si * (si - 1) / (|U| * |U|) 因此, 有
![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h(l)\} = \frac{\sum_{i=1}^{|B|} s_i(s_i-1)}{|U|^2})
又因为
![](http://latex.codecogs.com/gif.latex?\\sum_{i=1}^{|B|} s_i(s_i-1) = \sum_{i=1}^{|B|} s_i^2 - \sum_{i=1}^{|B|} s_i)
![](http://latex.codecogs.com/gif.latex?\(\\sum_{i=1}^{|B|} s_i)^2 = \sum_{i=1}^{|B|} s_i^2 + 2\sum_{i=1}^{|B|-1} \sum_{j=i+1}^{|B|} s_i s_j \le |B|\sum_{i=1}^{|B|} s_i^2)
由以上两式可得
![](http://latex.codecogs.com/gif.latex?\\sum_{i=1}^{|B|} s_i(s_i-1) = \sum_{i=1}^{|B|} s_i^2 - \sum_{i=1}^{|B|} s_i \ge \frac{(\sum_{i=1}^{|B|} s_i)^2}{|B|} - \sum_{i=1}^{|B|} s_i = \frac{|U|^2}{|B|} - |U|)
因此
![](http://latex.codecogs.com/gif.latex?\\epsilon \ge \Pr\{h(k) = h(l)\} = \frac{\sum_{i=1}^{|B|} s_i(s_i-1)}{|U|^2} \ge \frac{\frac{|U|^2}{|B|} - |U|}{|U|^2} = \frac{1}{|B|} - \frac{1}{|U|})
Let U be the set of n-tuples of values drawn from Zp, and let B = Zp, where p is prime. Define the hash function hb : U → B for b in Zp on an input n-tuple [a0, a1, ..., an-1] from U as
![](http://latex.codecogs.com/gif.latex?h_b\(\\langle a_0, a_1, \ldots, a_{n-1} \rangle) = (\sum_{j=0}^{n-1} a_j b^j})modp)
and let H={hb:b∈Zp}. Argue that H is ((n−1)/p)-universal according to the definition of ϵ-universal in Exercise 11.3-5. (Hint: See Exercise 31.4-4.)
首先用归纳法证明以下结论: 对于取自集合 的 n 元组 ![](http://latex.codecogs.com/gif.latex?[a_0, a_1, ... , a_n]), 满足
![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=0}^{n}a_j x^j)modp = 0)
的 x (x ∈ Zp) 最多有 n 个。
- 当 n = 1 时, 假设有 x = v 与 x = w (不失一般性, v > w) 同时满足 ![](http://latex.codecogs.com/gif.latex?\(a_0 + a_1 x)modp = 0), 则有 ![](http://latex.codecogs.com/gif.latex?a_1 (v - w)modp = 0), 其中 a_1 ∈ Zp, (v - w) ∈ Zp, p 为素数, 显然不成立。因此, 当 n = 1 时, 结论成立。
- 当 n = k 时, 至多有 k 个 x (x ∈ Zp) 满足
![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=0}^{k}a_j x^j)modp = 0) ... (0)
下面证明当 n = k + 1 时, 结论成立。
不妨设
![](http://latex.codecogs.com/gif.latex?\\sum_{j=0}^{k + 1}a_j v^j\ \equiv \sum_{j=0}^{k + 1}a_j w^j(modp))
其中 v > w, 得到
![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=1}^{k + 1}a_j (v^j - w^j))modp = (\sum_{j=1}^{k + 1}a_j (v - w) \sum_{i=0}^{j-1} v_i w_{j - i} )modp = 0)
由于(v - w) < p, 不影响取余结果, 上式可变形为
![](http://latex.codecogs.com/gif.latex?\( \sum_{j = 0}^{k} w^j ( \sum_{i=j+1}^{k + 1} a_i v^{i - j - 1} ) )modp = 0)
更进一步, 变为
![](http://latex.codecogs.com/gif.latex?\( \sum_{j = 0}^{k} w^j ( (\sum_{i=j+1}^{k + 1} a_i v^{i - j - 1})modp ) )modp = 0)
令 ![](http://latex.codecogs.com/gif.latex?c_j = (\sum_{i=j+1}^{k + 1} a_i v^{i - j - 1})modp), 得到更直观的形式
![](http://latex.codecogs.com/gif.latex?\( \sum_{j = 0}^{k} c_j w^j )modp = 0) .... (1)
上式说明, 当确定一个 v 可以确定一组取自 的序列 ![](http://latex.codecogs.com/gif.latex?[c_0, c_1, ... , c_k]), 根据(0)式, (1)式的解最多有k个。 即: mod p 后, 和 v 具有相同余数的 x 有 k 个 也就是说:对一组确定的 ![](http://latex.codecogs.com/gif.latex?[a_0, a_1, ... , a_{k+1}]), 具有相同余数的 x 最多有 (k + 1) 个, 因此余数为0的 x 也最多只有 (k + 1) 个
命题得证。
下面证明原始的问题:
首先, 两个不同 n 元组 ![](http://latex.codecogs.com/gif.latex?[a_0, a_1, ... , a_{n-1}]) 与 ![](http://latex.codecogs.com/gif.latex?[a'_0, a'1, ... , a'{n-1}]) 散列结果相同等价于
![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=0}^{n-1} (a_j - a'_j) b^j)modp = 0) ... (2)
由于取模运算,![](http://latex.codecogs.com/gif.latex?\(a_j - a'_j)) 可以等价的换为 中的元素, 根据上面证得的结果, (2)式最多有 (n - 1) 个 b 满足要求 即, H 中最多有 (n - 1) 个散列函数会导致两个不同输入散列结果相同, 因此
![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h(l)\} \le \frac{n - 1}{p})
根据定义, H 是 ![](http://latex.codecogs.com/gif.latex?\\frac{n - 1}{p}) 全域的
Follow @louis1992 on github to help finish this task.