Skip to content

Latest commit

 

History

History
150 lines (93 loc) · 7.59 KB

11.3.md

File metadata and controls

150 lines (93 loc) · 7.59 KB

Exercises 11.3-1


Suppose we wish to search a linked list of length n, where each element contains a key k along with a hash value h(k). Each key is a long character string. How might we take advantage of the hash values when searching the list for an element with a given key?

Answer

首先计算出给定关键字的hash值. 对列表中的每个元素,先验证hash值对不对,再进行字符串的比较.

Exercises 11.3-2


Suppose that a string of r characters is hashed into m slots by treating it as a radix-128 number and then using the division method. The number m is easily represented as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can we apply the division method to compute the hash value of the character string without using more than a constant number of words of storage outside the string itself?

Answer

One easy approach would adding the values of each character to get a sum and then take a modulo 128. Another way would be using n-degree equation with each character value acting as a co-efficient for the nth term. Example: S[0] * x^n + S[1] * x^(n-1)+ …. + S[n-1], for better result substitute x = 33 or 37 or 39. This is the famous Horner’s method for finding a hash value.

Exercises 11.3-3


Consider a version of the division method in which h(k) = k mod m, where m = 2p - 1 and k is a character string interpreted in radix 2p. Show that if string x can be derived from string y by permuting its characters, then x and y hash to the same value. Give an example of an application in which this property would be undesirable in a hash function.

Answer

有一个很简单的数论知识.先举个例子

3 * 128^3 mod 127 = 3 * 128^2 * (128 mod 127) mod 127 = 3 * 128^2 mod 127 = 3 mod 127 = 3

无论怎么交换字符串的order,radix的影响都会消失. 因为2p^n mod 2p-1 === 1.

Exercises 11.3-4


Consider a hash table of size m = 1000 and a corresponding hash function h(k) = ⌊m(k A mod 1)⌋ for  ![](http://latex.codecogs.com/gif.latex? A = \frac{\sqrt{5}-1}{2}) . Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped.

Answer

key value
61 700
62 318
63 936
64 554
65 172

Exercises 11.3-5


Define a family of hash functions from a finite set U to a finite set B to be ϵ-universal if for all pairs of distinct elements k and l in U,

![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h(l)\} \le \epsilon)

where the probability is over the choice of the hash function h drawn at random from the family . Show that an ϵ-universal family of hash functions must have

![](http://latex.codecogs.com/gif.latex?\\epsilon \ge \frac{1}{|B|} - \frac{1}{|U|})

Answer

设 i = h(k), 且 h(x) = i 的解有 si 个, (类比散列表即是槽 i 的链表长度为 si), 则 h(k) = i 且 h(l) = i 的概率为 si * (si - 1) / (|U| * |U|) 因此, 有

![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h(l)\} = \frac{\sum_{i=1}^{|B|} s_i(s_i-1)}{|U|^2})

又因为

![](http://latex.codecogs.com/gif.latex?\\sum_{i=1}^{|B|} s_i(s_i-1) = \sum_{i=1}^{|B|} s_i^2 - \sum_{i=1}^{|B|} s_i)

![](http://latex.codecogs.com/gif.latex?\(\\sum_{i=1}^{|B|} s_i)^2 = \sum_{i=1}^{|B|} s_i^2 + 2\sum_{i=1}^{|B|-1} \sum_{j=i+1}^{|B|} s_i s_j \le |B|\sum_{i=1}^{|B|} s_i^2)

由以上两式可得

![](http://latex.codecogs.com/gif.latex?\\sum_{i=1}^{|B|} s_i(s_i-1) = \sum_{i=1}^{|B|} s_i^2 - \sum_{i=1}^{|B|} s_i \ge \frac{(\sum_{i=1}^{|B|} s_i)^2}{|B|} - \sum_{i=1}^{|B|} s_i = \frac{|U|^2}{|B|} - |U|)

因此

![](http://latex.codecogs.com/gif.latex?\\epsilon \ge \Pr\{h(k) = h(l)\} = \frac{\sum_{i=1}^{|B|} s_i(s_i-1)}{|U|^2} \ge \frac{\frac{|U|^2}{|B|} - |U|}{|U|^2} = \frac{1}{|B|} - \frac{1}{|U|})

Exercises 11.3-6


Let U be the set of n-tuples of values drawn from Zp, and let B = Zp, where p is prime. Define the hash function hb : U → B for b in Zp on an input n-tuple [a0, a1, ..., an-1] from U as

![](http://latex.codecogs.com/gif.latex?h_b\(\\langle a_0, a_1, \ldots, a_{n-1} \rangle) = (\sum_{j=0}^{n-1} a_j b^j})modp)

and let H={hb:b∈Zp}. Argue that H is ((n−1)/p)-universal according to the definition of ϵ-universal in Exercise 11.3-5. (Hint: See Exercise 31.4-4.)

Answer

首先用归纳法证明以下结论: 对于取自集合 的 n 元组 ![](http://latex.codecogs.com/gif.latex?[a_0, a_1, ... , a_n]), 满足

![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=0}^{n}a_j x^j)modp = 0)

的 x (x ∈ Zp) 最多有 n 个。

  1. 当 n = 1 时, 假设有 x = v 与 x = w (不失一般性, v > w) 同时满足 ![](http://latex.codecogs.com/gif.latex?\(a_0 + a_1 x)modp = 0), 则有 ![](http://latex.codecogs.com/gif.latex?a_1 (v - w)modp = 0), 其中 a_1 ∈ Zp, (v - w) ∈ Zp, p 为素数, 显然不成立。因此, 当 n = 1 时, 结论成立。
  2. 当 n = k 时, 至多有 k 个 x (x ∈ Zp) 满足

![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=0}^{k}a_j x^j)modp = 0) ... (0)

下面证明当 n = k + 1 时, 结论成立。

不妨设

![](http://latex.codecogs.com/gif.latex?\\sum_{j=0}^{k + 1}a_j v^j\ \equiv \sum_{j=0}^{k + 1}a_j w^j(modp))

其中 v > w, 得到

![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=1}^{k + 1}a_j (v^j - w^j))modp = (\sum_{j=1}^{k + 1}a_j (v - w) \sum_{i=0}^{j-1} v_i w_{j - i} )modp = 0)

由于(v - w) < p, 不影响取余结果, 上式可变形为

![](http://latex.codecogs.com/gif.latex?\( \sum_{j = 0}^{k} w^j ( \sum_{i=j+1}^{k + 1} a_i v^{i - j - 1} ) )modp = 0)

更进一步, 变为

![](http://latex.codecogs.com/gif.latex?\( \sum_{j = 0}^{k} w^j ( (\sum_{i=j+1}^{k + 1} a_i v^{i - j - 1})modp ) )modp = 0)

令 ![](http://latex.codecogs.com/gif.latex?c_j = (\sum_{i=j+1}^{k + 1} a_i v^{i - j - 1})modp), 得到更直观的形式

![](http://latex.codecogs.com/gif.latex?\( \sum_{j = 0}^{k} c_j w^j )modp = 0) .... (1)

上式说明, 当确定一个 v 可以确定一组取自 的序列 ![](http://latex.codecogs.com/gif.latex?[c_0, c_1, ... , c_k]), 根据(0)式, (1)式的解最多有k个。 即: mod p 后, 和 v 具有相同余数的 x 有 k 个 也就是说:对一组确定的 ![](http://latex.codecogs.com/gif.latex?[a_0, a_1, ... , a_{k+1}]), 具有相同余数的 x 最多有 (k + 1) 个, 因此余数为0的 x 也最多只有 (k + 1) 个

命题得证。

下面证明原始的问题:

首先, 两个不同 n 元组 ![](http://latex.codecogs.com/gif.latex?[a_0, a_1, ... , a_{n-1}]) 与 ![](http://latex.codecogs.com/gif.latex?[a'_0, a'1, ... , a'{n-1}]) 散列结果相同等价于

![](http://latex.codecogs.com/gif.latex?\(\\sum_{j=0}^{n-1} (a_j - a'_j) b^j)modp = 0) ... (2)

由于取模运算,![](http://latex.codecogs.com/gif.latex?\(a_j - a'_j)) 可以等价的换为 中的元素, 根据上面证得的结果, (2)式最多有 (n - 1) 个 b 满足要求 即, H 中最多有 (n - 1) 个散列函数会导致两个不同输入散列结果相同, 因此

![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h(l)\} \le \frac{n - 1}{p})

根据定义, H 是 ![](http://latex.codecogs.com/gif.latex?\\frac{n - 1}{p}) 全域的


Follow @louis1992 on github to help finish this task.