..

初步了解 non-cryptographic hash

hash 方法的定义 1


分类

0) cryptographic hash 1) non-cryptographic hash


好的标准是什么

A good hash function satisfies two basic properties:
1) it should be very fast to compute;
2) it should minimize duplication of output values (collisions).


常用算法

1) hashing by division

h(key) = key mod table_size 
  • Since it requires only a single division operation, hashing by division is quite fast.
  • When using the division method, we usually avoid certain values of table_size like table_size should not be a power of a number suppose r, since if table_size = r^p, then h(key) is just the p lowest-order bits of key.
  • A prime not too close to an exact power of 2 is often good choice for table_size.

2) hashing by multiplication

h(k) = floor (m * frac (k * c))
  • floor(x), yields the integer part of the real number x, and frac(x) yields the fractional part.
  • An advantage of the multiplication method is that the value of m is not critical, we typically choose it to be a power of 2 (m = 2p for some integer p).
  • the word size of the machine is w bits and that key fits into a single word.
  • c to be a fraction of the form s / (2w), where s is an integer in the range 0 < s < 2w.

实例 2

Suppose k = 123456, p = 14,
m = 2^14 = 16384, and w = 32.
Adapting Knuth’s suggestion, c to be fraction of the form s / 2^32.
Then key * s = 327706022297664 = (76300 * 2^32) + 17612864,
The 14 most significant bits of r0 yield the value h(key) = 67.

17612864 = 0x010CC040 =

0000 0001 0000 1100 1100 0000 0100 0000
Most significant 14 bits of that is

0000 0001 0000 11

测试下

设计完后, 用 smhasher 测试一下。 3


  1. A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing. 

  2. h(key)=67

  3. smhasher