Tuesday, November 2, 2010

Notes on Hashing

Notes on Hashing

What is hashing
Hashing is a method for storing and retrieving records from a database. It lets you insert, delete, and search for records based on a search key value.
A hash system stores records in an array called a hash table, which we will call HT. Hashing works by performing a computation on a search key K in a way that is intended to identify the position in HT that contains the record with key K. The function that does this calculation is called the hash function.
       Different types of Hash function can be implemented . They are :
          a. Division Remainder method
          b. Middle square method
          c. Folding method

Division Remainder method
Division Remainder method is the simplest and widely used method, where it is assumed that the keys are non-negative integers. The index in the hash table is obtained by using the modulo operator(%)
In remainder division method, some member M divides the key and the remainder as index in the hash table, i.e.
H(K)= K % M
          On the above case H(K) is the hash function, K is a key value and M is a prime number.

Example : Consider a hash table with 7 slots and the keys are 49,123,68 and 90. Suppose we are taking the value of M=7

Middle square method
In Middle square method the key K is squared and then some particular bits from the middle of K2 ( for example 3rd and 4th LSB) are extracted th get the desired key.

H(K)= some bits of(K2)
          On the above case H(K) is the hash function, K is a key value.

Example : Consider a hash table with 7 slots and the keys are 49,123,68 and 90. Suppose we are taking the value of M=7



Folding method
In Folding method the key K is partitioned into a number of parts k1 , k2 , k3 , k4 , kr, where each part except possibly the last has the same number of digits as the required address. Then the parts are added together ignoring the last carry. Therefore the hash function

H(k) = k1 + k2+ k3 + k4 + kr
          On the above case H(K) is the hash function, k1 , k2 , k3 , k4 , kr  are the digit are extracted from key(k) and add all the digit and  ignore carry bit.

Example : Consider a hash table with 8 slots and the keys are 49,123 and 68. Here we are partitioning the key with the size one.

Collision
If two keys k1 and k2 are same bucket index L for same hash function, then it is called collision. To resolute the collision different collision resolution techniques are available. The collision resolution techniques can be classified into two broad categories:
          1. Open addressing
          2. Chaining

Open addressing
The general procedure for open addressing can be stated as
          1. All keys are stored in hash table only.
          2. When collisions occur , use a systematic procedure to store elements in free slot of the hash table.
Depends on systematic procedure open addressing can be of the following types.
          1. Linear Probing
          2. Quadratic probing
          3. Double hashing
          4. Rehashing

Linear Probing
Suppose two keys k1 and k2 are the same bucket number , then search the memory for an available place like :
          L, L+1, L+2, ………L+i
          Store the key in the first available place.
Quadratic probing
Suppose two keys k1 and k2 are the same bucket number , then search the memory for an available place like :
          L, L+12, L+22, ………L+i2
          Store the key in the first available place.

Rehashing
If any stage the hash table becomes full or overflow then it will be very difficult to find the free location for new key. In such situation create a new hash table which is double in length then the previous one. Then first scan the previous hash table and for each key calculate the new bucket index and store them into newly created hash table.

Chaining
The general procedure for chaining is :
          a. store all elements that has to the same slot in a link list.
          b. Store pointer to the head of the link list in the hash table slot.


1 comment:

Ramkumar said...

please email this procedure for universal hashing with example and explanation to
b.ramkumar.b24@gmail.com