GATE CSEIT/Database/Collision in Hashing Hashing collision.

GATE CSEIT/Database/Collision in Hashing Hashing collision.

Hashing and Collision Handling in Databases (GATE CSE/IT)

 What is Hashing?

Hashing is a technique used in database indexing & data structures to store and retrieve data efficiently. It uses a hash function to convert keys into a fixed-size address (hash value).

Formula for Hash Function:

h(K)=Kmod  Mh(K) = K \mod M

where:

  • K = Key
  • M = Number of available slots (bucket size)

Example: If we insert key 25 into a hash table of size 10 using h(K) = K mod 10, we get:

h(25)=25mod  10=5h(25) = 25 \mod 10 = 5

So, 25 is stored in index 5.

 What is Collision in Hashing?

A collision occurs when two keys produce the same hash value.
For example, if 25 and 35 both hash to index 5, we have a collision.

Why Collisions Occur?

  1. Limited number of slots in the hash table.
  2. Different keys hashing to the same index.
  3. Poor hash function choice.

 Collision Handling Techniques

 Open Addressing (Probing Techniques)

The key is placed in an alternative location if a collision occurs.

(a) Linear Probing

 If a collision occurs, check the next available slot (i.e., h(K)+ih(K) + i).
Example: If 25 and 35 both hash to index 5, store 35 in 6.
Issue: Clustering (many keys get stored together).

Formula:

h(K,i)=(h(K)+i)mod  Mh(K, i) = (h(K) + i) \mod M

(b) Quadratic Probing

 Checks the next available slot in a quadratic sequence ().
Example: If 25 hashes to 5, the next checked slots are (5+1², 5+2², 5+3², …).
Reduces clustering but may lead to secondary clustering.

Formula:

h(K,i)=(h(K)+i2)mod  Mh(K, i) = (h(K) + i^2) \mod M

(c) Double Hashing

 Uses two hash functions to reduce clustering.
Example: If h1(K) = 5, use a second function (h2(K)) to determine the next slot.

Formula:

h(K,i)=(h1(K)+i×h2(K))mod  Mh(K, i) = (h1(K) + i \times h2(K)) \mod M

 Separate Chaining (Linked List in Buckets)

 Each slot in the hash table stores a linked list of values.
 If multiple keys hash to the same index, they are stored in a linked list at that index.
Example: If 25 and 35 both hash to 5, they are stored in a linked list at index 5.

Advantage: Efficient for large hash tables.
Disadvantage: Increased memory usage due to linked lists.

 Choosing the Best Collision Resolution Technique

Method Pros Cons
Linear Probing Simple, easy to implement Causes clustering
Quadratic Probing Reduces clustering May not find empty slots
Double Hashing Avoids clustering Slightly complex
Separate Chaining Handles collisions well Requires extra memory

 Example GATE Question (Hashing Collision Handling)

Question: A hash function is defined as h(K) = K mod 10. Given keys 23, 43, 13, 27, 33, insert them using linear probing in a hash table of size 10.

Solution:
Step 1: Compute hash values.

Key Hash Value (h(K) = K mod 10) Placement (Using Linear Probing)
23 23 mod 10 = 3 Slot 3
43 43 mod 10 = 3 Collision → Next slot 4
13 13 mod 10 = 3 Collision → Next slot 5
27 27 mod 10 = 7 Slot 7
33 33 mod 10 = 3 Collision → Next slot 6

Final Hash Table:

Index: 0 1 2 3 4 5 6 7 8 9
Keys: - - - 23 43 13 33 27 - -

 Conclusion

Hashing is essential for fast data access in databases.
Collisions occur when multiple keys hash to the same index.
Various techniques like open addressing & separate chaining help resolve collisions.
Choosing the right method depends on efficiency, memory, and clustering avoidance.

Would you like more examples or Python code implementations?

Leave a Reply

Your email address will not be published. Required fields are marked *

error: