GATE CSEIT/Database/Collision in Hashing Hashing collision.
GATE CSEIT/Database/Collision in Hashing Hashing collision.
Contents
- 1 Hashing and Collision Handling in Databases (GATE CSE/IT)
- 2 What is Hashing?
- 3 What is Collision in Hashing?
- 4 Collision Handling Techniques
- 5 Open Addressing (Probing Techniques)
- 6 Separate Chaining (Linked List in Buckets)
- 7 Choosing the Best Collision Resolution Technique
- 8 Example GATE Question (Hashing Collision Handling)
- 9 Conclusion
Hashing and Collision Handling in Databases (GATE CSE/IT)
What is Hashing?
Hashing is a technique used in database indexing & data structures to store and retrieve data efficiently. It uses a hash function to convert keys into a fixed-size address (hash value).
Formula for Hash Function:
h(K)=Kmod Mh(K) = K \mod M
where:
- K = Key
- M = Number of available slots (bucket size)
Example: If we insert key 25 into a hash table of size 10 using h(K) = K mod 10
, we get:
h(25)=25mod 10=5h(25) = 25 \mod 10 = 5
So, 25 is stored in index 5.
What is Collision in Hashing?
A collision occurs when two keys produce the same hash value.
For example, if 25 and 35 both hash to index 5, we have a collision.
Why Collisions Occur?
- Limited number of slots in the hash table.
- Different keys hashing to the same index.
- Poor hash function choice.
Collision Handling Techniques
Open Addressing (Probing Techniques)
The key is placed in an alternative location if a collision occurs.
(a) Linear Probing
If a collision occurs, check the next available slot (i.e., h(K)+ih(K) + i).
Example: If 25
and 35
both hash to index 5, store 35
in 6.
Issue: Clustering (many keys get stored together).
Formula:
h(K,i)=(h(K)+i)mod Mh(K, i) = (h(K) + i) \mod M
(b) Quadratic Probing
Checks the next available slot in a quadratic sequence (i²
).
Example: If 25
hashes to 5, the next checked slots are (5+1², 5+2², 5+3², …).
Reduces clustering but may lead to secondary clustering.
Formula:
h(K,i)=(h(K)+i2)mod Mh(K, i) = (h(K) + i^2) \mod M
(c) Double Hashing
Uses two hash functions to reduce clustering.
Example: If h1(K) = 5
, use a second function (h2(K)
) to determine the next slot.
Formula:
h(K,i)=(h1(K)+i×h2(K))mod Mh(K, i) = (h1(K) + i \times h2(K)) \mod M
Separate Chaining (Linked List in Buckets)
Each slot in the hash table stores a linked list of values.
If multiple keys hash to the same index, they are stored in a linked list at that index.
Example: If 25
and 35
both hash to 5, they are stored in a linked list at index 5.
Advantage: Efficient for large hash tables.
Disadvantage: Increased memory usage due to linked lists.
Choosing the Best Collision Resolution Technique
Method | Pros | Cons |
---|---|---|
Linear Probing | Simple, easy to implement | Causes clustering |
Quadratic Probing | Reduces clustering | May not find empty slots |
Double Hashing | Avoids clustering | Slightly complex |
Separate Chaining | Handles collisions well | Requires extra memory |
Example GATE Question (Hashing Collision Handling)
Question: A hash function is defined as h(K) = K mod 10. Given keys 23, 43, 13, 27, 33, insert them using linear probing in a hash table of size 10.
Solution:
Step 1: Compute hash values.
Key | Hash Value (h(K) = K mod 10 ) |
Placement (Using Linear Probing) |
---|---|---|
23 | 23 mod 10 = 3 |
Slot 3 |
43 | 43 mod 10 = 3 |
Collision → Next slot 4 |
13 | 13 mod 10 = 3 |
Collision → Next slot 5 |
27 | 27 mod 10 = 7 |
Slot 7 |
33 | 33 mod 10 = 3 |
Collision → Next slot 6 |
Final Hash Table:
Conclusion
Hashing is essential for fast data access in databases.
Collisions occur when multiple keys hash to the same index.
Various techniques like open addressing & separate chaining help resolve collisions.
Choosing the right method depends on efficiency, memory, and clustering avoidance.
Would you like more examples or Python code implementations?