GATE CSEIT/Database/Collision in Hashing Hashing collision.
GATE CSEIT/Database/Collision in Hashing Hashing collision.
Contents [hide]
- 0.1 Hashing and Collision Handling in Databases (GATE CSE/IT)
- 0.2 What is Hashing?
- 0.3 What is Collision in Hashing?
- 0.4 Collision Handling Techniques
- 0.5 Open Addressing (Probing Techniques)
- 0.6 Separate Chaining (Linked List in Buckets)
- 0.7 Choosing the Best Collision Resolution Technique
- 0.8 Example GATE Question (Hashing Collision Handling)
- 0.9 Conclusion
- 1
Collision in Hashing
- 2
Why Collisions Happen
- 3
Collision Resolution Techniques
- 4
Load Factor (α)
- 5
Example (GATE Level)
- 6
GATE Tips
Hashing and Collision Handling in Databases (GATE CSE/IT)
What is Hashing?
Hashing is a technique used in database indexing & data structures to store and retrieve data efficiently. It uses a hash function to convert keys into a fixed-size address (hash value).
Formula for Hash Function:
h(K)=Kmod Mh(K) = K \mod M
where:
- K = Key
- M = Number of available slots (bucket size)
Example: If we insert key 25 into a hash table of size 10 using h(K) = K mod 10
, we get:
h(25)=25mod 10=5h(25) = 25 \mod 10 = 5
So, 25 is stored in index 5.
What is Collision in Hashing?
A collision occurs when two keys produce the same hash value.
For example, if 25 and 35 both hash to index 5, we have a collision.
Why Collisions Occur?
- Limited number of slots in the hash table.
- Different keys hashing to the same index.
- Poor hash function choice.
Collision Handling Techniques
Open Addressing (Probing Techniques)
The key is placed in an alternative location if a collision occurs.
(a) Linear Probing
If a collision occurs, check the next available slot (i.e., h(K)+ih(K) + i).
Example: If 25
and 35
both hash to index 5, store 35
in 6.
Issue: Clustering (many keys get stored together).
Formula:
h(K,i)=(h(K)+i)mod Mh(K, i) = (h(K) + i) \mod M
(b) Quadratic Probing
Checks the next available slot in a quadratic sequence (i²
).
Example: If 25
hashes to 5, the next checked slots are (5+1², 5+2², 5+3², …).
Reduces clustering but may lead to secondary clustering.
Formula:
h(K,i)=(h(K)+i2)mod Mh(K, i) = (h(K) + i^2) \mod M
(c) Double Hashing
Uses two hash functions to reduce clustering.
Example: If h1(K) = 5
, use a second function (h2(K)
) to determine the next slot.
Formula:
h(K,i)=(h1(K)+i×h2(K))mod Mh(K, i) = (h1(K) + i \times h2(K)) \mod M
Separate Chaining (Linked List in Buckets)
Each slot in the hash table stores a linked list of values.
If multiple keys hash to the same index, they are stored in a linked list at that index.
Example: If 25
and 35
both hash to 5, they are stored in a linked list at index 5.
Advantage: Efficient for large hash tables.
Disadvantage: Increased memory usage due to linked lists.
Choosing the Best Collision Resolution Technique
Method | Pros | Cons |
---|---|---|
Linear Probing | Simple, easy to implement | Causes clustering |
Quadratic Probing | Reduces clustering | May not find empty slots |
Double Hashing | Avoids clustering | Slightly complex |
Separate Chaining | Handles collisions well | Requires extra memory |
Example GATE Question (Hashing Collision Handling)
Question: A hash function is defined as h(K) = K mod 10. Given keys 23, 43, 13, 27, 33, insert them using linear probing in a hash table of size 10.
Solution:
Step 1: Compute hash values.
Key | Hash Value (h(K) = K mod 10 ) |
Placement (Using Linear Probing) |
---|---|---|
23 | 23 mod 10 = 3 |
Slot 3 |
43 | 43 mod 10 = 3 |
Collision → Next slot 4 |
13 | 13 mod 10 = 3 |
Collision → Next slot 5 |
27 | 27 mod 10 = 7 |
Slot 7 |
33 | 33 mod 10 = 3 |
Collision → Next slot 6 |
Final Hash Table:
Conclusion
Hashing is essential for fast data access in databases.
Collisions occur when multiple keys hash to the same index.
Various techniques like open addressing & separate chaining help resolve collisions.
Choosing the right method depends on efficiency, memory, and clustering avoidance.
Would you like more examples or Python code implementations?
Here’s a focused explanation on Collisions in Hashing, tailored for GATE CSE/IT – Databases:
Collision in Hashing
What is a Collision?
A collision occurs when two different keys hash to the same index in the hash table.
Let
h(k1) = h(k2)
butk1 ≠ k2
→ this is a collision.
Hash collisions are inevitable due to the pigeonhole principle when:
-
You have more possible input keys than table slots (buckets).
Why Collisions Happen
-
The hash function maps many possible keys into a limited number of table indices.
-
The same index may be assigned to multiple keys.
Collision Resolution Techniques
To handle collisions, we use collision resolution strategies. There are two main categories:
1. Open Hashing (Separate Chaining)
-
Each slot in the hash table holds a linked list (chain) of all elements whose keys hash to the same index.
-
No probing is needed; simply append to the list.
Example:
If h(13) = h(27) = 6
, both keys are stored in a linked list at index 6.
Pros:
-
Simple to implement.
-
Works well even when the load factor > 1.
Cons:
-
Uses extra memory for pointers.
-
Search time can increase with long chains.
2. Closed Hashing (Open Addressing)
All elements are stored directly in the hash table. If a collision occurs, the algorithm searches for another open slot.
a) Linear Probing
h(k, i) = (h(k) + i) mod m
-
Check next slots one by one until an empty one is found.
b) Quadratic Probing
h(k, i) = (h(k) + c1*i + c2*i²) mod m
-
Avoids clustering by probing in a non-linear pattern.
c) Double Hashing
h(k, i) = (h1(k) + i * h2(k)) mod m
-
Uses a second hash function to compute probe steps.
Pros:
-
No extra memory for lists.
-
Better cache performance.
Cons:
-
More complex deletion.
-
Performance degrades as load factor increases.
Load Factor (α)
α = n / m
-
n
: number of elements stored -
m
: number of buckets
For open addressing, performance drops sharply as α → 1.
Example (GATE Level)
Keys: 10, 22, 31, 4, 15, 28, 17, 88, 59
Hash function: h(k) = k mod 9
Using Linear Probing, fill a table of size 9:
This is a typical GATE-style question.
GATE Tips
-
Understand each collision resolution method.
-
Practice time complexity in best/average/worst cases.
-
Know trade-offs between chaining and open addressing.
-
Double hashing is often asked due to better performance.
Would you like a PDF summary, MCQ practice, or a visual diagram of collision resolution?