GATE CSEIT/Database/Collision in Hashing Hashing collision.

GATE CSEIT/Database/Collision in Hashing Hashing collision.

play-rounded-fill play-rounded-outline play-sharp-fill play-sharp-outline
pause-sharp-outline pause-sharp-fill pause-rounded-outline pause-rounded-fill
00:00

Hashing and Collision Handling in Databases (GATE CSE/IT)

 What is Hashing?

Hashing is a technique used in database indexing & data structures to store and retrieve data efficiently. It uses a hash function to convert keys into a fixed-size address (hash value).



Formula for Hash Function:

h(K)=Kmod  Mh(K) = K \mod M

where:

  • K = Key
  • M = Number of available slots (bucket size)

Example: If we insert key 25 into a hash table of size 10 using h(K) = K mod 10, we get:

h(25)=25mod  10=5h(25) = 25 \mod 10 = 5

So, 25 is stored in index 5.

 What is Collision in Hashing?

A collision occurs when two keys produce the same hash value.
For example, if 25 and 35 both hash to index 5, we have a collision.

Why Collisions Occur?

  1. Limited number of slots in the hash table.
  2. Different keys hashing to the same index.
  3. Poor hash function choice.

 Collision Handling Techniques

 Open Addressing (Probing Techniques)

The key is placed in an alternative location if a collision occurs.

(a) Linear Probing

 If a collision occurs, check the next available slot (i.e., h(K)+ih(K) + i).
Example: If 25 and 35 both hash to index 5, store 35 in 6.
Issue: Clustering (many keys get stored together).

Formula:

h(K,i)=(h(K)+i)mod  Mh(K, i) = (h(K) + i) \mod M

(b) Quadratic Probing

 Checks the next available slot in a quadratic sequence ().
Example: If 25 hashes to 5, the next checked slots are (5+1², 5+2², 5+3², …).
Reduces clustering but may lead to secondary clustering.

Formula:

h(K,i)=(h(K)+i2)mod  Mh(K, i) = (h(K) + i^2) \mod M

(c) Double Hashing

 Uses two hash functions to reduce clustering.
Example: If h1(K) = 5, use a second function (h2(K)) to determine the next slot.

Formula:

h(K,i)=(h1(K)+i×h2(K))mod  Mh(K, i) = (h1(K) + i \times h2(K)) \mod M

 Separate Chaining (Linked List in Buckets)

 Each slot in the hash table stores a linked list of values.
 If multiple keys hash to the same index, they are stored in a linked list at that index.
Example: If 25 and 35 both hash to 5, they are stored in a linked list at index 5.

Advantage: Efficient for large hash tables.
Disadvantage: Increased memory usage due to linked lists.

 Choosing the Best Collision Resolution Technique

Method Pros Cons
Linear Probing Simple, easy to implement Causes clustering
Quadratic Probing Reduces clustering May not find empty slots
Double Hashing Avoids clustering Slightly complex
Separate Chaining Handles collisions well Requires extra memory

 Example GATE Question (Hashing Collision Handling)

Question: A hash function is defined as h(K) = K mod 10. Given keys 23, 43, 13, 27, 33, insert them using linear probing in a hash table of size 10.

Solution:
Step 1: Compute hash values.

Key Hash Value (h(K) = K mod 10) Placement (Using Linear Probing)
23 23 mod 10 = 3 Slot 3
43 43 mod 10 = 3 Collision → Next slot 4
13 13 mod 10 = 3 Collision → Next slot 5
27 27 mod 10 = 7 Slot 7
33 33 mod 10 = 3 Collision → Next slot 6

Final Hash Table:

Index: 0 1 2 3 4 5 6 7 8 9
Keys: - - - 23 43 13 33 27 - -

 Conclusion

Hashing is essential for fast data access in databases.
Collisions occur when multiple keys hash to the same index.
Various techniques like open addressing & separate chaining help resolve collisions.
Choosing the right method depends on efficiency, memory, and clustering avoidance.

Would you like more examples or Python code implementations?

Here’s a focused explanation on Collisions in Hashing, tailored for GATE CSE/IT – Databases:


🔁 Collision in Hashing

🎯 What is a Collision?

A collision occurs when two different keys hash to the same index in the hash table.

Let h(k1) = h(k2) but k1 ≠ k2 → this is a collision.

Hash collisions are inevitable due to the pigeonhole principle when:

  • You have more possible input keys than table slots (buckets).


🧠 Why Collisions Happen

  • The hash function maps many possible keys into a limited number of table indices.

  • The same index may be assigned to multiple keys.


🛠️ Collision Resolution Techniques

To handle collisions, we use collision resolution strategies. There are two main categories:


1. Open Hashing (Separate Chaining)

  • Each slot in the hash table holds a linked list (chain) of all elements whose keys hash to the same index.

  • No probing is needed; simply append to the list.

Example:
If h(13) = h(27) = 6, both keys are stored in a linked list at index 6.

Pros:

  • Simple to implement.

  • Works well even when the load factor > 1.

Cons:

  • Uses extra memory for pointers.

  • Search time can increase with long chains.


2. Closed Hashing (Open Addressing)

All elements are stored directly in the hash table. If a collision occurs, the algorithm searches for another open slot.

a) Linear Probing

h(k, i) = (h(k) + i) mod m

  • Check next slots one by one until an empty one is found.

b) Quadratic Probing

h(k, i) = (h(k) + c1*i + c2*i²) mod m

  • Avoids clustering by probing in a non-linear pattern.

c) Double Hashing

h(k, i) = (h1(k) + i * h2(k)) mod m

  • Uses a second hash function to compute probe steps.

Pros:

  • No extra memory for lists.

  • Better cache performance.

Cons:

  • More complex deletion.

  • Performance degrades as load factor increases.


📏 Load Factor (α)

α = n / m

  • n: number of elements stored

  • m: number of buckets

For open addressing, performance drops sharply as α → 1.


🧪 Example (GATE Level)

Keys: 10, 22, 31, 4, 15, 28, 17, 88, 59
Hash function: h(k) = k mod 9

Using Linear Probing, fill a table of size 9:

lua
h(10) = 1
h(22) = 4
h(31) = 4 → collision → next slot → 5
h(4) = 4 → collision → next free slot → 6
...

This is a typical GATE-style question.


📘 GATE Tips

  • Understand each collision resolution method.

  • Practice time complexity in best/average/worst cases.

  • Know trade-offs between chaining and open addressing.

  • Double hashing is often asked due to better performance.


Would you like a PDF summary, MCQ practice, or a visual diagram of collision resolution?

GATE CSEIT/Database/Collision in Hashing Hashing collision.

Introduction to Programming and Data Structures – Hashing

GATEONLINE CLASSES ON DATA STRUCTURES



Leave a Reply

Your email address will not be published. Required fields are marked *

error: