AAD- Optimal Binary Search Tree With Practical Example ( for Deeper Understanding) Part 25.
AAD- Optimal Binary Search Tree With Practical Example ( for Deeper Understanding) Part 25.
Contents [hide]
- 1 Optimal Binary Search Tree (OBST) – Advanced Analysis & Practical Example
- 2 2. Why Do We Need OBST?
- 3 3. Problem Definition
- 4 4. Dynamic Programming Approach for OBST
- 5 5. Practical Example
- 6 6. Constructing the Optimal BST
- 7 7. Applications of OBST
- 8 8. Conclusion
- 9 AAD- Optimal Binary Search Tree With Practical Example ( for Deeper Understanding) Part 25.
- 10 Optimal Decision Trees via Dynamic Programming and …
- 11 Optimal binary search trees
- 12 Optimal Binary Search Tree
Optimal Binary Search Tree (OBST) – Advanced Analysis & Practical Example
1. Introduction to Optimal Binary Search Tree (OBST)
An Optimal Binary Search Tree (OBST) is a specialized Binary Search Tree (BST) that is structured in a way that minimizes the expected search cost (or total weighted path length).
When searching for elements in a BST, the cost depends on how frequently each element is accessed. Instead of constructing a standard BST, OBST minimizes the average search time based on given probabilities of searching for keys.
2. Why Do We Need OBST?
A regular BST does not always guarantee an optimal search time. If the tree is unbalanced, the search cost may be high.
- OBST ensures that frequently accessed elements are placed closer to the root, reducing search cost on average.
- This is useful in database indexing, compiler design (syntax trees), and information retrieval systems.
3. Problem Definition
Given n keys K1,K2,…,KnK_1, K_2, …, K_n in sorted order, and their respective probabilities of access:
- Search probabilities: p1,p2,…,pnp_1, p_2, …, p_n (probability of searching each key)
- Dummy keys’ probabilities: q0,q1,…,qnq_0, q_1, …, q_n (probability of searching between keys or missing elements)
The goal is to construct a binary search tree that minimizes the expected search cost:
C(T)=∑i=1n(Depth(Ki)×Probability(Ki))C(T) = \sum_{i=1}^{n} (Depth(K_i) \times Probability(K_i))
where Depth(K_i) is the level of node KiK_i in the tree.
4. Dynamic Programming Approach for OBST
We define:
- e[i][j]e[i][j] → Expected search cost for subtree containing keys KiK_i to KjK_j
- w[i][j]w[i][j] → Sum of all probabilities from KiK_i to KjK_j
- root[i][j]root[i][j] → Root of the OBST for KiK_i to KjK_j
Recurrence Relation:
e[i][j]=minr∈[i,j](e[i][r−1]+e[r+1][j]+w[i][j])e[i][j] = \min_{r \in [i,j]} (e[i][r-1] + e[r+1][j] + w[i][j])
where rr is the root node considered at that step.
5. Practical Example
Given Data:
Keys: K1, K2, K3
Sorted Keys: (10, 20, 30)
Key | K1 (10) | K2 (20) | K3 (30) |
---|---|---|---|
Probability pp | 0.3 | 0.4 | 0.3 |
Dummy qq | 0.1 | 0.1 | 0.1 |
Step 1: Compute Weight Matrix w[i][j]w[i][j]
w[i][j]=pi+pi+1+…+pj+qi−1+qjw[i][j] = p_i + p_{i+1} + … + p_j + q_{i-1} + q_j
i → j | w[i][i] | w[i][i+1] | w[i][i+2] |
---|---|---|---|
w[1][1]w[1][1] = 0.4 | w[1][2]w[1][2] = 0.9 | w[1][3]w[1][3] = 1.3 | |
w[2][2]w[2][2] = 0.5 | w[2][3]w[2][3] = 0.9 | ||
w[3][3]w[3][3] = 0.4 |
Step 2: Compute Expected Cost e[i][j]e[i][j]
Base Case:
e[i][i]=w[i][i]e[i][i] = w[i][i]
For multiple elements, choose rr (root) that minimizes:
e[i][j]=e[i][r−1]+e[r+1][j]+w[i][j]e[i][j] = e[i][r-1] + e[r+1][j] + w[i][j]
i → j | e[i][i] | e[i][i+1] | e[i][i+2] |
---|---|---|---|
e[1][1]e[1][1] = 0.4 | e[1][2]e[1][2] = 0.9 (root = K1) | e[1][3]e[1][3] = 1.3 (root = K2) | |
e[2][2]e[2][2] = 0.5 | e[2][3]e[2][3] = 0.9 (root = K2) | ||
e[3][3]e[3][3] = 0.4 |
6. Constructing the Optimal BST
From the table:
- K2 (20) becomes the root (it minimizes expected cost).
- K1 (10) is the left child.
- K3 (30) is the right child.
Final Optimal BST Structure:
7. Applications of OBST
Database Indexing (e.g., indexing frequently accessed records efficiently).
Compiler Design (syntax trees in parsers).
Artificial Intelligence (decision trees in machine learning).
8. Conclusion
The Optimal Binary Search Tree (OBST) ensures efficient searching by minimizing the expected search cost using dynamic programming. This is especially useful in applications where search frequencies vary significantly.
Would you like a code implementation of OBST in C++/Python?