Building a Binary Search Tree from scratch

Avinash
5 min readFeb 17, 2024

Hoping you already know what a binary tree is, the special thing about BST is the guarantee that

for any node in the BST, all nodes in the right subtree will be greater than the current node and all nodes in the left subtree will be smaller than the current node

Given this guarantee, the following operations are supported by the BST in O(log(n)) complexity — insert, search, delete, min, max. The most important of which is the ability to search inO(log(n)) which linear data structures or normal trees usually take O(n) .

True to OOP, we’ll represent the BST and the node of the tree as classes. The representation of a tree is just it’s root node and to start with it’s gonna be empty, while the node has the value it holds, the left and right children.

class TreeNode:

def __init__(self, value):
self.value = value
self.left = None
self.right = None
class BST:

def __init__(self):
self.root = None

Search

This is analogous to binary search in a sorted array. We will reject half of the search space in each iteration by comparing our current position w.r.t the value we’re searching for. At any point if the current node’s value is greater/smaller than the value we’re searching for, we can reject the left/right subtree respectively. This comes from the guarantee that BST gives.

searching is analogous to binary search. Here we search for 7
def search(self, val):

curr_node = self.root
while curr_node is not None:
if curr_node.value == val:
return curr_node
elif curr_node.value < val:
curr_node = curr_node.right
else:
curr_node = curr_node.left

# we reached a leaf node without encountering any node with value of val
return None

Min & Max

The left most node will me the minimum and the right most node will be the maximum. An intermediate node can’t be the minimum because there always exists a left child or a parent (to which it’s a left child) that will be smaller than itself. Similarly, an intermediate node can’t be the maximum because there always exists a right child or a parent (to which it’s a right child) that will be smaller than itself.

the min and max of the tree being the left most and the right most of the tree
@staticmethod
def _smallest(root):
if root is None: return None

curr_node = root
while curr_node.left is not None:
curr_node = curr_node.left

return curr_node

@staticmethod
def _largest(root):
if root is None: return None

curr_node = root
while curr_node.right is not None:
curr_node = curr_node.right

return curr_node

def get_min(self):
return BST._smallest(self.root)

def get_max(self):
return BST._largest(self.root)

Insert

Case 1 — The tree is empty. In which case we create a new node and set it as the root

Case 2 — The tree already has nodes. This is analogous to insert in a sorted array using binary search. We need to find the largest node smaller than the new node without a right child or the smallest node greater than the new node without a left child. Once we find such a node we assign the new node to be the empty side child. You can notice that the implementation is very similar to search.

Inserting is searching and assigning child. Here we insert 8 into the tree
def insert(self, val):
new_node = TreeNode(val)
# if tree is empty make new node the root
if self.root is None:
self.root = new_node
return

# else traverse appropriately till you reach None
curr_node = self.root
while True:
if curr_node.value > val:
if curr_node.left is None:
curr_node.left = new_node
break
curr_node = curr_node.left
else:
if curr_node.right is None:
curr_node.right = new_node
break
curr_node = curr_node.right

Delete

Deleting a node seems to be quit involved cause we have to maintain the tree structure after the deleted node leaves a hole. There are also 2 ways to implement this API, you can either accept the node to delete or the value to delete. If it’s the value, then you invoke search first and then delete it.

Case 1 — the deleted node is a leaf node. Here, we just have to set the left/right child of its parent to None to sever the connection.

Case 2 — the deleted node has only one child. Here, we directly connect the deleted node’s parent with this only child, severing both connections with the deleted node.

Case 3 — the deleted has both children. We now have two subtrees which somehow have to be made into a single tree so that it can be attached to the deleted node’s parent. This can be achieved by either upgrading the largest of the left subtree or the smallest of the right subtree to a common parent of both subtrees. Because for this node all the elements in the left subtree will be smaller than itself and all the elements in the right subtree will be larger than itself.

The different cases for delete — case 1, 2, 3a, 3b

When it comes to implementation, all of these cases can be seen as replacing the node to be deleted with some other node, which are

Case 1 — replace with None

Case 2 — replace with only child

Case 3 — replace with newly constructed tree

Small catches

  • the case where left or right subtree is just a single node (depending on the choice of which node you’re making the parent) the implementation differs a bit.
  • During implementation is when you’re trying to delete the root node things get weird because you don’t have a parent node. In my first implementation I handled this separately. But Wikipedia had a clean way of dealing with this and that’s the implementation I’ve gone with finally.
  • We also need the parent of the node to be deleted. For this we can either implement a get_parent() function or have a parent member in TreeNode class and assign it during insertion. This is a choice to be taken based on how often you expect delete to happen.
def _replace(self, node, replacement):
parent_node = self._get_parent(node)
if parent_node is None:
self.root = replacement
elif parent_node.left == node:
replacement.left = node.left
parent_node.left = replacement
else:
replacement.right = node.right
parent_node.right = replacement
del node

def delete(self, val):

node = self.search(val)
if node is None: return

# if node is leaf
if node.is_leaf():
self._replace(node, None)
# if node has only one child
elif node.left is None:
self._replace(node, node.right)
elif node.right is None:
self._replace(node, node.left)
# if node has both children
else:
# find the smallest node in the right subtree
smallest = BST._smallest(node.right)
parent_of_smallest = self._get_parent(smallest)
if parent_of_smallest == node:
self._replace(node, node.right)
else:
# release the smallest
parent_of_smallest.left = None
# make the smallest the root of the right subtree
smallest.right = node.right
self._replace(node, smallest)

Thanks for reading and let me know if you find any bugs.

--

--

Avinash

Interested in ML & Software. Prev at Inito, ShareChat. Ola. Graduated from IIT Madras.