Trees
The programmer's primary weapon in the never-ending battle against slow systems is to change the intra-modular structure. Our first response should be to reorganize the modules' data structures.- Fred Brooks
Well designed trees follow the monad pattern. There is a bunch of data organized in some way, a series of monadal functions that operate on that bunch, and meta-data to be tracked, added, and removed. Each of the rules for monad is met. That's why you'll hear, "A tree is a monad" used colloquially. The more correct language would be, "Well made trees and the algorithms that operate on them follow the monad pattern." That's a little long. Maybe we should use "The tree monad" meaning the monadal type and the monadal functions. If we do this, it makes it easer to talk about. The we have to remember what we actually mean. Of course, this means we'll need to explain what we mean to everyone else.
For this course the monadal functions we'll focus on are \(add\_helper\) and \(contains\). A fully fledged set of monadal functions would also include other behaviors.
All types of trees consist of nodes. These nodes have a value and one or more \(next\) indicators. The lists you learned about in week 02 are a subset of trees where there is only one \(next\) indicator. These types of trees are called a unary trees. Binary trees have two \(next\)s, these are usually referred to as next-left and next-right as you see in Figure 1.
The order with which you code the left and right indicators is irrelevant. The indicators just need to be used consistently since there is no actual \(le\f tness\) or \(rightness\) to the indicators. This \(le\f tness\) or \(rightness\) is a relic of the visualizations of trees like you see below and that you may have seen in your previous coursework. In Figure 2, you see arrows representing the relationships between nodes. They go from from what is often called the parent node to the child node. The parent-child relationship is actually a form of the 'contains' relationship.
To say this in another way, all child nodes are sub components of their parent node. Using tuples, the diagram above would be described explicitly by the parent-child contains relationship below with \(nil\) meaning the parent has no child node. If both child nodes are nil, the node is a leaf-type node.
\( \{10,\{6,\{2,nil,nil\},\{7,nil,nil\}\},\{15,\{12,nil,nil\},nil\}\} \)Rather than draw out all the left and right next boxes, it is customary to use a circle to represent the node, put the value in or beside the circle, and then use arrows to indicate the child nodes.
Binary Search Trees (BST), a special type of binary tree, have a simple ordering rule used for adding values, removing values, and checking to see if values are in the tree. The rule is that lesser values go 'left' and greater or equal values go 'right.' This rule must be followed every time you add a value. If you follow this rule, when you need to search the tree or add a value to the tree, it can be done more simply.
An interesting result of this rule is that a tree can, but not always, have a different structure depending on the order in which the values are added. For example, if the values 10, 6, 15, 12, 7, and 2 were added to a binary tree in that order, the tree in Figure 3 would be generated. If, however, the same values were added in the order 12, 15, 10, 6, 2, and 7 the tree in figure 4 would be generated.
In mathematics there is, for example, only one number \(2\). That's why in Mathematics \(8\div 4 =2\) means that \(8\div 4\) is \(2\), not 'is the same as \(2\).' Two sets are not one set even if they contain the same elements. That's why two sets containing the same elements are considered equivalent rather than equal in mathematics.
Following this same logic, two trees can not be equal. They can, however, be equivalent. Consider the trees in Figures 3 and 4. They contain the same values even though their structures are different. If you were to think of them as mathematical sets, the two trees would be equivalent.
Because BSTs can so easily be thought of as sets, BST's can be used to represent mathematical sets. All that is needed for a BST to act as a set is to check if a value is already in the tree before adding it to the tree. If it is found that the value is already there, then don't add it. Now the BST is acting as a set.
Let \(n\in\{-1,0,1\}\) with \(0\) meaning equality, \(-1\) meaning less than, and \(1\) meaning greater than, and \(comp\) being a comparator algorithm comparing two elements of the tree, then the \(add\_helper\) algorithm for BSTs is: \[\begin{align*} -spec\ add\ &::\ node\ a\ (comp\ a\ b\rightarrow n)\rightarrow root\_node\\ \\ add\ &::\ nil\ a\ (comp\ a\ b\rightarrow n)\rightarrow\\ &\ \{a,nil,nil\};\\ add\ &::\ \{value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\when comp\ ::\ a\ value = -1\rightarrow\\ &\{value,(add\ nextL\ a\ comp),nextR\};\\ add\ &::\ \{value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\otherwise\rightarrow\\ &\{value,nextL,(add\ nextR\ a\ comp)\}. \end{align*}\]
The \(contains\) algorithm for BSTs is:
\[\begin{align*} -spec\ contains\ &::\ node\ a\ (comp\ a\ b\rightarrow n)\rightarrow Boolean\\ \\ contains\ &::\ nil\ a\ (comp\ a\ b\rightarrow n)\rightarrow\\ &\ false;\\ contains\ &::\ \{value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\when comp\ a\ value = 0\rightarrow\\ &true;\\ contains\ &::\ \{value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\when comp\ a\ value = -1\rightarrow\\ &contains\ nextL\ a\ comp;\\ contains\ &::\ \{value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\otherwise\rightarrow\\ &contains\ nextR\ a\ comp. \end{align*}\]There is a problem that can arise when using BSTs. They can get into a situation where they get some long branches and other short branches as in Figure 4. This type of situation means that searches can end up taking a lot longer for some values as compared to others. For example, if you were to search for \(7\) in Figure 4 it would require four comparisons to find it, but it would only take two comparisons to find \(15\). In this small tree, the speed difference would be negligible but imagine if the left branch of the tree contained 100,000 numbers and most of those were essentially in a list. Now the search for any element is essentially \(\mathcal{O}(n)\) instead of \(\mathcal{O} (\log n)\) as it is for a better balanced tree.
To help overcome this problem, the BST's node is augmented with an additional piece of data, a color. The two possible colors are red and black. The \(contains\) algorithm for a red-black tree is nearly identical to that for a BST. The only difference is that when the elements of the node are accessed, the node's color is ignored.
\[\begin{align*} -spec\ contains\ &::\ node\ a\ (comp\ ::\ a\ b\rightarrow n)\rightarrow Boolean\\ \\ contains\ &::\ nil\ a\ (comp\ ::\ a\ b\rightarrow n)\rightarrow\\ &\ false;\\ contains\ &::\ \{color,value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\when comp\ a\ value = 0\rightarrow\\ &true;\\ contains\ &::\ \{color,value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\when comp\ a\ value = -1\rightarrow\\ &contains\ nextL\ a\ comp;\\ contains\ &::\ \{color,value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\otherwise\rightarrow\\ &contains\ nextR\ a\ comp. \end{align*}\]In order to maintain the rules for red-black trees and support the balancing algorithms that are described below, the root of the tree will need to be forced, or coerced, into having its color be black. To ensure this, a facade add algorithm and a helper for it are used. The facade sets the tree's root to always have black as its color.
\[\begin{align*} -spec\ add\ &::\ node\ a\ (comparitor\ a\ b\rightarrow n)\rightarrow root\_node\\ \\ add\ &::\ nil\ a\ \_\rightarrow\\ &\{black,a,nil,nil\};\\ add\ &::\ root\ a\ comparitor\rightarrow\\ &add\_helper\ root\ a\ comparitor\looparrowright \{\_,next\_l,next\_r\}\\ &\{black,next\_l,next\_r\}. \end{align*}\]The color is also ignored for the first part of the red-black tree \(add\_helper\) algorithm. That means that until after the node is added to the tree, the \(add\_helper\) algorithm for red-black trees has the same behavior and rules as the BST. Then the tree may need to be balanced. That's why balancing algorithms are applied after \(add\_helper\) completes at each level of recursion. The \(add\_helper\) algorithm for red-black trees is: \[\begin{align*} -spec\ add\_helper\ &::\ node\ a\ (comp\ a\ b\rightarrow n)\rightarrow root\_node\\ \\ add\_helper\ &::\ nil\ a\ (comp\ a\ b\rightarrow n)\rightarrow\\ &\ \{red,a,nil,nil\};\\ add\_helper\ &::\ \{color,value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\when comp\ a\ value = -1\rightarrow\\ &balance\_l\ \{color,value,(add\_helper\ nextL\ a\ comp),nextR\};\\ add\_helper\ &::\ \{color,value,nextL,nextR\}\ a\ (comp\ a\ b\rightarrow n)\otherwise\\ &balance\_r\ \{color,value,nextL,(add\_helper\ nextR\ a\ comp)\}. \end{align*}\]
Just because the node is added to the tree doesn't mean we're done. The tree may be out of balance so red-black trees have two additional rules that are applied after the addition. The first says, "No red child node can have a red parent node." The second says, "Every path from the root to an empty node must have the same number of black nodes." If either of these rules is violated, a re-balance is needed. You will find the left and right balance algorithms below. They are written in function-matching style as described by Yamamoto. Yamamoto follows Okasaki in his node element order, \(\{color, nextLeft,value,nextRight\}\) as opposed to the order used here which is \(\{color,value,nextLeft,nextRight\}\).
\[\begin{align*} -spec\ balance\_l\ &::\ \{color,value,nextL,nextR\}\rightarrow root\_node\\ \\ &\textbf{%%black node has left red child with left red grandchild}\\ balance\_l\ &::\ \{black,value,\{red,aValue,\{red,bValue,bLeft,bRight\},aRight\},nextR\}\rightarrow\\ &\{red,aValue,\{black,bValue,bLeft,bRight\},\{black,value,aRight,nextR\}\};\\ &\textbf{%%black node has left red child with right red grandchild}\\ balance\_l\ &::\ \{black,value,\{red,aValue,aLeft,\{red,bValue,bLeft,bRight\}\},nextR\}\rightarrow\\ &\{red,bValue,\{black,aValue,aLeft,bLeft\},\{black,value,bRight,nextR\}\};\\ &\textbf{%%otherwise}\\ balance\_l\ &::\ \{color,nextL,nextR\}\rightarrow\\ &\{color, nextL,nextR\};\\ \end{align*}\]
\[\begin{align*} -spec\ balance\_r\ &::\ \{color,value,nextL,nextR\}\rightarrow root\_node\\ \\ &\textbf{%%black node has right red child with right red grandchild}\\ balance\_r\ &::\ \{black,value,nextL,\{red,aValue,aLeft,\{red,bValue,bLeft,bRight\}\}\}\rightarrow\\ &\{red,aValue,\{black,value,nextLeft,aLeft\},\{black,bValue,bLeft,bRight\}\};\\ &\textbf{%%black node has right red child with left red grandchild}\\ balance\_r\ &::\ \{black,value,nextL,\{red,aValue,\{red,bValue,bLeft,bRight\},aRight\}\}\rightarrow\\ &\{red,bValue,\{black,value,nextL,bLeft\},\{black,aValue,bRight,aRight\}\};\\ &\textbf{%%otherwise}\\ balance\_r\ &::\ \{color,nextL,nextR\}\rightarrow\\ &\{color, nextL,nextR\};\\ \end{align*}\]
When looking at visualizations of red-black trees, it is common for the visualization to separately apply needed color changes and then adjust the node structures. The balance algorithms above do both of these steps at the same time so don't let the visualization lead you to believe they should be done separately.
BST's are fairly simple to create in functional programming languages, but balancing trees is somewhat more difficult to grasp, not code, and can cause greater duplication of nodes when compared to BST's. Which tree to choose is situationally dependent.