-
-
Save matthieubulte/e58dc1a6add5e114de0328f57dd3f460 to your computer and use it in GitHub Desktop.
| (* Good morning everyone, I'm currently learning ocaml for one of my CS class and needed to implement | |
| an avl tree using ocaml. I thought that it would be interesting to go a step further and try | |
| to verify the balance property of the avl tree using the type system. Here's the resulting code | |
| annotated for people new to the ideas of type level programming :) | |
| *) | |
| (* the property we are going to try to verify is that at each node of our tree, the height difference between | |
| the left and the right sub-trees is at most of 1. *) | |
| (* let's start by implementing the stuff we need for our invariance checking, namely natural numbers | |
| at the type level. They will be useful to help us track the height of a tree at the type level | |
| letting us then use this height to construct new trees with the right height balance. *) | |
| (* what are natural numbers? *) | |
| (* well, zero is a natural number for sure! *) | |
| type zero = Z;; | |
| (* good, we have one natural number, another infinity of them and we should be good. The nice thing about nats is that | |
| they can be constructed and defined in a recursive way. We have the zero case, now we can reach any other number | |
| by repeatedly adding one to zero, for instance 3 is 1+1+1+0. Let's try to create the type for that. *) | |
| type 'n succ = S of 'n;; | |
| (* so now you're going to tell me that this is wrong... you're right, one can construct the type unit succ and this | |
| wouldn't be a natural number anymore. But in practice, the user will never have to be touching this nat type | |
| so the damages are limited there and we get really awesome stuff be using this. *) | |
| (* Let's see how we can use translate our natural numbers into these types with some examples *) | |
| type one = zero succ;; | |
| type two = one succ;; | |
| (* That's a function, at the type level. Let it sink. *) | |
| type 'n plus_two = 'n succ succ;; | |
| type three = one plus_two;; | |
| (* Nice, now let's try to define the avl type. Remember that we don't want anyone to be able to create an invalid avl tree, | |
| that is, we don't want anyone to be able to create a tree where one of both children is deeper than the other children by | |
| two layers. | |
| In order to enforce this property, we'll need to be creative in the definition of our type! *) | |
| (* Here, the type parameter 'n encodes the height of our tree in order the be able to enforce the tree | |
| balance invariant using smart constructors *) | |
| type 'n avl = | |
| (* The most basic tree is an empty tree, called a Leaf, as it doesn't contain any key or children it | |
| has a 0 height *) | |
| | Leaf: zero avl | |
| (* Another valid form of avl tree is if both of the children have the same height. | |
| Note the type of the children, they are both parametrized by the same type n, meaning they | |
| should both have the same height! | |
| Note also the type of the constructed tree, it has a height of ('n succ) which is one more | |
| than the height of each of the children because we're adding a layer to the previous trees. *) | |
| | Balanced: 'n avl * int * 'n avl -> ('n succ) avl | |
| (* The only other valid shapes are trees where one of the children is one layer deeper than the other tree. We | |
| enforce this property by matching n with the height of the smallest subtree and forcing the height of the largest | |
| subtree to be exactly n+1. | |
| The resulting tree, because it adds one layer to the subtrees has a height of 1 + largers subtree, which means | |
| it has a height of n+2 | |
| *) | |
| | LeftHeavy: ('n succ) avl * int * 'n avl -> ('n succ succ) avl | |
| | RightHeavy: 'n avl * int * ('n succ) avl -> ('n succ succ) avl | |
| ;; | |
| (* Let's try this out by implementing the singleton function. This function | |
| creates a tree with a single element in it. As you can read in the type | |
| signature, the created tree has a height of 1. The nice thing here is that one | |
| wouldn't be able the get the height of the output wrong because the type system | |
| checks that the height in our type signature corresponds to the height of the constructed | |
| tree. | |
| This means the tree returned by the singleton function can't be annotated as a two tree, so | |
| let t : two tree = singleton 1 | |
| would be rejected by the type checker. *) | |
| let singleton : int -> one avl = fun x -> Balanced (Leaf, x, Leaf);; | |
| (* Next step is going to be a bit more interesting, we are going to start implementing the | |
| building blocks for our insert function by implementing the single_rotate_left that | |
| rebalances the tree as following: | |
| (p) (q) | |
| / \ / \ | |
| A (q) (p) C | |
| / \ / \ / \ | |
| B C A B ----- | |
| / \ | |
| ----- | |
| As you can see in the diagram, the right subtree has a height of n+2 while the left subtree has | |
| a height one n, which can be the case when inserting an element in the right part of the tree. But now | |
| we can't reconstruct the tree at the level of (p) because the height difference between it's children is | |
| of two. That's why we perform what is called a rotation in order to be able to construct a tree with | |
| the height and ordering invariances respected. | |
| In this implementation we only check the height invariance. | |
| *) | |
| let rotate_single_left : type n. n avl -> int -> (n succ succ) avl -> (n succ succ) avl = | |
| fun a p -> function | |
| | RightHeavy (b, q, c) -> Balanced (Balanced (a, q, b), q, c) | |
| | _ -> failwith "we'll take care of it later in the implementation of insert ;)" | |
| ;; | |
| (* Well, that's pretty! Don't worry that our functon is not total, I just wanted to demonstrate | |
| the case covered by our little diagram. | |
| The important part here is the type signature. As you can see this function takes a deconstructed | |
| non balanced tree, that is, a tree with one side two level longer than the other side, and creates | |
| a well formed avl tree of a certain hight. | |
| The cool thing here is that the compiler can actually verify that out implementation creates a tree | |
| of height n+2 provided the right arguments. So the compiler knows from the input parameters that the | |
| output we're constructing has the right height properties. Our implementation could not have been wrong | |
| because the compiler would have complained about it otherwise. | |
| Again, let it sink aind think about how cool that is. | |
| *) | |
| (* Before we go and implement the insert function, we will need to first create some little types | |
| and functions that will help us and the compiler prove that our implementation is correct. *) | |
| (* Because the tree might or might not grow in height after insering a value in | |
| it, we need to create a new type that will allow us to return the two | |
| different heights the tree can have after inserting a value in it. *) | |
| type 'n avl_insert = | |
| | Grew of ('n succ) avl | |
| | Same of 'n avl | |
| ;; | |
| (* Trichotomy of integers order. This type will let the totality checker verify in our insert | |
| implementation that we have covered all the comparison cases we can have when comparing the | |
| element we want to insert in the tree to the current node. | |
| It sounds like a small detail, but it's another great example of properly using the type system | |
| to model out domain enabling more expressiveness which is better for us to express ourselves and | |
| for the compiler to check what we're doing. | |
| *) | |
| type ordering = Lt | Eq | Gt;; | |
| let compare l r = if l == r | |
| then Eq | |
| else if l < r | |
| then Lt | |
| else Gt | |
| ;; | |
| (* Here we are. The implementation of the insert function. It's rather large and some might even | |
| argue that this is ugly. But all of this doesn't matter, what matters is that this implementation | |
| is verifiably correct. That there is no input that could make our function fail at runtime and | |
| that regardless of the input, the output will be a well balanced avl tree. No unit test, no property test | |
| will ever give you this level of confidence. Here the our implementation proves it's invariance and | |
| the type checker verifies our proof. | |
| You can just skim through the implementation, it's not important. The idea is the important part. | |
| Ideas are what's important. Take a moment to think about it :) | |
| *) | |
| let rec insert : type n. int -> n avl -> n avl_insert = fun x t -> match t with | |
| | Leaf -> Grew (singleton x) | |
| | Balanced (left, key, right) -> | |
| (match compare x key with | |
| | Eq -> Same t | |
| | Lt -> (match insert x left with | |
| | Grew new_left -> Grew (LeftHeavy (new_left, key, right)) | |
| | Same new_left -> Same (Balanced (new_left, key, right)) | |
| ) | |
| | Gt -> (match insert x right with | |
| | Grew new_right -> Grew (RightHeavy (left, key, new_right)) | |
| | Same new_right -> Same (Balanced (left, key, new_right)) | |
| ) | |
| ) | |
| | LeftHeavy (left, key, right) -> | |
| (match compare x key with | |
| | Eq -> Same t | |
| | Lt -> (match insert x left with | |
| | Same new_left -> Same (LeftHeavy (new_left, key, right)) | |
| (* This can't happen because the only way to have a growing | |
| balanced tree is to have appended a singleton. But this only | |
| happens if the left tree was a Leaf, so the current tree | |
| could not have been LeftHeavy if its left child was a Leaf. | |
| TODO: check if the totality checker can receive a hint on that | |
| *) | |
| | Grew (Balanced _) -> failwith "unreachable pattern not detected" | |
| (* Single rotation *) | |
| | Grew (LeftHeavy (left_left, left_key, left_right)) -> | |
| Same (Balanced (left_left, | |
| left_key, | |
| Balanced (left_right, key, right))) | |
| (* Double Rotation *) | |
| | Grew (RightHeavy (left_left, left_key, left_right)) -> | |
| (match left_right with | |
| | LeftHeavy (left_right_left, left_right_key, left_right_right) | |
| -> Same (Balanced ( | |
| Balanced (left_left, key, left_right_left), | |
| left_key, | |
| RightHeavy (left_right_right, left_right_key, right) | |
| )) | |
| | RightHeavy (left_right_left, left_right_key, left_right_right) | |
| -> Same (Balanced ( | |
| LeftHeavy (left_left, key, left_right_left), | |
| left_key, | |
| Balanced (left_right_right, left_right_key, right) | |
| )) | |
| | Balanced (left_right_left, left_right_key, left_right_right) | |
| -> Same (Balanced ( | |
| Balanced (left_left, key, left_right_left), | |
| left_key, | |
| Balanced (left_right_right, left_right_key, right) | |
| )) | |
| ) | |
| ) | |
| (* Simple Insert *) | |
| | Gt -> (match insert x right with | |
| | Grew new_right -> Same (Balanced (left, key, new_right)) | |
| | Same new_right -> Same (LeftHeavy (left, key, new_right)) | |
| ) | |
| ) | |
| | RightHeavy (left, key, right) -> | |
| (match compare x key with | |
| | Eq -> Same t | |
| (* Simple Insert *) | |
| | Lt -> (match insert key left with | |
| | Grew new_left -> Same (Balanced (new_left, key, right)) | |
| | Same new_left -> Same (RightHeavy (new_left, key, right)) | |
| ) | |
| | Gt -> (match insert x right with | |
| | Same new_right -> Same (RightHeavy (left, key, new_right)) | |
| (* This can't happen because the only way to have a growing | |
| balanced tree is to have appended a singleton. But this only | |
| happens if the right tree was a Leaf, so the current tree | |
| could not have been RightHeavy if its right child was a tree. | |
| TODO: check if the totality checker can receive a hint on that | |
| *) | |
| | Grew (Balanced _) -> failwith "unreachable pattern not detected" | |
| (* Double Rotation *) | |
| | Grew (LeftHeavy (right_left, right_key, right_right)) -> | |
| (match right_left with | |
| | Balanced (right_left_left, right_left_key, right_left_right) -> | |
| Same ( Balanced ( | |
| Balanced (left, key, right_left_left), | |
| right_left_key, | |
| Balanced (right_left_right, right_key, right_right) | |
| )) | |
| | LeftHeavy (right_left_left, right_left_key, right_left_right) -> | |
| Same ( Balanced ( | |
| Balanced (left, key, right_left_left), | |
| right_left_key, | |
| RightHeavy (right_left_right, right_key, right_right) | |
| )) | |
| | RightHeavy (right_left_left, right_left_key, right_left_right) -> | |
| Same ( Balanced ( | |
| LeftHeavy (left, key, right_left_left), | |
| right_left_key, | |
| Balanced (right_left_right, right_key, right_right) | |
| )) | |
| ) | |
| (* Simple Rotation *) | |
| | Grew (RightHeavy (right_left, right_key, right_right)) -> | |
| Same (Balanced (Balanced (left, key, right_left), | |
| right_key, | |
| right_right)) | |
| ) | |
| ) | |
| ;; | |
| (* That's all folk! We've shown that it's possible to leverage to verify more advanced properties about | |
| our program. In this example one might also want to check that the elements in the left part | |
| of the tree are indeed smaller than the current key, and that the right part of the tree larger | |
| keys have than the current key. | |
| The ideas are the same, it would just be a little bit more work in a language like ocaml | |
| because you would need to encode integers at the type level and keep them in sync with the value | |
| level integers you use as keys. Again, same idea, but more grunt work. | |
| Here are some keyword that will be interesting if you want to learn more about all of this: | |
| Dependent typing, Coq, Idris, Curry Howard Correspondence | |
| *) |
merci 👍
There's a typo 'trype' where you meant type.
@philmiller-charmworks thanks, it's fixed!
Really great read @matthieubulte! Thanks for sharing!
I've posted it here with Reason syntax if anyone prefers that or wants to play around with the code (also support OCaml):
https://sketch.sh/s/C7Afz38jtFRMTkkydeBgUA/
Wow, this is very cool @cem2ran !
A very eye opening piece of code, at least for us pedestrian programmers.
I am new to Ocaml (and functional generally) so .. this may be a dumb question
I have not been able to find any references to using constructors such as in the definition of type 'n avl (starting line 47). Every thing I find is | C of .....
On the other hand maybe I just dont recognize what I am seeing
Regardless - Could you give me a reference ?
Thanks
Hi @robertblackwell, I'm glad to hear that this piece of code was interesting to you. Concerning your question, I think the section on parametrized types in this article should answer all of your questions: https://www.cs.cornell.edu/courses/cs3110/2008fa/lectures/lec04.html
Good luck, and happy Ocamling!
Here's a tutorial/walkthrough/just -> walkthrough just
that at each node -> for each
to help use -> us
for sure -> of course (je vais pas te faire un cours de grammaire anglaise parce que j'en suis incapable mais dans ma tete of course sonne mieux)
But! -> Nan c'est que en Français
translate out natural -> our
of our tree in order the be able through smart constructors -> je suis passé au travers
Trop de texte pour qqn qui aime aussi peu l'informatique théorique que moi 😉 🌙 🌃