Java, a developer's island without trees
This is a blog-post about the Java programming language. It describes the lack of the abstract data type Tree in the standard Java library and the approach to fix it.
The downloads can be found at the end of this blog post.
A tree-less world
Perhaps you have experienced the same as i did. You have a task at hand and think about different ways to do it. At some time you have a great idea, and you just need something quite common - but the language you are writing in just does not have it. This happened to me as i wanted to split several paths into their elements and put it into a tree, so i can go across the paths as i need to. I had to implement it in Java, but Java has no trees.
I learned in university about the several different abstract data types: Lists, Sets, Queues, Stacks, Maps (or “associative arrays”) and last but not least: Trees. There are several more abstract data types, but these are the ones I know of and use. All but the Trees are implemented in the Java 1.4/5.0 library. There is a non-generic interface in javax.swing.tree.TreeNode, but it is only used in the JTree class (javax.swing.JTree), a graphical Representation of Tree structures.
So I googled for an implementation. I even found a nice blog-post about it, with a sample implementation: http://sujitpal.blogspot.com/2006/05/java-data-structure-generic-tree.html. It is a nice and robust implementation that does the task at hand nicely, but it had some flaws in my opinion:
-
Tree and Node don't implement an interface, but i want a real abstract data type
-
several methods (like getParent(), addChildren(T…), search(T)) are missing
Expanding the prototype
After finding this implementation I went with it and approached my self made task list:
1. Abstraction: I wrote an interface for the two classes (ITree and TreeNode), copied all of its method descriptions into the interfaces and made them implement the interface. So that was part one, which was not so difficult. At the end of this step i had 4 files:
-
Tree.java (implementing ITree.java, redirecting all TreeNode methods to its root node)
-
ITree.java (the Tree interface, extending TreeNode)
-
Node.java (implementing TreeNode.java)
-
TreeNode.java (the abstracted Node interface)
2. Adding missing methods: I remembered the things i've learned at university and tried to apply them to the TreeNode. While adding several methods to the interface and implementing them in the Node class I noticed that the class names were not compliant with the usual Java naming convention in java.util, which goes something like this:
-
Interface: the name of the abstract data type (like List, Set or Map)
-
Implementation: a hint at how it is implemented + the interface name (like ArrayList or LinkedList)
So the classnames had to change to comply to this standard naming convention. Since the whole implementation is based on Lists, I chose ListTree and ListTreeNode. As i went on, these names started to sound somewhat silly. Afterwards I added 2 more implementations that both require Maps and don't allow duplicate child nodes, that means, every child of these nodes has a different data than the other children. One of them merges duplicates, the other one plain out throws an Exception. So now I had 3 TreeNodes, each of them doing different things. They are named after what they do: DuplicatesNode, UniqueNode and MergeNode. After also adding an immutable implementation, this is what the package now looks like:
Final Class Structure
Interfaces
-
Tree.java (the abstract Tree interface, moved from ITree; now a separate interface)
-
TreeNode.java (the abstract TreeNode interface, now the heart of it all)
Abstract Classes
-
AbstractTree.java (abstract Tree implementation, delegating every call to the root node)
-
AbstractTreeNode.java (abstract TreeNode implementation, provides helper methods)
-
ForwardingTree.java (abstract Tree, forwards every call to another Tree)
-
ForwardingTreeNode.java (abstract TreeNode, forwards every call to another TreeNode)
Implementations
-
DefaultTree.java (the default Tree implementation, extends AbstractTree)
-
DuplicatesNode.java (previously Node, then ListNode, now DuplicatesNode, allowing duplicate children)
-
ImmutableNode.java (an immutable TreeNode, initialized at start, can be copied from another TreeNode)
-
ImmutableTree.java (a Tree that makes a deep copy of a tree and always stays immutable)
-
MergeNode.java (a TreeNode that has unique children and merges input to achieve this goal)
-
UniqueNode.java (a TreeNode that has unique children and throws Exceptions on illegal input)
Innovating the veggie-world
TreeNode
Note: Every TreeNode implementation in this package is not threadsafe. They must be synchronized externally.
The TreeNode interface without documentation looks like this now:
public interface TreeNode<E> {
TreeNode<E> getRoot();
TreeNode<E> getParent();
void setParent(TreeNode<E> parent);
boolean changeParent(TreeNode<E> newParent);
Collection<TreeNode<E>> getChildren();
void setChildren(Collection<TreeNode<E>> children);
List<TreeNode<E>> addChildren(E... children);
int getNumberOfChildren();
TreeNode<E> addChild(E childData);
void addChildNode(TreeNode<E> child);
boolean hasChild(E childData);
TreeNode<E> getChildAt(int index) throws IndexOutOfBoundsException;
void insertChildAt(int index, TreeNode<E> child) throws IndexOutOfBoundsException;
void removeChildNode(TreeNode<E> child);
void removeChildAt(int index) throws IndexOutOfBoundsException;
void removeAllChildren();
E getData();
void setData(E data) throws PropertyVetoException;
boolean isLeaf();
boolean contains(TreeNode<E> descendant);
void addVetoableChangeListener(VetoableChangeListener listener);
void removeVetoableChangeListener(VetoableChangeListener listener);
}
All Implementations have two constructors, one standard constructor without any arguments and one with an argument of type E. A TreeNode can be added as a child to another TreeNode with the “addChildNode” method. Alternatively, there is also the “addChild(E)” method that adds a child with the specified data and returns the created child. This is the easiest way to quickly build a complex tree as you can see in the example in the next section. “addChildren(E…)” adds several children at once, returning a list of the generated TreeNodes.
There is also “insertChildAt(int, TreeNode)” that inserts a TreeNode as a child at a specified index. The general contract is that the children have a certain order. It is not guaranteed that this order is preserved at all time, or that an insertChildAt really inserts the child at the given position. This is especially true for the MergeNode. But the general contract is that if it is possible to insert the child at the given position, then it is inserted there.
There are also the removeChildNode-, removeChildAt- and removeAllChildren- methods that remove either one node or all at once. If the given TreeNode is not present then nothing happens.
There are also getter and setter for parent, children and the data of the node. So there's getData() and setData(E), getParent() and setParent(TreeNode), and last but not least getChildren() and setChildren(…). This means that each TreeNode fulfills the bean standard and can be serialized. The TreeNode interface itself is not serializable, but the 3 mutable implementations are serializable. But please note that the Serialization is not tested, so it's highly experimental. It is also not optimized like other data structures.
This should cover most methods. There are also some helper methods like
-
isLeaf() - returns true if the node has no children, but a parent
-
hasChild(E) - returns true if the node has a child with the given data
-
getRoot() - returns the root node (a root node has no parent)
-
changeParent(TreeNode) - changes the parent of this node to a different parent, should be used instead of setParent(TreeNode)
-
contains(TreeNode) - returns true if this node contains the given node in its descendant hierarchy
-
getNumberOfChildren() - returns the number of direct children
-
addVetoableChangeListener(…) - adds a VetoableChangeListener, that can veto changing the data of a node;
-
removeVetoableChangeListener(…) - removes a previously added VetoableChangeListener;
In the jar file “cosmocode-tree-1.0-tests.jar” there is a “TreeNodeTest.java” that is an abstract junit-4.7-test that can be used to test any TreeNode implementation on the above mentioned specifications. At the moment it contains 63 tests, which should cover most use cases.
Tree
The Tree interface looks basically the same as the TreeNode interface, with some additional methods:
...
/**
* Set the root Element for the tree.
* @param rootElement the root element to set.
* @throws NullPointerException if rootElement is null
*/
void setRootElement(TreeNode<E> rootElement);
/**
* Returns an Iterable that traverses the {@code Tree<T>} in the given {@link TraverseMode}.
* Look at the JavaDoc of TraverseMode for further informations.
* @param mode the mode to traverse in.
* @return an Iterable that returns an iterator that traverses this Tree in the given TraverseMode
*
* @see TraverseMode
*/
Iterable<E> traverse(TraverseMode mode);
/**
* Returns true if this Tree is equal to the other,
* false otherwise.
* Two trees are commonly considered equal, if their root elements are equal,
* if not stated otherwise by the implementing class.
* @param t the other Tree
* @return true if this tree is equal to the other, false otherwise
*/
boolean equals(Tree<?> t);
Most implementations will delegate its calls to its root node. Only the 3 methods mentioned above must be implemented by a Tree. But the AbstractTree.java already provides this behaviour, except “setRootElement” and “getRoot()”.
Please note that currently no Tree implementation recognizes “TraverseMode.IN_ORDER”. All throw an IllegalArgumentException on IN_ORDER, because IMHO there is no practical use for this when each TreeNode has multiple children, so it is not defined where the TreeNode itself is put between the children. If a binary TreeNode is added later, too, then the corresponding Tree will most likely support IN_ORDER.
How do I use it?
import de.cosmocode.tree.*;
...
final Tree<String> tree = new DefaultTree<String>();
final TreeNode<String> fruits = tree.addChild("fruits");
fruits.addChild("apple").addChildren("seeds", "flesh", "skin");
fruits.addChild("banana").addChildren("peel", "flesh");
fruits.addChild("orange").addChildren("seeds", "flesh", "peel");
for (final String elem : tree.traverse(TraverseMode.PRE_ORDER)) {
System.out.print(elem + ", ");
}
System.out.println();
// output: null, fruits, apple, seeds, flesh, skin, banana, peel, flesh, orange, seeds, flesh, peel,
This is just a short demonstration of what you can do with a Tree.
Because of the modular structure this can also be used for data structures (like XML or JSON) or a representation of a file structure.
A better implementation
Well, there are better implementations, yet I did not find them on time. I did a short google search for “java tree”, “java generic tree” and such. I did not find anything useful on the first 2-3 pages besides the already mentioned blog post, so i gave up and took what i had there. At first I just wanted to take what I had there and build some abstraction around it, but during the progress several things turned up that just had to be done. So it grew on and on and on … and that's how it has become what it is now.
When I told some colleagues what i am doing (it was already too late to stop then), several alternatives showed up :
Some of these are better implementations than the presented, others have a different approach, yet others are just untested samples. Of course it depends on the project whether one of these is better suited or not. If you know any other good tree implementations then please add them in the comments.
What all of these have in common though is that they have no tests. At least i haven't found any document tests, where my implementation has them. But this should not be an indication about the robustness of the code. An untested code can be rock solid if the programmer is very experienced, and a tested code may have overlooked some possibility. But a tested code is generally more stable than an untested one. At least I can assure that my implementations work as long as they are used in a single-threaded environment.
I thank google for providing google collections and I thank my colleagues Willi Schönborn, Florian Purchess and Andreas Gohr for helping me on this blog post.