Pythian Blog: Technical Track

Scala TreeSet: mutable vs immutable

I was exploring immutable RedBlackTree tree implementation in Scala (2.12.4) when I noticed something that wasn't clear to me:

private[this] def upd[A, B, B1 >: B](tree: Tree[A, B], k: A, v: B1, overwrite: Boolean)
                                    (implicit ordering: Ordering[A]): Tree[A, B1] =
  if (tree eq null) {
    RedTree(k, v, null, null)
  } else {
    val cmp = ordering.compare(k, tree.key)

    if (cmp < 0) {
      balanceLeft(isBlackTree(tree), tree.key, tree.value, upd(tree.left, k, v, overwrite), tree.right)
    } else if (cmp > 0) {
      balanceRight(isBlackTree(tree), tree.key, tree.value, tree.left, upd(tree.right, k, v, overwrite))
    } else if (overwrite || k != tree.key) {
      mkTree(isBlackTree(tree), k, v, tree.left, tree.right)
    } else {
      tree
    }
  }

Comparing keys via compare is followed by an explicit key, not " equals" condition. I compared similar parts of mutable RedBlackTree implementation and haven't found anything similar. So I started to look for the collections that may work differently for the mutable and immutable version.

So TreeSet, overwrite is passed as false compared to true TreeMap: this makes sense; for a tree map, inserting the same key will replace the value, while this is not needed for TreeSet. Although key equality is still checked, it's still not clear why. Simple tests show that mutable and immutable TreeSet work really differently in the following example.

Let's define class-extending Ordered with two fields that compare distinguishes only the first one:

case class Dummy(x: Int, y: Int) extends Ordered[Dummy] {
  override def compare(that: Dummy): Int = Ordering[Int].compare(this.x, that.x)
}

Now try to insert the different one by " equals" instances into mutable and immutable versions of sets:

val mutableTree = scala.collection.mutable.TreeSet.empty[Dummy]
mutableTree.add(Dummy(0, 0))
mutableTree.add(Dummy(0, 1))
mutableTree

var immutableTree = scala.collection.immutable.TreeSet.empty[Dummy]
immutableTree += Dummy(0, 0)
immutableTree += Dummy(0, 1)
immutableTree

So were the results (2.12.4):

Welcome to Scala 2.12.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121). 
Type in expressions for evaluation. Or try :help.

scala> case class Dummy(x: Int, y: Int) extends Ordered[Dummy] {
     |   override def compare(that: Dummy): Int = Ordering[Int].compare(this.x, that.x)
     | }
defined class Dummy

scala> val mutableTree = scala.collection.mutable.TreeSet.empty[Dummy]
mutableTree: scala.collection.mutable.TreeSet[Dummy] = TreeSet()

scala> mutableTree.add(Dummy(0, 0))
res0: Boolean = true

scala> mutableTree.add(Dummy(0, 1))
res1: Boolean = false

scala> mutableTree
res2: scala.collection.mutable.TreeSet[Dummy] = TreeSet(Dummy(0,0))

scala> var immutableTree = scala.collection.immutable.TreeSet.empty[Dummy]
immutableTree: scala.collection.immutable.TreeSet[Dummy] = TreeSet()

scala> immutableTree += Dummy(0, 0)

scala> immutableTree += Dummy(0, 1)

scala> immutableTree
res5: scala.collection.immutable.TreeSet[Dummy] = TreeSet(Dummy(0,1))

As we can see, mutable TreeSet has not got Dummy(0, 1) while the immutable TreeSet did. More interesting is what will happen if we try to remove an element:

scala> mutableTree.remove(Dummy(0, 1)) res6: Boolean = true scala> mutableTree res7: scala.collection.mutable.TreeSet[Dummy] = TreeSet() 

The mutable version worked exactly based on comparison. For the Immutable version:

scala> immutableTree -= Dummy(0, 0) scala> immutableTree res9: scala.collection.immutable.TreeSet[Dummy] = TreeSet() 

Nice, immutable did the same (it has Dummy(0, 1) and removing Dummy(0, 0) works).

If we look at the immutable RedBlackTree implementation of the del function, we can see that there is a simple:

val cmp = ordering.compare(k, tree.key) if (cmp < 0) delLeft else if (cmp > 0) delRight else append(tree.left, tree.right)

and there is no specific case for cmp == 0 && k != tree.key. This is not a bug as this violates Ordered contract

It is important that the equals method for an instance of Ordered[A] be consistent with the compare method. However, due to limitations inherent in the type erasure semantics, there is no reasonable way to provide a default implementation of equality for instances of Ordered[A]. Therefore, if you need to be able to use equality on an instance of Ordered[A] you must provide it yourself either when inheriting or instantiating.

I was still interested in finding out why this condition was added, so I went to git and found an original commit. It was exactly what I had expected for an overwrite parameter but there was no clear explanation as to why key compare was added:

Here, we had an issue that RedBlack does not work the same way for sets - which are not supposed to replace an element if it is the same (wrt equals) and maps - which should replace the corresponding values. Adding an overwrite parameter that decides whether to overwrite added keys if they are the same in the ordering.

Also worth noting is that HashSet has nothing to do with ordering so they end up having both values inserted:

scala> var immutableHash = scala.collection.immutable.HashSet.empty[Dummy] immutableHash: scala.collection.immutable.HashSet[Dummy] = Set() scala> immutableHash += Dummy(0, 0) scala> immutableHash += Dummy(0, 1) scala> immutableHash res11: scala.collection.immutable.HashSet[Dummy] = Set(Dummy(0,0), Dummy(0,1)) 

No Comments Yet

Let us know what you think

Subscribe by email