String is never having to say synchronized
February 8, 2011
and got a bunch of replies from people pointing me to java.util.Collections.unmodifiableList which is the worst of all possible worlds. No wonder Java multithreaded code is hard to write... no immutable collections in the Java std library. What are these people thinking?
— David Pollak (@dpp) February 8, 2011
Let's start with an immutable data structure in Java that we're all familiar with: String. When you have a reference to a String, you can call any method on that String without synchronizing on it. You can pass the String to other threads and retain your original reference (no need to make a defensive copy). Strings work so well in Java that many APIs take Strings as parameters and return Strings rather than more complex data structures. For example, most web frameworks are String-based: you emit Strings and hope that they result in valid HTML. Compare that with Lift, which is DOM-based and always has a well-formed DOM that it transforms.
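To make that concrete, here is a minimal Scala sketch (the names `shareString`, `greeting`, and `seenByWorker` are illustrative). A String is handed to a second thread with no lock and no copy, and the only way to "change" it is to produce a new String:

```scala
// Sketch: share one String with another thread, with no synchronization
// on the String and no defensive copy. All names are illustrative.
def shareString(): (String, String) = {
  val greeting = "hello"
  var seenByWorker: String = null

  val worker = new Thread(new Runnable {
    // "Changing" a String (toUpperCase) returns a NEW String;
    // the shared `greeting` is never modified.
    def run(): Unit = { seenByWorker = greeting.toUpperCase }
  })
  worker.start()
  worker.join() // join() makes the worker's write visible to this thread

  (greeting, seenByWorker)
}

val (original, shouted) = shareString()
println(original) // hello
println(shouted)  // HELLO
```

The `join()` is only there so the caller can safely read the worker's result; the String itself needs no such care.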
I've been doing Scala coding for almost 4 1/2 years and I've really come to appreciate Scala's "default to immutability." The collections you get by default (List, Vector, Map, Set, and friends) are immutable. It's super easy to define immutable classes, create instances of them, and give them a birthday:
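A minimal sketch, using a hypothetical `Person` case class (the class, its fields, and the values are illustrative):

```scala
// Hypothetical example class: case class parameters are vals,
// so a Person cannot be changed after it is constructed.
case class Person(name: String, age: Int)

// Create one.
val p = Person("Alice", 30)

// "Give it a birthday": copy returns a NEW Person; p itself is untouched.
val older = p.copy(age = p.age + 1)

println(p)     // Person(Alice,30)
println(older) // Person(Alice,31)
```

`copy` leaves `p` alone and returns a new instance with only the named field changed, so anyone still holding `p` sees exactly what they saw before.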
If I've got a reference to the object originally assigned to p, I can access the fields and do whatever else I want with that instance without worrying about synchronization or anyone else changing the instance out from under me.
Scala also has immutable collections, so I can create a List and access that reference from any thread without synchronizing. This is very powerful because I can pass "the state at a particular time" to another thread, or keep it around on my current thread, without worry and without making a defensive copy.
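A minimal sketch of handing "the state at a particular time" to another thread (`shareSnapshot`, `current`, and `snapshot` are hypothetical names). The worker reads a List while the owner moves on, and nobody copies or locks anything:

```scala
// Sketch: pass a snapshot of immutable state to another thread.
// All names here are hypothetical.
def shareSnapshot(): (List[String], List[String], List[String]) = {
  var current = List("a", "b")

  // No defensive copy: the List itself can never change.
  val snapshot = current

  var seenByWorker: List[String] = Nil
  val worker = new Thread(new Runnable {
    def run(): Unit = { seenByWorker = snapshot } // read without locking
  })
  worker.start()

  // Meanwhile the owner moves on by building a NEW list.
  // Prepending is O(1) and reuses the old list as its tail.
  current = "c" :: current

  worker.join()
  (snapshot, current, seenByWorker)
}

val (snapshot, current, seenByWorker) = shareSnapshot()
println(snapshot)                 // List(a, b)
println(current)                  // List(c, a, b)
println(current.tail eq snapshot) // true: structure is shared, not copied
```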
There are no built-in Java collections that allow me to do this. Java does have java.util.Collections.unmodifiableList, but it is the worst of all possible worlds: it gives you a read-only view of a list that its owner can still change. Let me illustrate:
scala> val l2 = new java.util.concurrent.CopyOnWriteArrayList[String]
l2: java.util.concurrent.CopyOnWriteArrayList[String] = []
scala> l2.add("Hello")
res10: Boolean = true
scala> l2.add("Hello2")
res11: Boolean = true
scala> l2
res12: java.util.concurrent.CopyOnWriteArrayList[String] = [Hello, Hello2]
scala> val l3 = Collections.unmodifiableList(l2)
l3: java.util.List[String] = [Hello, Hello2]
scala> l2.add("THree")
res13: Boolean = true
scala> l2
res14: java.util.concurrent.CopyOnWriteArrayList[String] = [Hello, Hello2, THree]
scala> l3
res15: java.util.List[String] = [Hello, Hello2, THree]
The receiver (in this case just the variable l3) cannot mutate the collection, but there is no guarantee that someone else cannot. There is no way to ensure that you have a copy that cannot be mutated... even if you make a defensive copy of the collection the moment you get it, there's no guarantee that some other thread isn't mutating it while you're making your copy. This makes multi-threaded code very difficult to write.
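Here is a small Scala sketch of both halves of that problem, using `Collections.unmodifiableList` over a `CopyOnWriteArrayList` as in the transcript (`owner`, `view`, and `receiverCouldAdd` are illustrative names):

```scala
import java.util.Collections
import java.util.concurrent.CopyOnWriteArrayList

// The owner keeps a mutable list; the receiver only gets a read-only view.
val owner = new CopyOnWriteArrayList[String]()
owner.add("Hello")
val view = Collections.unmodifiableList(owner)

// The receiver cannot mutate through the view...
val receiverCouldAdd =
  try { view.add("nope"); true }
  catch { case _: UnsupportedOperationException => false }

// ...but the owner (or any other thread holding `owner`) still can,
// and the "unmodifiable" view reflects the change.
owner.add("Changed behind your back")

println(receiverCouldAdd) // false
println(view)             // [Hello, Changed behind your back]
```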
Let's see the difference between Scala code that uses immutable collections:
import net.liftweb.actor._
import net.liftweb.http._

/**
 * A singleton that provides chat features to all clients.
 * It's an Actor so it's thread-safe because only one
 * message will be processed at once.
 */
object ChatServer extends LiftActor with ListenerManager {
  private var msgs = Vector("Welcome") // private state

  /**
   * When we update the listeners, what message do we send?
   * We send the msgs, which is an immutable data structure,
   * so it can be shared with lots of threads without any
   * danger or locking.
   */
  def createUpdate = msgs

  /**
   * process messages that are sent to the Actor. In
   * this case, we're looking for Strings that are sent
   * to the ChatServer. We append them to our Vector of
   * messages, and then update all the listeners.
   */
  override def lowPriority = {
    case s: String => msgs :+= s; updateListeners()
  }
}
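Since that example depends on Lift, here is a Lift-free sketch of just the state handling (`msgs` and `sentToListener` are stand-ins, not Lift API). What a listener was sent earlier stays exactly as it was after a new message arrives:

```scala
// Stand-in for the actor's private state; no Lift needed for this sketch.
var msgs = Vector("Welcome")

// Stand-in for a value that createUpdate handed to a listener earlier.
val sentToListener = msgs

// The lowPriority handler's `msgs :+= s` is shorthand for
// `msgs = msgs :+ s`: it builds a new Vector and rebinds the var.
msgs :+= "Hi there"

println(sentToListener) // Vector(Welcome)
println(msgs)           // Vector(Welcome, Hi there)
```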
and Java code that uses mutable collections:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * The chat server
 */
public class ChatServer extends LiftActorJWithListenerManager {
  // private state. The messages
  private ArrayList<String> msgs = new ArrayList<String>(); // the private data

  private ChatServer() {
    msgs.add("Welcome");
  }

  /**
   * A String is sent as a message to this Actor. Process it
   */
  @Receive
  private void gotAString(String s) {
    // add to the messages. No need to synchronize because
    // the method will only be invoked by the Actor thread and
    // only one message will be processed at once
    msgs.add(s);

    // cap the length of our messages at 20 messages
    if (msgs.size() > 20) {
      msgs = new ArrayList<String>(msgs.subList(msgs.size() - 20, msgs.size()));
    }

    // update the listeners
    updateListeners();
  }

  // what we send to listeners on update
  public Object createUpdate() {
    // make a copy of the list and send it to the listeners as unmodifiable.
    // Note, if we used Scala's immutable collections or even Functional
    // Java's immutable List, we would not have to make a defensive copy
    // of the messages
    List<String> copy = new ArrayList<String>(msgs);
    return Collections.unmodifiableList(copy);
  }
}
Note that when we send the "current state" message to our listeners, we have to copy the collection and wrap it as unmodifiable. That is a full copy of the list on every update, which is slower than the Scala version, where createUpdate simply hands out the existing immutable Vector.
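A rough sketch of where that cost comes from (`javaCreateUpdate` and `scalaCreateUpdate` are hypothetical stand-ins for the two `createUpdate` implementations). The Java-style version allocates and fills a new list on every call, while the Scala-style version returns the same Vector every time:

```scala
import java.util.{ArrayList, Collections, List => JList}

// Java-style state: every update hands listeners a fresh O(n) copy.
val javaMsgs = new ArrayList[String]()
javaMsgs.add("Welcome")

def javaCreateUpdate(): JList[String] =
  Collections.unmodifiableList(new ArrayList[String](javaMsgs))

// Scala-style state: the immutable Vector itself is handed out; no copy.
val scalaMsgs = Vector("Welcome")

def scalaCreateUpdate(): Vector[String] = scalaMsgs

val (j1, j2) = (javaCreateUpdate(), javaCreateUpdate())
val (s1, s2) = (scalaCreateUpdate(), scalaCreateUpdate())

println(j1 eq j2) // false: each update allocated and filled a new list
println(s1 eq s2) // true: both updates share the same Vector
```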
When you want code that is thread-safe, immutable data structures are the best mechanism for writing it. Clojure has the best immutable abstractions and a tremendous set of immutable data structures. Scala has great immutable data structures and reasonable library-based support for multi-threaded coding. Java's standard library offers no immutable collections at all. And anyone who thinks java.util.Collections.unmodifiableList gives you immutability should not be writing multi-threaded code.