For All You Know, It's Just a Java Library
September 24, 2014
Blast from the past...
I wrote this in May, 2008... and I've gotta say, I was pretty spot-on including Java 8 adopting some of Scala's better features:
It's starting to happen... the FUD around Scala. The dangers of Scala. The "operational risks" of using Scala.
It started back in November when I started doing lift and Scala projects for hire. The questions were reasonable and rational:
- If you build this project in Scala and lift, who else can maintain it? A: the lift community has 500+ (300+ back in November) in it and at least 3 (now 10) of the people on the lift list are actively seeking Scala and lift related gigs.
- What does lift give us? A: a much faster time to market with Web 2.0, collaborative applications than anything else out there.
- Why not use Rails, it's got great developer productivity? A: Rails is great for person-to-computer applications... when you're building a chat app or something that's person-to-person collaborative, lift's Comet support is much more effective than Rails... plus if you have to scale your app to hundreds of simultaneous users, you can do it on a single box with lift because the JVM gives great operational benefits.
Then came the less rational questions:
- Why use new technology when we can outsource the coding to India and pay someone 5% of what we're paying you? A: Because I'm more than 20 times better than the coders you buy for 5% of my hourly rate plus there's a lot less management of my development efforts.
- Why not write it in Java to get the operational benefits of Java? A: Prototyping is hard. Until you know what you want, you need to be agile. Java is not agile. Ruby/Rails and Scala/lift are agile. Choose one of those to do prototyping and then port to Java if there's a reason to.
- Will Scala be incompatible with our existing Java code? A: No. Read my lips... No. The same guy who wrote the program (javac) that converts Java to JVM byte code wrote the program that converts Scala to JVM byte code. There's no one on this planet that could make a more compatible language than Martin Odersky.
Okay... flash forward a few months. Buy a Feature has gone through prototyping, revisions, and all the normal growth that software goes through. It works (yeah... it still needs more, but that's the nature of software versions.) There's another developer (he mainly writes ECMAScript) who picked up the lift code and made a dozen modifications in the heart of the code with 2 hours of walk-through from me and 20-30 IM and email questions. He mainly does Flash and Air work and found Scala and lift to be pretty straightforward.
We also had an occasion to have 2,000 simultaneous (as in at the same time, pounding on their keyboards) users of Buy a Feature. Thanks to Jetty Continuations, we were able to service all 2,000 users with 2,000 open connections to our server and an average of 700 requests per second on a dual core Opteron with a load average of around 0.24... try that with your Rails app.
One of the customers of Buy a Feature wanted it integrated into their larger, Java-powered web portal along with 2 other systems. I did the integration. The customer asks "Where's the Scala part?" I answer "It's in this JAR file." He goes "But, your program is written in Scala, but I looked at the byte-code and it's just Java." I answer "It's Scala... but it compiles down to Java byte-code and it runs in a Java debugger and you can't tell the difference." "You're right," he says.
So, to this customer's JVM, the Scala and lift code looks, smells and tastes just like Java code. If I renamed the scala-library.jar file to apache-closures.jar, nobody would know the difference... at all.
Okay... but from each set of people I talk to, I hear a similar variation on the "operational risks" of using Scala.
Let's step back for a minute. There are development and team risks for using Scala.
Some Java programmers can't wrap their heads around the triple concepts of (1) type inference, (2) passing functions/higher order functions and (3) immutability as the default way of writing code. Most Ruby programmers that I've met don't have the above limitations. So, find a Ruby programmer who knows some Java libraries or find a Java programmer who's done some moonlighting with Rails or Python or JavaScript and you've got a developer who can pick up Scala in a week.
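For anyone who hasn't seen those three concepts side by side, here is a minimal sketch. The names (Bid, totalWhere) are made up for illustration and are not taken from Buy a Feature or from lift.

```scala
object ThreeConcepts {
  // (3) Immutability by default: case class fields are read-only.
  // Bid is a hypothetical type invented for this sketch.
  final case class Bid(player: String, amount: Int)

  // (2) Higher-order function: `keep` is a function supplied by the caller.
  def totalWhere(bids: List[Bid])(keep: Bid => Boolean): Int =
    bids.filter(keep).map(_.amount).sum

  def main(args: Array[String]): Unit = {
    // (1) Type inference: no annotation, the compiler infers List[Bid].
    val bids = List(Bid("ann", 30), Bid("bob", 45), Bid("ann", 10))

    // A val cannot be reassigned and List is immutable, so "adding" a bid
    // builds a new list and leaves the old one untouched.
    val withLateBid = Bid("cy", 5) :: bids

    println(totalWhere(bids)(_.player == "ann")) // 40
    println(totalWhere(withLateBid)(_ => true))  // 90
    println(bids.size)                           // 3
  }
}
```

A Ruby programmer will read the `_.player == "ann"` part as a block; the difference is that the compiler checks it.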
Yes, the tools in Scala-land are not as rich as the tools in Java-land. But, once again, anyone who can program Ruby can program in Scala. There's a fine Textmate bundle for Scala. I use jEdit. Steve Jenson uses emacs. Thanks to David Bernard's continuous compilation Maven plugin, you save your file and your code is compiled.
Oh... and there's the old Eclipse plugin which more or less works and has access to the Eclipse debugger and the new Eclipse plugin is reported to work quite well. And then there's the NetBeans plugin which is still raw, but getting better every week.
Even with the limitation of weak IDE support, head-to-head people can write Scala code 2 to 10 times faster than they can write Java code and maintaining Scala code is much easier because of Scala's strong type system and code conciseness.
But, getting back to our old friend "you can't tell it's not Java", I wrote a Scala program and compiled it with -g:vars (put all the symbols in the class file), started the program under jdb (the Java Debugger... a little more on this later) and set a breakpoint. This is what I got:
> Step completed: "thread=main", foo.ScalaDB$$anonfun$main$1.apply(), line=6 bci=0
> 6 args.zipWithIndex.foreach(v => println(v))
main[1] dump v
v = {
_2: instance of java.lang.Integer(id=463)
_1: "Hello"
}
main[1] where
[1] foo.ScalaDB$$anonfun$main$1.apply (ScalaDB.scala:6)
[2] foo.ScalaDB$$anonfun$main$1.apply (ScalaDB.scala:6)
[3] scala.Iterator$class.foreach (Iterator.scala:387)
[4] scala.runtime.BoxedArray$$anon$2.foreach (BoxedArray.scala:45)
[5] scala.Iterable$class.foreach (Iterable.scala:256)
[6] scala.runtime.BoxedArray.foreach (BoxedArray.scala:24)
[7] foo.ScalaDB$.main (ScalaDB.scala:6)
[8] foo.ScalaDB.main (null)
main[1] print v
v = "(Hello,0)"
main[1]
My code worked without any fancy footwork inside of the standard Java Debugger. The text of the line that I was on and the variables in the local scope were all there... just as if it was a Java program. The stack traces work the same way. The symbols work the same way. Everything works the same way. Scala code looks and smells and tastes to the JVM just like Java code... now, let's explore why.
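The source file isn't reproduced here, but the trace pins down most of it: an object named ScalaDB whose main method runs the zipWithIndex line. What follows is a reconstruction under that assumption, not the original file. The package declaration (foo in the trace) is left out so the snippet stands alone, which also means the line numbers won't match the trace.

```scala
// Reconstructed sketch; the trace shows the original lived in package foo.
object ScalaDB {
  def main(args: Array[String]): Unit = {
    // Pair each argument with its index and print the resulting tuple.
    // The closure `v => println(v)` is what the debugger stopped inside.
    args.zipWithIndex.foreach(v => println(v))
  }
}
```

Run with the single argument Hello, this prints (Hello,0), which matches what jdb showed for `print v`.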
A long time ago, when Java was Oak and it was being designed as a way to distribute untrusted code into set-top boxes (and later browsers), the rules defining how a program executed and what each instruction (byte code) meant were super important. Additionally, the semantics of the program had to be such that the Virtual Machine running the code could verify (1) that the code was well behaved and (2) that the source code and the object code had the same meaning. For example, the casting operation in Java compiles down to a byte code that checks that the object can actually be cast to the right class and the verifier ensures that there's no code path that could put an unchecked value into a variable. Put another way, there's no way to write verifiable byte code that can put a reference to a non-String into a variable that's defined as a String. It's not just at the compiler level, but at the actual Virtual Machine level that object typing is enforced.
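The same run-time check applies to Scala, since a cast to a concrete class goes through the same JVM cast instruction. A small sketch (CheckedCast and lengthIfString are names invented for this example):

```scala
object CheckedCast {
  // Tries to treat an arbitrary value as a String.
  // The cast is checked by the JVM at run time, not just by the compiler.
  def lengthIfString(value: Any): Option[Int] =
    try Some(value.asInstanceOf[String].length)
    catch { case _: ClassCastException => None }

  def main(args: Array[String]): Unit = {
    println(lengthIfString("Hello")) // Some(5)
    println(lengthIfString(42))      // None: the JVM refused the cast
  }
}
```

The compiler was happy with both calls. It was the Virtual Machine that refused to let an Integer pass as a String.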
In Java 1.0 days, there was nearly a 1:1 correspondence between Java language code and Java byte code. Put another way, there was only one thing you could write in Java byte code that you could not write in Java source code (it has to do with calling super in a constructor.) There was one source code file per class file.
Java 1.1 introduced inner classes which broke the 1:1 relationship between Java code and byte code. One of the things that inner classes introduced was access to private instance variables by the inner class. This was done without violating the JVM's enforcement of the privacy of private variables by creating accessor methods that were compiler enforced (but not JVM enforced) ways for the anonymous classes to access private variables. But the horse was out of the barn at this point anyway, because 1.1 brought us reflection and private was no longer private.
An interesting thing about the JVM. From 1.0 through 1.6, there has not been a new instruction added to the JVM. Wow. Think about it. Java came out when the 486 was around. How many instructions have been added to Intel machines since 1995? The Microsoft CLR has been around since 2000 and has gone through 3 revisions and new instructions have been added at every revision and source code compiled under an older revision does not work with newer revisions. On the other hand, I have Java 1.1 compiled code that works just fine under Java 1.6. Pretty amazing.
Even to this day, Java Generics are implemented using the same JVM byte-codes that were used in 1996. This is why you get the "type erasure" warnings. The compiler knows the type, but the JVM does not... so a List<String> looks to the JVM like a List, even though the compiler will not let you pass a List<String> to something that expects a List<URL>. On the server side, where we trust the code, this is not an issue. If we were writing code for an untrusted world, we'd care a lot more about the semantics of the source code being enforced by the execution environment.
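Erasure is easy to see from Scala too, using the plain java.util collections. This is a sketch with made-up names (ErasureDemo, sameRuntimeClass, failsOnlyOnUse), written in Scala rather than Java to stay with the language this post is about.

```scala
import java.net.URL
import java.util.{ArrayList, List => JList}

object ErasureDemo {
  // True when both lists report the same runtime class: the element type is gone.
  def sameRuntimeClass: Boolean = {
    val strings: JList[String] = new ArrayList[String]()
    val urls: JList[URL] = new ArrayList[URL]()
    strings.getClass.getName == urls.getClass.getName
  }

  // True when a String smuggled into a JList[URL] fails only once it is used as a URL.
  def failsOnlyOnUse: Boolean = {
    val strings: JList[String] = new ArrayList[String]()
    strings.add("Hello")
    // `val bad: JList[URL] = strings` is rejected by the compiler.
    // An explicit cast is accepted, and the JVM only sees java.util.List.
    val smuggled = strings.asInstanceOf[JList[URL]]
    val sizeIsFine = smuggled.size == 1
    try {
      smuggled.get(0).getHost // the element is checked against URL here
      false
    } catch {
      case _: ClassCastException => sizeIsFine
    }
  }

  def main(args: Array[String]): Unit = {
    println(sameRuntimeClass) // true
    println(failsOnlyOnUse)   // true
  }
}
```

The list itself never complains. The JVM only objects at the point where an element is actually treated as a URL.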
So, there have been no new JVM instructions since Java was released. The JVM is perhaps the best specified piece of software this side of Ada-based military projects. There are specs and slow-moving JSRs for everything. Turns out, this works to our benefit.
The JVM has a clearly defined interface to debugging. The information that a class file needs to provide to the JVM for line numbers, variable names, etc. is very clearly specified. Because the JVM has a limited instruction set and the type of each item on the stack and of each instance variable in a class is known and verified when the class loads, the debugging information works for anything that compiles down to Java byte code and has semantics of named local variables and named instance variables. Scala shares these semantics with Java and that's why the Scala compiler can emit byte-code that has the appropriate debugging information so that it "just works" with jdb. And, just to be clear, jdb uses the standard, well documented interface into the JVM to do debugging and every other IDE for the JVM uses this same interface. That means that an IDE that compiles Scala can also hook into the JVM and debug Scala. That's why debugging works with the Scala Eclipse plugin.
But, let's go back to the statement: nobody knows Scala's operational characteristics.
That's just not true. Scala's operational characteristics are the same as Java's. The Scala compiler generates byte code that is nearly identical to what the Java compiler generates. In fact, you can decompile Scala code and wind up with readable Java code, with the exception of certain constructor operations. To the JVM, Scala code and Java code are indistinguishable. The only difference is that there's a single extra library file to support Scala.
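One way to poke at this claim without a decompiler is to look at a Scala class through plain Java reflection, which is what any Java tool would do. A sketch, with Greeter and PlainJvmClass as invented names:

```scala
object PlainJvmClass {
  // An ordinary Scala class; `val name` becomes a public method name().
  class Greeter(val name: String) {
    def greet(): String = "Hello, " + name
  }

  def main(args: Array[String]): Unit = {
    val cls = classOf[Greeter]

    // java.lang.reflect sees a regular class that extends java.lang.Object.
    println(cls.getSuperclass.getName)                   // java.lang.Object
    println(cls.getMethod("name").getReturnType.getName) // java.lang.String

    // Invoke greet() reflectively, the way a Java framework would.
    val greet = cls.getMethod("greet")
    println(greet.invoke(new Greeter("JVM")))            // Hello, JVM
  }
}
```

Nothing in that reflection code knows or cares that Greeter came out of the Scala compiler.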
Now, in most software projects, you don't have CEOs and board members, and everybody's grandmother asking what libraries you're using. In fact, in every project I've stepped into, there have been at least 2 libraries that the senior developers did not add but somehow got introduced into the mix (I believe in library audits to make sure there are no license violations in the library mix.) So, in the normal course of business, libraries are added to projects all the time. Any moderately complex project depends on dozens of libraries. I can tell you to a 100% degree of certainty that there are libraries in that mix that will not pass the "is the company that supports them going to be around in 5 years?" test. Period. Sure, memcached will be around in 5 years and most of the memcached clients will. Slide on the other hand is "retired". And Mongrel...
Making the choice to use Scala should be a deliberate, deliberated, well reasoned choice. It has to do with developer productivity, both to build the initial product and to maintain the product through a 2-5 year lifecycle. It has to do with maintaining existing QA and operations infrastructure (for existing JVM shops) or moving to the most scalable, flexible, predictable, well tested, and well supported web infrastructure around: the JVM.
Recruiting team members who can do Scala may be a challenge. Standardizing on a development environment may be a challenge as the Scala IDE support is immature (but there's always emacs, vi, jEdit and Textmate which work just fine.) Standardizing on a coding style is a challenge. These are all people challenges and all localized to recruiting and development and management thereof. The only rational parts of the debate are the trade-off between recruiting and organizing the team and the benefits to be gained from Scala.
But, you say, what if Martin Odersky decides to take Scala in a wrong direction? Then freeze at Scala 2.7 or 2.8 or wherever you feel the break is rational. It was only last year that Kaiser moved from Java 1.3 to 1.4. Working off of 2 or 3 year old technology is normal. Running against trunk-head is not the way of an organization that's asking the question "where will Martin take Scala in 5 years?" And oh, by the way, if Martin gets off track or Scala for some reason languishes, it's most likely to be the same scenario as GJ (Generic Java... Martin's prior project that turned into Java Generics)... it's because Java 8 or Java 9 has adopted enough of Scala's features to make Scala marginal. In that case, you spend a couple of months porting the Scala code to Java XX and in the process fix some bugs.
And not to put too fine a point on it, but Martin's team runs one of the best ISVs I've ever seen. They crank out a new, feature-packed release every six months or so. They respond, sometimes within hours, to bug reports. There is an active support mechanism with some of the best coders around waiting to answer questions from newbies and old hands alike. If we were to measure the Scala team on commercial standards, they've got a longer funding runway than any private software company around and they're more responsive than almost every ISV, public or private. So what if they're academic... maybe that means they're thinking through issues rather than being code-wage slaves.
Bottom line... to anyone other than the folks with hands in the code and the folks who have to recruit and manage them, "For all you know, it's just another Java library."