Eclipse, STOP Telling Java Newbies to Use serialVersionUID!

I have always been wary of IDEs and other programming tools that “do the thinking for you.” In the best of cases, they allow experienced coders to rise above the busy work and text pushing of adding getters and setters, refactoring and shuffling files around. They offer handy templates for starting projects and classes. They can help give uniform style and headers to files. They are extraordinarily helpful. So I am not trying to push everyone back to emacs (if only I could), but I am going to tell a lot of people to stop hijacking the fast lane.

The problem is that a lot of less experienced programmers use these tools as a crutch. They pick up the habits promoted by these tools and don’t really understand what is going on behind them. Then they exhibit the worst personality trait of developers: when questioned about a habit that they don’t really understand, they defend it fanatically.

Case in point, Eclipse puts a serialVersionUID in any serializable class. If the developer removes it, it gives them a warning. So at every new job I find a whole nest of developers who are putting serialVersionUID all over the code.

What’s wrong with serialVersionUID, you ask? It’s a hack, plain and simple. It is usually deployed as a quick and dirty way to get around proper development. What makes Eclipse’s version worse is that the values are giant generated numbers.

I usually hear two excuses for adding a serialVersionUID to an object: performance and version control.

Performance Myth

The performance benefits of serialVersionUID are a myth. Long story short, yes, Java will calculate a serialVersionUID at runtime if one is not provided. This is a relatively heavy calculation: a SHA-1 hash over the class’s name, modifiers, interfaces, and field and method signatures. However, most software, especially server-side software, will be serializing the same objects over and over again. The JVM does not calculate a new serialVersionUID each time, just once, the first time. So if your server stays up for hours, days, or even weeks, your optimization is negligible.

It does not make it faster to send the object over the wire, and it does not make the marshaling or unmarshaling any faster. There is no gain after that one-time hash.
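To see the value the runtime comes up with on its own, `java.io.ObjectStreamClass` can be asked directly. The following is a minimal sketch, not a benchmark: `DefaultUidDemo`, `Account`, and `uidOf` are hypothetical names made up for illustration, and the only thing it shows is that a class without a declared serialVersionUID still gets one, and that asking again yields the same number.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class DefaultUidDemo {
    // Hypothetical class that deliberately declares no serialVersionUID.
    static class Account implements Serializable {
        String owner;
        long balance;
    }

    // Returns the UID the serialization runtime associates with the class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        long first = uidOf(Account.class);
        long second = uidOf(Account.class);
        System.out.println("computed UID: " + first);
        System.out.println("same on second lookup: " + (first == second));
    }
}
```

The printed number depends on the class as compiled, so it is not shown here.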

Sloppy Setup

Yes, you can use the serialVersionUID to give an object an explicit version. If you let the JVM do it for you, it is very exacting. Anything that changes the shape of the compiled class can produce a new serialVersionUID: adding or removing fields and methods, using different compilers, or even different compiler switches can all affect the result. You can easily get into situations where deserialization fails with an InvalidClassException.
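What that failure looks like can be sketched in a single file. Two differently compiled copies of a class are awkward to set up in one program, so this example cheats: it serializes an object and then flips one bit of the serialVersionUID recorded in the stream, which is an assumption standing in for "the sender had a different build." `UidMismatchDemo`, `Point`, and the helper methods are hypothetical names, and the byte offsets assume the standard stream layout for a single top-level object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class UidMismatchDemo {
    // Hypothetical payload class; no serialVersionUID declared on purpose.
    static class Point implements Serializable {
        int x = 1;
        int y = 2;
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Simulates a sender whose copy of the class has a different UID by
    // flipping one bit of the UID stored in the stream's class descriptor.
    static byte[] withDifferentUid(byte[] stream) {
        byte[] copy = stream.clone();
        // Layout: 4-byte header, TC_OBJECT, TC_CLASSDESC, 2-byte name length,
        // the class name, then the 8-byte serialVersionUID.
        int nameLength = ((copy[6] & 0xFF) << 8) | (copy[7] & 0xFF);
        int uidOffset = 8 + nameLength;
        copy[uidOffset + 7] ^= 0x01;
        return copy;
    }

    // Returns true if the local class refuses the stream's class descriptor.
    static boolean isRejected(byte[] stream) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            System.out.println(e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] original = serialize(new Point());
        System.out.println("original rejected: " + isRejected(original));
        System.out.println("tampered rejected: " + isRejected(withDifferentUid(original)));
    }
}
```

The untouched stream deserializes normally; the altered one is refused before any field is read, which is the same check that trips when two servers run different builds of a class.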

If you are in the server-side world, this screams sloppy work! Why are your developers all using different compilers and switches? Why are different jars deployed across the servers? The answer isn’t to fake it with serialVersionUID but to get your team all on a clean and standardized setup. Your bug list, QA team, and sysadmins up at 3am fixing things will thank you for it.

Run Your Versions

If you are serious about controlling your servers, and it isn’t because you are too lazy to make sure that everyone is using the same setup, then go for it. But why does Eclipse give you that crappy:

serialVersionUID = 123871234768L;

Why not start at a sensible 1, or 100? When you come back a week later, are you going to notice that someone had to up the version to 123871234790L?

Also, off the top of your head, what are all the changes to a class that may require a new serialVersionUID? The normal advice is: when in doubt, iterate. I suspect that most developers who let Eclipse put that number there do not understand how to manage it themselves, which is what declaring it means you have to do.

If you start passing incompatible classes with the same serialVersionUID, your team is going to lose it when it tries to debug that one.
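For contrast, here is a minimal sketch of what owning the number yourself might look like, starting at a sensible 1. `ManagedUidDemo`, `CustomerTO`, and the helpers are hypothetical names for illustration only; the point is that the declared value is what the runtime reports, and bumping it is a deliberate, visible edit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class ManagedUidDemo {
    // Hypothetical transfer object with a small, human-managed version number.
    static class CustomerTO implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        CustomerTO(String name) {
            this.name = name;
        }
    }

    // Returns the UID the serialization runtime uses for the class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    // Serializes to memory and reads the object straight back.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CustomerTO copy = (CustomerTO) roundTrip(new CustomerTO("Ada"));
        System.out.println("uid = " + uidOf(CustomerTO.class) + ", name = " + copy.name);
    }
}
```

A change from `1L` to `2L` stands out in a diff in a way that a change in the last digits of a twelve-digit number does not.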

2 Comments

  1. Bill says:

    This is frankly awful advice. The purpose of the serialVersionUID is to create backward compatibility when marshalling objects over the wire. If you are in an enterprise cluster where there are multiple paths to make backend connections between applications, then you want to make certain that you don’t need to “forklift-upgrade” all applications at once just because one of your core domain object definitions has changed. Why not? Why doesn’t everyone use the same set of tools and the same servers and JDKs etc?

    There are many considerations in an enterprise development environment and most of them have to do with money. Development time is *EXPENSIVE*. Preparing for a production roll-out is *EXPENSIVE*. And that’s for a single component. The cost of rolling out multiple applications scales as O(n^2) in the number of applications.

    And so what Java has very nicely provided for us since Java 1.1 is a facility where an object can be deserialized into a somewhat different class than the one it was instantiated with. Result? Only the application under development needs to change unless the class that is being changed has breaking changes (i.e., has methods or attributes removed). Old applications stay unchanged, developers have less stress, and management has a tiny hope of not losing more cash than they take in.

    So, do you need to maintain the serialVersionUID? Yes. But you don’t have to do it often. Most changes to domain classes are additive. I’ve rarely ever had to adjust this.

    Eclipse is automatically calculating the version number as it would be done by the JVM at runtime and prepopulating this field. You are correct that the number is fairly arbitrary. It is indeed ugly, but the reason it does this is in case you are working with an existing copy of this class; you will need to populate with this number to keep compatibility.

    Some shops standardize everything at serialVersionUID=1. I personally don’t like this approach because it gives you very little indication of why anything is at any given version. I prefer to use a timestamp, like 201002041647. This is the “earliest date of compatibility” and provides a mechanism where an architect (like me) can quickly scan source control logs to see why a number is there, or when it might have changed for what purpose.

    1. helloscriptkitty says:

      Bill,

      Thanks for the comments, but I don’t agree that it is awful advice. I stand by the statement that the way Eclipse starts each class with a serialVersionUID is not beneficial for new Java developers.

      Most people defend serialVersionUID either for the myth of performance or for flexibility in versioning. Your defense falls into the latter camp.

      I agree with you that someone who understands how Java serialization works can use serialVersionUID to create more flexible code. They can update clients and servers at different rates and not worry about class cast or incompatible versions. But, this is not what Eclipse does. It auto generates the serialVersionUID, which is similar to leaving that to Java. You are not controlling the versions, you are letting Eclipse do it for you.

      The giant long that Eclipse uses is not conducive to human management. We both seem to be on the same page there. You mention that you like to use the date of last change. I tend to like version numbers, so I can set common objects to common numbers. For example, all my Transfer Objects will be in the 1000 range for a particular version of software.

      If you let Eclipse control serialVersionUID, as a new developer, you are not managing your serialization versions to be flexible. You are HOPING that different compiles will work with each other. You are hoping that you only made compatible changes between versions and that the numbers still match. Many developers, new and experienced, don’t really grok what makes versions incompatible. Many iterate by default, which also defeats the purpose.

      Depending upon your view of what “somewhat different” includes, these changes all break compatibility:
      * Deleting fields
      * Changing the package
      * Changing nonstatic fields to static
      * Changing nonstatic fields to transient
      * Changing the declared type of a primitive field
      * Removing the implementation of either Serializable or Externalizable
      * Changing a class from an enum to a non-enum or vice versa
      * Changing the writeObject or readObject method if it makes the binaries different
      * Changing, adding, or removing the writeReplace or readResolve method, if the behavior would produce incompatible versions

      The crux of my advice is that if you are going to own serialVersionUID, you need to own all of it. Letting Eclipse do it for you, with its numbering system, does you no favors.

      Now more controversially, I also stand behind my stance that serialVersionUID is a hack; and I would say especially in expensive enterprise systems. If you are running some sort of distributed processing at the enterprise scale, you need one of two things:
      1) a proper RPC system that is more robust than simple Java RMI
      or
      2) a way to keep your systems in sync.

      Number 2 is only feasible if you own all the systems. If you are in some sort of environment where you can control all of your VMs, there are a number of really great solutions for keeping your instances in sync. You might think of it as a “forklift-upgrade,” but class incompatibilities are the least of your worries if you are running multiple versions of the same app.

      The other scenario is that you have a small set of common Transfer Objects and you are serializing them amongst a bunch of different apps or VMs that you don’t directly control. In these cases I would look at alternative RPC/serialization schemes. In most enterprise-level systems you end up dealing with non-Java clients and services. You will need a serialization and RPC mechanism that they can all use.

      There are a lot of language-neutral serialization schemes like JSON and SOAP. If flexibility is important, and client upgrades costly, then they are probably worth going the step beyond just letting Eclipse set your serialVersionUID. Enterprise systems need to handle many different clients using multiple versions of their service APIs. Enterprise systems have to deal with legacy clients that are using outdated systems without having to leave old versions of services running for a few but crucial clients. If you are at the enterprise scale, then you are beyond what Java envisioned with its serialization standards. This makes sense: why would Java want to force everyone into the complexities of a multi-versioned service-oriented architecture when most developers are dealing with simpler client-server models?

      My observation is that the land between small controlled systems, where you can ensure everyone is on the same version, and true enterprise systems, which are simply beyond Java serialization, is a narrow strip. If you are crossing into it, you are probably going to cross beyond it. Playing with serialVersionUID may ease some headaches in the land of big client-server or small enterprise, but it won’t take you to running an enterprise system like Amazon or Google, which abandoned binary serialization for service-oriented architectures.
