Over the past several years, Sun Microsystems Architect Joshua Bloch has designed major enhancements to the Java APIs and language, specifically for the Java Collections API and the java.math package. Most recently, he led the expert committees that defined Java's assert and preferences facilities. In his 2001 book, Effective Java Programming Language Guide, Josh distilled his wisdom into 57 concrete guidelines for designing and implementing Java programs. JavaWorld's Bill Venners recently spoke to Josh about several aspects of API design in Java.
Why do API design?
Bill Venners: In the preface of your fine book, Effective Java Programming Language Guide, you write that you tend to think in terms of API design. I do too. If I were managing a large software project, I would want to decompose it into subsystems and have people design interfaces to those subsystems -- interfaces that would be APIs.
Considering that, how does an API-design approach contrast with the popular extreme programming approach, and to what extent should API design be the norm in software projects?
Joshua Bloch: In my experience, there is too much monolithic software construction. Someone says that he wants to design a record-oriented file system, and does it. He starts designing the record-oriented file system and sees where it leads him, rather than following the decomposition you speak of. Decomposition into subsystems is important, but it is just as important for each subsystem to be a well-designed, freestanding abstraction. That's where I feel like a preacher on a soapbox.
It's much easier not to turn the subsystem into a reasonable component. In particular, it's easy to let reverse dependencies creep in, where you write the low-level subsystem for the use of its initial higher-level client, and you let assumptions about that client creep downwards. In the case of less experienced programmers, it's more than assumptions. You let variable names creep downwards, and you let specific artifacts of that initial client creep into the allegedly lower-level reusable component. When you finish, you don't have a reusable component, you just have a piece of a monolithic system.
You want to weaken that coupling such that the subsystem can then be reused outside of its original context. And there are all sorts of reasons for doing that, which I go over in my book. You write something for one use, and it subsequently finds its major use elsewhere. But that only works if a subsystem is a well-designed, freestanding abstraction.
Reuse: How important is it?
Venners: One of the early slogans about the object-oriented approach stated that it promoted reuse. But I think people found in practice that they didn't reuse much. Everybody needed something slightly different from what already existed, so they wrote that new thing from scratch. Perhaps things weren't designed for reuse, but, nevertheless, people still managed to build software. To what extent do you think reuse is important?
Bloch: Reuse is extremely important but difficult to achieve. You don't get it for free, but it is achievable. The projects I do here at Sun -- the Java Collections Framework, java.math, and so on -- are reusable components. I think they have been quite successful at being used by a number of vastly different clients.
In my last job, where I built systems, I found that 75 percent of the code I wrote for any given system was reusable in other systems. To achieve that reuse level, I had to consciously design for it. I had to spend a large fraction of my time decomposing things into clean, freestanding abstractions, debugging them independently, writing unit tests, that sort of thing.
Many developers don't do those steps. You touched on it when you asked how API design contrasts with extreme programming, and what do you do if you are a manager building something. One extreme programming tenet advocates you write the simplest thing that can solve your problem. That's a fine tenet, but it's easy to misconstrue.
The extreme programming proponents don't advocate writing something that will barely work as fast as you can. They don't advise you to forgo any design. They do advocate leaving out the bells, whistles, and features you don't need and adding them later, if a real need is demonstrated. And that's incredibly important, because you can always add a feature, but you can never take it out. Once a feature is there, you can't say, "Sorry, we screwed up, we want to take it out," because other code now depends on it. People will scream. So, when in doubt, leave it out.
Extreme programming also stresses refactoring. During the refactoring process, you spend much of your time cleaning up the components and APIs, ripping things into better modules. It is critical to do this, and to stay light on your feet -- don't freeze the APIs too early. But you'll have less work to do if you design the intermodular boundaries carefully to begin with.
Venners: Why?
Bloch: Because massive refactorings prove difficult. If you built something as a monolithic system and then find you had repeated code all over the place, and you want to refactor it properly, you'll have a massive job. In contrast, if you wrote it as components, but you got some of the component boundaries a little wrong, you can tweak them easily.
I think the disconnect between extreme programming and the API-based design process I espouse is not as great as it appears. When you talk to someone like Kent Beck [author of Extreme Programming Explained (Addison-Wesley Pub Co, 1999; ISBN: 0201616416)], I think you'll find that he does much of the same stuff I do.
To get back to your first question, if you are a manager, you should certainly give your team the latitude to create a good design before they jump in and start coding. At the same time, you shouldn't let them design every bell and whistle in the world; you should ensure they design the minimal system that will do the job.
Venners: So, it sounds like the extreme programming folks recommend you do the simplest feature set that could possibly work. But don't do the quickest slop you can throw down that could possibly work.
Bloch: Precisely. In fact, people who try to "do the quickest slop you can throw down" often take longer to produce a working system than people who carefully design the components. But certainly, API design helps if you consider cost over time. If you throw down some slop, and, God forbid, the slop becomes immortalized as a public API that must be lived with for years, you really are toast. Such APIs become a tremendous support burden over time, and lead to great customer dissatisfaction.
Improve code quality
Venners: You also claim in your book that thinking in terms of APIs tends to improve code quality. Could you clarify why you think that?
Bloch: I'm talking about programming in the large here. It's relatively easy to write high-quality code if you are tackling a reasonably sized problem. If you do a good decomposition into components, you'll be able to concentrate on one thing at a time, and you'll do a better job. So doing good programming in the large leads to good programming in the small.
Moreover, modular decomposition represents a key component of software quality. If you have a tightly coupled system, when you tweak one component, the whole system breaks. If you thought in terms of APIs, the intermodular boundaries are clear, so you can maintain and improve one module without affecting the others.
Venners: Can you clarify what you mean by "programming in the large" and "programming in the small"?
Bloch: By "programming in the large," I mean tackling a big problem -- a problem too big to sit down and solve with one little freestanding program; a problem big enough that it must be broken down into subproblems. "Programming in the large" involves the complexity issues inherent in a large problem. In contrast, "programming in the small" asks: How can I best sort this array of floats?
API design and refactoring
Venners: How does focusing on API design serve you well in the refactoring process? I believe I heard you say that developers won't have to do as much refactoring.
Bloch: That's part of it. In addition, refactoring is often ex post facto API design. You look at the program and say: I have almost the same code here, here, and here. I can break this out into a module. Then you carefully design that module's API. So, whether you do it at refactoring time or up front, it's the same process.
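To illustrate refactoring as after-the-fact API design, here is a minimal sketch. It assumes a program in which three call sites each carried their own copy of the same summing loop; the class `Totals` and the method `sumCents` are hypothetical names, not from the interview.

```java
import java.util.List;

// Hypothetical module extracted from several near-identical loops.
// Once it exists, its signature and documentation are an API like any other.
final class Totals {
    private Totals() {}  // static utility; not meant to be instantiated

    /**
     * Returns the sum of the given prices, in cents.
     * The list is only read, never modified or retained.
     */
    static long sumCents(List<Long> pricesInCents) {
        long total = 0;
        for (long price : pricesInCents) {
            total += price;
        }
        return total;
    }
}
```

The design questions (what to name it, what it accepts, what it promises about its argument) are the same ones that would have come up had the module been designed up front.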
In truth, you always do a bit of both. Programming is an iterative process. You try to do the best you can up front, but you don't really know whether you have the API right until you use it. Nobody gets it right the first time, even if they have years of experience.
Doug Lea [author of Concurrent Programming in Java (Addison-Wesley Pub Co, 1999; ISBN: 0201310090)] and I chat about this issue from time to time. We write stuff together, and, when we try to use it, things don't always work. In retrospect, we make obvious API design mistakes. Does this mean Doug and I are dumb? Not really. It's just impossible to predict exactly what the demand will be on an API until you have tried it. That's why whenever you write an interface or an abstract class, it's critical to do as many concrete implementations as possible before committing to the API. It's difficult or impossible to change it after the fact, so you'd better make sure it's good beforehand.
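One way to picture the advice about writing several concrete implementations is the sketch below. The interface `IntStack` and both implementing classes are hypothetical, chosen only for illustration. Writing two differently structured implementations against the same draft interface is a cheap way to find out whether the interface quietly assumes one particular representation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical draft API, not yet published.
interface IntStack {
    void push(int value);
    int pop();          // throws IllegalStateException if the stack is empty
    boolean isEmpty();
}

// First implementation: delegates to a library collection.
final class DequeIntStack implements IntStack {
    private final Deque<Integer> items = new ArrayDeque<>();

    public void push(int value) { items.push(value); }

    public int pop() {
        if (items.isEmpty()) throw new IllegalStateException("empty stack");
        return items.pop();
    }

    public boolean isEmpty() { return items.isEmpty(); }
}

// Second implementation: a growable primitive array.
final class ArrayIntStack implements IntStack {
    private int[] items = new int[4];
    private int size;

    public void push(int value) {
        if (size == items.length) items = Arrays.copyOf(items, size * 2);
        items[size++] = value;
    }

    public int pop() {
        if (size == 0) throw new IllegalStateException("empty stack");
        return items[--size];
    }

    public boolean isEmpty() { return size == 0; }
}
```

If the second implementation had been awkward to write, that would have been a signal to revise the interface while it was still free to change.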
Trust versus being defensive
Venners: Now I'd like to talk about trust. To what extent should I trust client programmers to do the right thing? You write in your book about making defensive copies of objects passed to and from methods. Defensive copying is an example of not trusting clients. Is there not a robustness versus performance tradeoff to defensive copying? Indeed, if you have arbitrarily large objects, could it be expensive to defensively copy every time?
Bloch: Clearly there is a tradeoff. On the other hand, I don't believe in attacking performance problems too early. If it's not a problem, why bother attacking it? And there are other ways around the problem. One, of course, is immutability. If something can't be modified, then you don't have to copy it.
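As a rough illustration of the immutability option, consider the sketch below. `Money` is a hypothetical class, not one discussed in the interview. Because its fields are final and of immutable types, instances can be shared freely and never need to be copied.

```java
// Hypothetical immutable value class: all fields are final, there are no
// setters, and the class is final so a subclass cannot add mutable state.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        if (currency == null) throw new NullPointerException("currency");
        this.currency = currency;
        this.cents = cents;
    }

    String currency() { return currency; }

    long cents() { return cents; }

    // "Modification" returns a new object and leaves this one untouched.
    Money plus(long moreCents) {
        return new Money(currency, cents + moreCents);
    }
}
```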
It is true you might opt in favor of performance rather than robustness and not copy something if you believe you are operating in a safe environment -- an environment where you know and trust your clients won't do the wrong thing. Certainly you'll find most C-based programs filled with comments saying: It is imperative that this method's client not modify the object after it is called, blah, blah, blah. Yes, you can successfully write code that way. Anyone who has programmed in C or C++ has done so. On the other hand, it's more difficult because you forget you cannot modify it, or you have aliasing. You don't modify it, but, oops, you've passed a reference to the same object to someone else who doesn't realize he shouldn't modify it.
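The aliasing hazard can be sketched as follows. `TrustingDeadline` and `AliasingDemo` are hypothetical names used only for illustration. The class stores the reference it is handed, so anyone else holding that reference can change its state.

```java
import java.util.Date;

// Hypothetical class that trusts its caller: it keeps the caller's Date
// object instead of copying it.
final class TrustingDeadline {
    private final Date due;

    TrustingDeadline(Date due) {
        this.due = due;  // no copy: the caller still holds the same object
    }

    long dueMillis() { return due.getTime(); }
}

final class AliasingDemo {
    static long dueAfterCallerMutation() {
        Date shared = new Date(1000L);
        TrustingDeadline deadline = new TrustingDeadline(shared);
        shared.setTime(5000L);        // mutation through the alias
        return deadline.dueMillis();  // the deadline has silently moved
    }
}
```

Nothing in `TrustingDeadline` itself is wrong in isolation; the failure comes only from the shared reference, which is why it is easy to miss.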
All things being equal, it is easier to write correct code if you actually do the defensive copying or use immutability than if you depend on the programmer to do the right thing. So, unless you know that you have a performance need to allow this sort of error to happen, I think it's best to simply disallow it. Write the program, then see if it runs fast enough. If it doesn't, then decide whether you want to carefully relax those restrictions.
Generally speaking, you should not allow an ill-behaved client to ruin a server. You want to isolate failures from one module to the next, so that a failure in one module can't break a second module. It's a defense against intentional failures, as in hacking. And more commonly, it's a defense against sloppy programming or against bad documentation, where a user of some module doesn't understand his responsibilities in terms of modifying or not modifying some data object.
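A minimal sketch of defensive copying, in both directions, might look like the following. `SafeDeadline` is a hypothetical class, and the rule that the time must be non-negative is an assumption added only so there is something to validate.

```java
import java.util.Date;

// Hypothetical class that defends itself against its callers.
final class SafeDeadline {
    private final Date due;

    SafeDeadline(Date due) {
        // Copy first, then validate the copy, so the caller cannot change
        // the value between the check and the assignment.
        Date copy = new Date(due.getTime());
        if (copy.getTime() < 0) {
            throw new IllegalArgumentException("negative time: " + copy.getTime());
        }
        this.due = copy;
    }

    Date due() {
        // Copy on the way out too, so callers never see the internal object.
        return new Date(due.getTime());
    }
}
```

The cost is one extra small allocation per call, which is the price being weighed against robustness in the discussion above.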
Defensive copying and the contract
Venners: If I defensively copy an object passed into, say, a constructor, should I document that defensive copying as part of the class's contract? If I don't document it, I may have the flexibility later to remove the defensive copy for a performance tweak. But if I don't document it, client programmers can't be sure the constructor will do a defensive copy. So they may do a defensive copy themselves before they pass the object to my constructor. Then we'll have two defensive copies.
Bloch: If you haven't documented it, is a client permitted to modify the parameter or isn't he? Obviously, if you have a paranoid client, he won't modify the parameter because it might hurt your module. In practice, programmers aren't that paranoid -- they do modify. All things being equal, if the documentation doesn't say you must not do this, the programmer will do it. So, you are signing on for that defensive copy whether or not you document it. And you might as well document it because then even the paranoid client will know that, yes, he has the right to do anything he wants with the input parameter.
Ideally, you should document that you are doing a defensive copy. However, if you look back at my code, you'll find that I haven't. I defend against sloppy clients, but I haven't always made this explicit in the documentation.
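One possible way to make the copy part of the documented contract is shown below. `Readings` is a hypothetical class, and the wording of the doc comment is only a suggestion, not taken from any actual library documentation.

```java
// Hypothetical class whose constructor documents its defensive copy.
final class Readings {
    private final int[] values;

    /**
     * Creates a set of readings from the given values.
     *
     * <p>The array is copied. The caller may freely modify {@code values}
     * after this constructor returns without affecting this object.
     *
     * @param values the readings; must not be null
     * @throws NullPointerException if {@code values} is null
     */
    Readings(int[] values) {
        this.values = values.clone();
    }

    /** Returns the reading at the given index. */
    int get(int index) { return values[index]; }
}
```

With the copy stated in the contract, a cautious client has no reason to make a second copy before calling the constructor.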