Beyond the well-known and generally observed Sun Microsystems package naming convention for avoiding top-level package name collisions, few programmers thoroughly understand the deceptively simple package statement. Most Java programmers think the package keyword is little more than a blunt instrument for grouping project classes together, and simply use it to create one unique namespace per project. Unfortunately, this approach neither stands the test of time nor scales.

When a simplistic packaging attitude is applied to team-scale (let alone enterprise-scale) Java code repositories, it gradually and painfully becomes clear that incorrectly creating and managing your repository's package hierarchy can have costly and profound code maintenance implications. Worse still, these problems grow as your codebase matures and typically infect code with total disregard for project boundaries.

Consequently, when it comes to using the package statement, a few decisions must be made correctly from day one.

In this article, I explain why many Java programmers improperly use the package keyword and show you one alternative approach that has stood the test of time.

The newbie approach of using Java packages

When you first started programming in Java, you typically did not use the package statement at all. The classic HelloWorld introduction to the language quite rightly neither uses nor discusses Java packages or the package keyword:

// (No package statement here!)
class HelloWorld {
   public static void main(String[] args) {
      System.out.println("Hello World");
   }
}

You simply declared your classes implicitly in the default package (the package with no name) so you could run your Java program in the least verbose way, by typing the following at your console:

> java HelloWorld

Luckily, Java's design wasn't crippled by this program execution convenience. Elegantly and powerfully supporting large-scale programming projects was one of the top design priorities, and this is reflected in the package language feature. Hence, the newbie approach of class declaration in the default package is not sustainable when you graduate to implementing real projects.

Class name collisions and the birth of packages

As you get more comfortable with Java, you quickly find that leaving all of your project's classes in the default package limits the practical number of classes this default package namespace can hold. For example, if your first few experimental classes were called Main or Program, and your first true project also requires a Main or Program class as its main entry point, then you have a class name collision. At that point, you either delete your old classes or remember that Java lets you subdivide the global namespace into multiple package namespaces.
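The JDK itself offers a handy illustration of package namespaces: it ships two unrelated classes that share the short name Date. The following is a minimal sketch only (the class NameCollisionDemo and its method names are hypothetical, made up for this illustration); it shows that the simple names collide while the package-qualified names do not.

```java
// Two JDK classes share the simple name "Date" yet coexist,
// because each one lives in its own package namespace.
final class NameCollisionDemo {

    // The short name only, without any package prefix.
    static String simpleName(Class<?> type) {
        return type.getSimpleName();
    }

    // The package-qualified name, which is what keeps the two classes apart.
    static String qualifiedName(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName(java.util.Date.class));
        System.out.println(qualifiedName(java.sql.Date.class));
    }
}
```

Your own Main or Program classes can coexist in exactly the same way once each project declares its own package.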

Java newcomers typically see the light with regard to the package statement when they start their second Java project and want a clean separation between its classes and those of their first project.

Soon, creating a new package for each new Java project becomes second nature. Unfortunately, many Java programmers' understanding of package's true potential stagnates at this point. But continuing to use Java's package feature in this primitive way is wholly unsatisfactory in the long term, especially where code repositories grow in size from mere molehills to mountains.

Code duplication: the big no-no

The long-term problem of simply creating a new package for each new project is code duplication. Code duplication is one of the big evils of programming because:

  • Maintenance costs can spiral out of control
  • Readability suffers
  • Code becomes bloated
  • System performance might turn sluggish

We all know the root of this problem: trademark programmer laziness. Often we sense that we're working on something we've already solved in the distant past, so we hunt down the old solution, copy the appropriate code (a logic snippet, an entire method, or, hopefully not, an entire class), and joyfully paste it into our new project. Hence the expression cut-and-paste programming.

If you shiver at the thought of your Java code repository being littered with multiple copies of near-identical bits of logic, methods, or even classes, then you need to unleash the package statement's true power in your day-to-day Java development methodology.

The Big Bang...uh, I mean split

Let's approach the code duplication problem logically: to outlaw and eradicate all code duplication, any nontrivial piece of code should occur once and only once. This means, among other things, that any and all

  • Generic logic
  • Generic data groupings
  • Generic methods/routines
  • Generic constants
  • Generic classes
  • Generic interfaces

should never be declared in an application-specific package.

This key observation leads us to the package organization Golden Rule Number 1:

Golden Rule Number 1
Never mix generic code with application code directly

Below your com.company or org.yourorg package level, split your package hierarchy into two fundamentally incompatible branches:

  1. The reusable items branch
  2. The project-specific branch

Application code always uses generic code (library classes and routines) but never contains such code. The opposite is true also: library code never contains any application-specific code or even application dependencies.
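The one-way dependency described here can be expressed as a simple check. The sketch below is only an illustration under stated assumptions: the branch names com.company.lib and com.company.apps, the class BranchRule, and its methods are all hypothetical, chosen to show the rule rather than to prescribe a tool.

```java
// Sketch of Golden Rule Number 1 as a dependency check.
// The branch names and all identifiers are hypothetical.
final class BranchRule {
    static final String LIBRARY_ROOT = "com.company.lib";
    static final String APPS_ROOT = "com.company.apps";

    // True if pkg is the root itself or any subpackage below it.
    static boolean isUnder(String pkg, String root) {
        return pkg.equals(root) || pkg.startsWith(root + ".");
    }

    // A dependency breaks the rule when library code reaches into
    // application code; the opposite direction is always allowed.
    static boolean violatesRule(String fromPackage, String toPackage) {
        return isUnder(fromPackage, LIBRARY_ROOT) && isUnder(toPackage, APPS_ROOT);
    }
}
```

Under these assumptions, an application package depending on a library package passes, while a library package depending on anything in the applications branch is flagged.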

If you have never considered these two fundamentally different kinds of code, then you need to start thinking about this fundamental code dichotomy in your daily programming routine. It is the key to unleashing true code reuse in your organization and banishing code duplication once and for all.

This black-and-white code perspective applied to packages logically requires a topmost-level branching into a generic/reusable package master branch and a nongeneric/nonreusable (i.e., application-specific) master branch.

So for example, for the past five years, I've split my org.lv top-level Java namespace into org.lv.lego and org.lv.apps. (lv stands for nothing more exciting than my initials.) Both these fundamental top-level branches are then further subdivided into more detailed subpackages. My lego branch, for example, is currently subdivided into the following subpackages:

org.lv.lego.adt
org.lv.lego.animation
org.lv.lego.applets
org.lv.lego.beans
org.lv.lego.comms
org.lv.lego.crunch
org.lv.lego.database
org.lv.lego.files
org.lv.lego.games
org.lv.lego.graphics
org.lv.lego.gui
org.lv.lego.html
org.lv.lego.image
org.lv.lego.java
org.lv.lego.jgl
org.lv.lego.math
org.lv.lego.realtime
org.lv.lego.science
org.lv.lego.streams
org.lv.lego.text
org.lv.lego.threads

Note how most of these packages' logical content is self-evident as a result of carefully choosing appropriate and self-descriptive package subbranch names (compare to the java.* hierarchy). This is critically important in unlocking the reuse potential of reusable (generic) resources such as reusable logic, routines, constants, classes, and interfaces. Poorly named package branches, like poorly named classes/interfaces themselves, can confuse your intended user base and sabotage your resources' reuse potential.

At these deeper package levels, you again must be very careful about how you further organize your packages.

Here's Golden Rule Number 2:

Golden Rule Number 2
Keep it hierarchical

Always create a package hierarchy that has a balanced, fractal-like tree structure.

If you end up with a hierarchy that degenerates, in places, into a linear listing, then you are failing to exploit the Java package feature correctly. The classic mistake is simply listing project packages under your top-level applications package branch (org.lv.apps in my case). This is a mistake because a linear list of projects is not hierarchical. Linear lists are hard for human brains to grasp long term; hierarchies, on the other hand, are a natural fit for our brains' neural networks.

Projects can always be categorized by a key criterion, and this criterion or attribute should reflect in your Java package hierarchy. As an example, here's how my org.lv.apps is currently subdivided:

org.lv.apps.comms
org.lv.apps.dirs
org.lv.apps.files
org.lv.apps.games
org.lv.apps.image
org.lv.apps.java
org.lv.apps.math

Obviously your subdivisions will most likely differ from mine, but the important point is to think big and always keep future expansion in mind. Deep package hierarchies are healthy. Shallow ones are not.

Where to store all those static utility routines

Once you've accepted the logical need for two fundamentally different types of classes (generic ones and application-specific ones), you're just one step away from solving another awkward problem: where to store those oh-so-handy, but totally non-object-oriented, static utility routines.

I always despair when I see Java code that contains completely generic facets embedded in application-specific classes. Say an e-commerce application relied on a class called Customer containing, among other things, the following method:

private String surroundedBy(String string, String quote) {
   return quote + string + quote;
}

The programmer of class Customer included a utility method, surroundedBy(String, String), that produces a quoted string. The method is declared private, presumably because the author judged it to be a Customer implementation detail. Since it is not declared static, it was apparently meant to be an instance method. It looks perfectly benign, but is it? What is wrong with this method?

First of all, since this method is declared as an instance method, why does it not depend on any Customer object state (i.e., object fields)? It does not depend on any object fields because it does not need or use any fields; this utility method only requires its parameters and nothing else to do its job. This is the telltale logical signature of a class-independent utility method; in other words, this method is, in fact, not logically an instance method at all.

Secondly, this method also does not play any sensible part in the abstraction that class Customer should embody, so it simply does not belong in the Customer class to begin with. This method looks less benign by the minute. So what next?

The correct, by-the-book location for the surroundedBy() method is inside a class with an exclusive focus on string processing, not customerhood. Unfortunately, the String class itself is declared final and therefore can't be subclassed into a BetterString class (for example) to rehouse the surroundedBy() method in a logically justified place. A sensible alternative is to define a new class devoted to string processing, say class StringUtilities (or the shorter StringUtils, or shorter still, StringKit), and promote the surroundedBy() method by making it available as a public static routine in class StringKit, like this:

public class StringKit {
   // .. Many other string-processing routines here
   public static String surroundedBy(String string, String quote) {
      return quote + string + quote;
   }
   // .. Many other string-processing routines here
} // End of class StringKit
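To show what the calling side might look like after the move, here is a minimal sketch. Apart from StringKit.surroundedBy(), everything in it is an assumption made for illustration: the Customer name field, its constructor, and the quotedName() method are hypothetical, and both classes are left non-public only so that they fit in one listing.

```java
// In a real repository this class would be public and live in its own
// file under the reusable-library branch of the package hierarchy.
final class StringKit {
    private StringKit() {} // static utility holder; never instantiated

    public static String surroundedBy(String string, String quote) {
        return quote + string + quote;
    }
}

// Application-specific class. The name field and quotedName() are
// hypothetical; the original Customer class is not shown in the article.
final class Customer {
    private final String name;

    Customer(String name) {
        this.name = name;
    }

    // Customer now delegates the generic string work to the library class.
    String quotedName() {
        return StringKit.surroundedBy(name, "\"");
    }
}
```

Customer keeps only logic that concerns customers, and any other class can reuse the same routine without copying it.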

What do you achieve by performing this Move Method refactoring?

In the short term, you achieve two things:

  • You make a perfectly reusable (and therefore valuable) piece of code available for reuse across project and application boundaries
  • You improve the implementation of the Customer abstraction by eliminating its dilution (read: pollution)

In the long term, the above refactoring technique results in other, possibly even more important and valuable, side effects:

  • Less new code is written (future code will call StringKit.surroundedBy())
  • Your software system's overall architecture becomes simpler to understand as more top-level logic and structure become clearer
  • Your software becomes more robust because more code relies on a library of foundation building blocks that will be tested more thoroughly and more frequently than "plain" application code

Unfortunately, real-life software is truly littered with embedded methods like surroundedBy(), yet few Java programmers reuse these methods because:

  • They are declared in application-specific code, which by definition is deemed nongeneral and therefore nonreusable
  • They are often not even visible to any programmers who may wish to reuse them because they are declared private or package-scope

One solution is to methodically identify and move such misplaced reusable methods to problem domain-specific utility classes. Read the sidebar, "Static Utility Methods Repositories, A Personal Example," for an example of how to tackle this.


Dynamic package hierarchy

As a positive side effect of embracing code refactoring, you should also prepare to embrace a constantly evolving package hierarchy. Imagine a growing and maturing tree: as your number of Java types (classes and interfaces) grows, so does the leaf-to-branch ratio in your packages. Whenever this ratio reaches a threshold, your instinct is to relieve the weight on the branch, and create subbranches and redistribute the classes and interfaces in these new subbranches. I try to keep my types-per-package ratio very low, in the range of 7 to 10. (Compare this guideline metric with java.lang's range of 30 or more and java.util's 40 or more, depending on the API's exact version. Ever felt overwhelmed by java.util's long list of reusable types? Low types-per-package ratios prevent programmers from losing themselves in your APIs.)

The same need for package branching due to growth applies to the ratio of subpackages to parent package. If this ratio reaches a threshold, then you should instinctively reorganize the subpackages so that fewer weigh down the parent package. Keep the package tree aesthetically balanced (i.e., maintain over time its fractal-like tree structure, recalling that software is part art, part science).
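The types-per-package guideline is easy to measure. The following is a rough sketch under stated assumptions: it works on a plain list of fully qualified top-level type names (nested types are not handled), it uses 10 as the upper limit taken from the 7-to-10 range mentioned above, and the class PackageWeight and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: count types per package and flag packages that exceed a limit.
// All names here are hypothetical.
final class PackageWeight {
    // Upper end of the 7-to-10 guideline range.
    static final int MAX_TYPES_PER_PACKAGE = 10;

    // Everything before the last dot; the empty string stands for the default package.
    static String packageOf(String qualifiedTypeName) {
        int lastDot = qualifiedTypeName.lastIndexOf('.');
        return lastDot < 0 ? "" : qualifiedTypeName.substring(0, lastDot);
    }

    // Maps each package name to the number of listed types it contains.
    static Map<String, Integer> typesPerPackage(List<String> qualifiedTypeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : qualifiedTypeNames) {
            counts.merge(packageOf(name), 1, Integer::sum);
        }
        return counts;
    }

    // True when a package holds more types than the guideline allows.
    static boolean needsSplit(int typeCount) {
        return typeCount > MAX_TYPES_PER_PACKAGE;
    }
}
```

The same counting idea could be applied to subpackages per parent package; the threshold you pick is a matter of taste rather than a hard rule.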

By now I hear loud voices shouting, "How does a dynamic package hierarchy fit in with the need for backward-compatible library classes?" Clearly there's a conflict of interest here: libraries need to grow in a library user-friendly way. Luckily, the main invariant a library user relies on is the detailed APIs a library exports (i.e., the classes' shorthand names and the precise method signatures). Java's import language feature makes changing the source package from which a type hails less of an obstacle than it could be otherwise.
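A small sketch makes the point about imports concrete. In the listing below (the class ImportDemo and its method are hypothetical, and a JDK type stands in for a library class), the package name appears only on the import lines; the body refers to types by their short names, so relocating a type would mean touching the import and nothing else.

```java
import java.util.ArrayList; // the only place this type's package is spelled out
import java.util.List;

// Hypothetical demo class: the body uses short type names throughout.
final class ImportDemo {
    static List<String> topLevelBranches() {
        List<String> branches = new ArrayList<>();
        branches.add("org.lv.lego");
        branches.add("org.lv.apps");
        return branches;
    }
}
```

Code that spells out fully qualified names inline loses this advantage, which is one more reason to prefer imports in library client code.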

In practice, every time a class or interface moves from one package to another as a result of package hierarchy growing pains, you need to bump your library's version number and list the incompatible name changes in your release notes (as Sun did with Swing's package name change some years ago). In my experience, library users gladly accept the minor short-term pain of occasionally changing a few import statements, reassured that the company's library organization is not allowed to deteriorate to the point where it becomes a liability instead of a key asset. (Nowadays, many Java tools support this view of a dynamic package hierarchy and make package name and structure changes as painless as possible.)

Package-scope declarations

Tightly linked to package declarations is another Java language feature rarely taught properly by Java teachers (including in textbooks) or correctly assimilated by new Java programmers: the correct scoping of class members (i.e., fields, methods, constructors, and, since Java 1.1, nested classes).

Nobody has a problem with public and private scope. These scopes are fairly well understood because they are unambiguous and therefore usually appropriately used. Protected scope, on the other hand, is often thoroughly misunderstood; and when we come to package-scope, we're in a real cognitive disaster zone.

Java's designers mistakenly made package-scope the default scope, and consequently it is declared without a keyword (every scope should have required a keyword, and none should have been the default). It's the lack of a manifest keyword that lies at the root of the problem: the majority of introductory Java books and courses leave scoping semantics and rules out of the first few chapters because correct scoping is completely unnecessary when tackling many of the classic teaching programming examples.

So we were all taught to use package-scope by default, because it conveniently lacks a keyword and therefore lets us use it without thinking about the implications (that old trademark laziness again...).

Of course, as your Java skills mature, you slowly realize that correctly scoping your class members, especially fields, methods, and constructors, is very important indeed.

Package-scope declaration for fields: almost always an error

For fields, the only scope appropriate in 90 percent of cases is private. This is a direct result of the encapsulation principle, one of the pillars of object orientation. Moreover, the consensus in object-orientation circles is that object composition is more appropriate in most cases than subclassing, so the protected keyword should be needed less often than you might first think. The only justifiable case for declaring fields public is when you export public constants, as in:

public static final XXXX THE_CONSTANT;

The use of package-scope for fields should be viewed as equally unacceptable as declaring nonconstant fields public, because both approaches shatter the encapsulation of objects by exposing their implementation guts. And yet package-scoped fields can be seen in Java source code everywhere you look: in books, articles, newsgroup postings, and last but not least, production code. Here's just one example from the core libraries (Sun's implementation): class javax.swing.ImageIcon declares the following field:

transient int loadStatus = 0;

Notice how this declaration includes an implicit package-scope declaration, yet no other classes in package javax.swing access this field in any ImageIcon objects. Indeed, since ImageIcon also declares a public accessor method for this object state,

public int getImageLoadStatus() {
   return loadStatus;
}

it is clear that the field declaration should have specified private as the correct scope.
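The field-scoping advice can be summed up in a short sketch. The class below is hypothetical and is not the ImageIcon source; its name, constants, and methods are invented purely to show the recommended combination of a private field, a public accessor, and public constants.

```java
// Hypothetical class illustrating field scoping; not the actual ImageIcon code.
final class LoadTracker {
    // Exported constants: the one case where public fields are justified.
    public static final int NOT_STARTED = 0;
    public static final int COMPLETE = 1;

    // Object state stays private, so no other class can touch it directly.
    private int loadStatus = NOT_STARTED;

    // Read access goes through a public accessor instead of the field itself.
    public int getLoadStatus() {
        return loadStatus;
    }

    // Package-scope method: only collaborators in the same package may change the state.
    void markComplete() {
        loadStatus = COMPLETE;
    }
}
```

With the field private, the class remains free to change how it stores its status without breaking any caller.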

Package-scope declaration for methods and constructors

Compared to scoping fields, scoping methods and constructors is somewhat more complex. Both methods and constructors can legitimately be scoped using four possible scopes:

  • Public: when the whole world needs to see and access the method/constructor
  • Protected: when subclasses need to see and access the method/constructor (note that Java also grants protected access to other classes in the same package)
  • Private: when only the declaring class needs access to the method/constructor
  • Package-scope: when other classes in the same package (this excludes subpackages) need to access the method/constructor

Clearly, if a subsystem or module lives in its own package, then package-scope declaration of methods and constructors is often a perfectly legitimate and indeed common necessity.
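The four scopes can be seen side by side in one small sketch. All names here (Invoice, InvoiceFactory, the 20 percent tax figure) are hypothetical. Because the listing has no package statement, both classes share the default package; in real code they would sit together in a named package.

```java
// Hypothetical example showing all four scopes on methods and constructors.
class Invoice {
    private final int netCents;

    // Package-scope constructor: only classes in the same package can create invoices.
    Invoice(int netCents) {
        this.netCents = netCents;
    }

    // Public: part of the API offered to every caller.
    public int totalCents() {
        return addTax(netCents);
    }

    // Protected: subclasses (and classes in the same package) may see or override this.
    protected int taxPercent() {
        return 20;
    }

    // Private: an implementation detail of this class alone.
    private int addTax(int net) {
        return net + net * taxPercent() / 100;
    }
}

// Lives in the same package, so it may call the package-scope constructor.
final class InvoiceFactory {
    static Invoice forNetCents(int netCents) {
        return new Invoice(netCents);
    }
}
```

Here the package-scope constructor is a deliberate design choice: clients outside the package would have to go through the factory, which is the kind of legitimate use the paragraph above describes.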

Good things come in small packages

The package statement is normally the first line of noncomment code in any Java source code file, yet, nearly seven years after Java's birth, many programmers still fail to exploit the substantial potential benefits of correct package keyword use. This diminutive keyword allows you to tackle overall project complexity head-on by subdividing and modularizing your system's architecture and lets you create a working long-term software process framework for code reuse. That is no mean feat. So, next time you create a new package, consider the package keyword's bigger issues more thoroughly. It will be a wise, long-term investment.

Laurence Vanhelsuwé is a senior software engineer living in Scotland. With more than 20 years of programming experience—seven exclusively on Java—Laurence currently juggles his time between software development, authoring Java-related articles and books, technical editing, and Java training.
