About a year ago, IBM Fellow Jeff Nick received an email that changed his life.

Deep within IBM's server group, Nick was doing advanced work, trying to develop a distributed systems integration architecture that would help IBM tie together its various hardware and operating system platforms. As System/390's chief architect, Nick had been instrumental in bringing Linux onto IBM's mainframe, so he was no stranger to radical ideas or to systems architecture. His aim was to build a layer of open interfaces, known internally as the Open Services Architecture (OSA), that application and middleware developers could write to without worrying about system details. The idea, says Nick, was to "leverage the same technologies for systems integration that were being leveraged for application integration": technologies like the Java 2 Platform, Enterprise Edition (J2EE), the Internet, and emerging Web services protocols.

The email came from the general manager of IBM's Internet division, Irving Wladawsky-Berger, the man who laid out IBM's e-commerce strategy in the mid-1990s and then led Big Blue's Linux initiative a few years later. Wladawsky-Berger had read a white paper, "The Anatomy of the Grid." Written by researchers at the University of Southern California, the University of Chicago, and Argonne National Laboratory, it was the academic community's first attempt to impose structure on the myriad grid projects it used to run supercomputing-class applications across many different networks, and to show how these technologies might be useful outside the scientific and technical worlds.

Wladawsky-Berger's message was intriguing: "If I didn't know better, I'd have sworn you wrote this," he wrote to Nick.

Jeff Nick had heard a little about grid computing. He knew of a few high-profile distributed computing applications such as SETI@home, but the work described in the paper was far more sophisticated, and a lot closer to what he was trying to achieve with his Open Services Architecture. (See sidebar, "Grid Jargon," for more grid computing terms.)

Welcome to the grid

Beginning in 1995 with an experimental network called the I-Way, Argonne's Ian Foster and Steve Tuecke, along with Carl Kesselman of the University of Southern California, began laying the foundations of the Globus Toolkit. Today the Globus Toolkit rests on a grab bag of protocols, including the Lightweight Directory Access Protocol (LDAP), FTP, and a custom HTTP messaging protocol, but its next release, version 3.0, will be based on an emerging set of distributed computing standards called the Open Grid Services Architecture (OGSA). Globus can be used for job submission and management, data movement, discovery, and security on a computing grid; it is to the OGSA standards what Apache is to HTTP: a production-ready, high-quality, open source reference implementation of the specs. To date, the Globus Toolkit has been used mainly in the scientific and research communities. It's the technology behind the TeraGrid and the Department of Energy's Science Grid, for example, but commercial vendors, including IBM, Sun Microsystems, BEA Systems, Hewlett-Packard, and Microsoft, are all becoming involved. Last spring, a startup called Butterfly.net launched the very first commercial grid application: a grid for companies that host online video games.

In July 2001, Nick's team got in touch with the folks at Argonne, and by September, the two teams had met. The grid developers taught IBM about their grid work, and IBM taught the grid-heads about J2EE.

Java had been a significant part of grid development since well before OGSA arrived. In 1997, grid developers began work on a project called Java Grande, which improved Java's support for scientific calculations and explored ways Java could be used to create interfaces to computing grids. According to Dennis Gannon, a professor of computer science at Indiana University and a member of Java Grande, the first major toolkit for grid access was the Java Commodity Grid (Java COG) toolkit. "This is currently the most widely used way to access the current (pre-OGSA) grid services," Gannon says.

"J2EE really didn't enter the picture until IBM came on the scene," says Tuecke, whose coworker, Gregor von Laszewski, wrote Java COG. At the time, Tuecke says, he only had a "general idea" of what J2EE was: "I knew that there was this thing out there called J2EE and that businesses seemed to like it."

Web services on the grid

A few months before IBM's call, Tuecke began to seriously contemplate the role Web services could play in the Globus Toolkit. In May 2001, he and his boss, Ian Foster, hashed things out in Foster's University of Chicago office. Tuecke had written an internal spec that described how emerging Web services standards might solve some of the problems they had with grid computing.

Each Globus component was built on its own protocol stack: LDAP for discovery, a custom HTTP-based protocol for job management, and FTP for high-performance data transfer. "As we wanted to do more and more of the interesting high-level stuff that simultaneously used more services," Tuecke says, "we found that this heterogeneity in our base protocols was hurting us. We knew that we would benefit significantly from a common way to do feature discovery, submit requests, register for event notification, and do lifetime management."
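
To make Tuecke's point concrete, here is a minimal Java sketch of what "a common way" could look like. It assumes nothing about the real Globus or OGSA interfaces: every service, whether it runs jobs or moves data, simply exposes the same four operations he lists (feature discovery, request submission, event notification, and lifetime management). The names CommonGridService, AbstractGridService, JobService, and TransferService, and all of their methods, are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical common interface (not the OGSA API): every grid service
// exposes the same four operations, whatever work it actually does.
interface CommonGridService {
    Optional<String> findFeature(String name);      // feature discovery
    String submit(String request);                  // request submission
    void subscribe(Consumer<String> listener);      // event notification
    void setTerminationTime(Instant when);          // lifetime management
    Instant getTerminationTime();
}

// Shared plumbing, so each concrete service supplies only its own
// features and its own request handling.
abstract class AbstractGridService implements CommonGridService {
    private final Map<String, String> features;
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private Instant terminationTime;

    AbstractGridService(Map<String, String> features, Instant terminationTime) {
        this.features = Map.copyOf(features);
        this.terminationTime = terminationTime;
    }

    @Override public Optional<String> findFeature(String name) {
        return Optional.ofNullable(features.get(name));
    }

    @Override public void subscribe(Consumer<String> listener) { listeners.add(listener); }

    @Override public void setTerminationTime(Instant when) { terminationTime = when; }

    @Override public Instant getTerminationTime() { return terminationTime; }

    // Every request goes through the same path and fires the same kind of event.
    @Override public String submit(String request) {
        String result = handle(request);
        for (Consumer<String> l : listeners) {
            l.accept(getClass().getSimpleName() + " handled " + request);
        }
        return result;
    }

    protected abstract String handle(String request);
}

// Two hypothetical services that, in a protocol-per-component design,
// would each have needed their own client code.
class JobService extends AbstractGridService {
    JobService(Instant t) { super(Map.of("queue", "batch"), t); }
    @Override protected String handle(String request) { return "job-accepted:" + request; }
}

class TransferService extends AbstractGridService {
    TransferService(Instant t) { super(Map.of("maxStreams", "4"), t); }
    @Override protected String handle(String request) { return "transfer-started:" + request; }
}
```

A client can hold a List<CommonGridService> and discover, invoke, subscribe to, and schedule the teardown of either service in exactly the same way, which is the uniformity the separate LDAP, HTTP, and FTP stacks could not offer.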

That common layer is what they saw in the Web Services Description Language (WSDL). "The real crux of what we care about in Web services is WSDL: having an abstract language for defining your messaging infrastructure, with the ability to bind it down to multiple protocols," Tuecke says. In other words, the Web services approach offered them an interface description language, WSDL, that, unlike a CORBA-based approach, didn't tie them to a particular programming environment. Grid services could be defined once in WSDL and then implemented in a range of environments or languages, such as J2EE or .Net.
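
The separation Tuecke describes, an abstract interface on one side and protocol bindings on the other, can be sketched in plain Java. This is an illustration under stated assumptions, not WSDL processing or the Axis API: JobSubmission plays the role of the abstract interface, and two hypothetical bindings, SoapStyleBinding and FormBinding, encode the same call for two different wire formats. The transport is a stand-in function rather than a real network connection.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Abstract interface: what the service does, with no wire format attached.
interface JobSubmission {
    String submitJob(String executable, int cpus);
}

// A binding turns one abstract call into a message for one particular protocol.
interface Binding {
    String encode(String operation, String executable, int cpus);
}

// Hypothetical SOAP-style binding: wraps the call in a simplified XML envelope.
class SoapStyleBinding implements Binding {
    private static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    @Override public String encode(String op, String exe, int cpus) {
        return "<Envelope><Body><" + op + "><executable>" + xmlEscape(exe)
             + "</executable><cpus>" + cpus + "</cpus></" + op + "></Body></Envelope>";
    }
}

// Hypothetical HTTP form binding: the same call as URL-encoded parameters.
class FormBinding implements Binding {
    @Override public String encode(String op, String exe, int cpus) {
        return "op=" + op
             + "&executable=" + URLEncoder.encode(exe, StandardCharsets.UTF_8)
             + "&cpus=" + cpus;
    }
}

// Client-side proxy: callers see only JobSubmission; the binding and the
// (stand-in) transport are plugged in underneath.
class JobSubmissionProxy implements JobSubmission {
    private final Binding binding;
    private final Function<String, String> transport;

    JobSubmissionProxy(Binding binding, Function<String, String> transport) {
        this.binding = binding;
        this.transport = transport;
    }

    @Override public String submitJob(String executable, int cpus) {
        return transport.apply(binding.encode("submitJob", executable, cpus));
    }
}
```

Because the calling code sees only submitJob, swapping the binding changes what goes over the wire without touching the caller, which is the property the Globus team wanted from an abstract interface language.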

When IBM came into the picture, Tuecke realized that J2EE could supply the container environment he wanted for grid services. "What we really wanted was stateful objects running in some app server with some container environment around it that would handle a lot of the management details of that instance. And some of those things we wanted in the container were an awful lot like what's in J2EE containers...[sometimes] you want all the persistence capabilities and scalability you get out of app servers."
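
The container role Tuecke describes can be sketched as follows. This is a hypothetical, much-simplified stand-in for what a J2EE or grid container might do, not the OGSA specification or any app server's API: the container creates stateful instances from a factory, hands out string handles, routes clients back to the same instance, and reclaims instances whose termination time has passed unless a client extends it. ServiceContainer, CounterInstance, and the handle format are invented for illustration, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stateful instance: remembers how many calls it has served.
class CounterInstance {
    private int calls;
    int invoke() { return ++calls; }
}

// Hypothetical container: owns instance creation, lookup, and lifetime,
// so service code only has to implement the instance itself.
class ServiceContainer<T> {
    private static final class Entry<T> {
        final T instance;
        Instant terminationTime;

        Entry(T instance, Instant terminationTime) {
            this.instance = instance;
            this.terminationTime = terminationTime;
        }
    }

    private final Map<String, Entry<T>> instances = new HashMap<>();
    private final Supplier<T> factory;
    private int nextId = 1;

    ServiceContainer(Supplier<T> factory) { this.factory = factory; }

    // Create a new instance with an initial lifetime; return its handle.
    String create(Instant now, Duration lifetime) {
        String handle = "instance-" + nextId++;
        instances.put(handle, new Entry<>(factory.get(), now.plus(lifetime)));
        return handle;
    }

    // Clients keep an instance alive by pushing its termination time forward.
    boolean extend(String handle, Instant newTerminationTime) {
        Entry<T> e = instances.get(handle);
        if (e == null) return false;
        e.terminationTime = newTerminationTime;
        return true;
    }

    // Route a client back to the same stateful instance.
    Optional<T> lookup(String handle) {
        Entry<T> e = instances.get(handle);
        return e == null ? Optional.empty() : Optional.of(e.instance);
    }

    // Run periodically by the container; destroys expired instances and
    // returns how many were reclaimed.
    int sweep(Instant now) {
        int before = instances.size();
        instances.values().removeIf(e -> !now.isBefore(e.terminationTime));
        return before - instances.size();
    }
}
```

The design choice worth noting is that clean-up does not depend on clients remembering to delete anything: an instance nobody renews simply expires, which suits a grid where clients can vanish mid-job.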

After working through the details with the grid developers in the fall, IBM funded work to build a grid services architecture on top of the Apache Axis Web services toolkit. The Argonne developers worked on open source OGSA bindings for the JBoss J2EE server, while IBM developers did the same for WebSphere. Both groups worked with the Axis team to tie the Axis SOAP engine, which runs as a servlet, into their app servers so that Axis could dispatch Web services requests to J2EE entity beans or stateful session beans. They also spent time extending and debugging Axis's support for literal encoding and for XML Schema Definition (XSD).

Nick says that within a year, his company will "provide extensions to our tooling suite [WebSphere Studio] so that application and middleware providers can write OGSA-compliant services." The open source JBoss work will be publicly available as part of Globus Toolkit 3.0, which is expected by year's end, according to Tuecke.

Grid skeptics

But what about the other J2EE vendors? Though all the big enterprise vendors are showing up at the Global Grid Forum, the standards body where the OGSA is being defined, Sun and BEA are much cooler on the subject than IBM.

In fact, BEA is downright frosty. "OGSA may, in fact, formalize some APIs to bring up and down servers," says BEA Senior Director of Technology Benjamin Renaud, but he says he's "not optimistic" about OGSA because it aims to standardize some things—clustering architectures, for example—that vendors like BEA would prefer to keep proprietary. "Enterprise applications run with data," he continues. "Data is not something that people want to go off and spread around."

Nick disagrees. He says grid computing's great appeal to the enterprise will be precisely this ability to spread data around. "It's about resource provisioning and resource allocation," he says. "Peak utilization demands on an IT infrastructure tend to be easily 10 times that of steady state," so companies end up over-provisioned for their normal workloads yet still under-provisioned for peak demand. "And they hope they can react quickly enough by physically moving resources from another place to accommodate the spike when it occurs."
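
Nick's 10-to-1 ratio makes the dilemma easy to quantify. The sketch below assumes, purely for illustration, a steady-state load of 100 units and a peak of 1,000; Provisioning and its methods are hypothetical helpers, and the figures are not measurements from any real installation.

```java
// Hypothetical helper for the provisioning arithmetic; all numbers are
// illustrative assumptions, not data from a real data center.
final class Provisioning {
    private Provisioning() {}

    // Fraction of installed capacity in use at a given load.
    static double utilization(double load, double capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        return load / capacity;
    }

    // Demand that cannot be served because it exceeds installed capacity.
    static double shortfall(double load, double capacity) {
        return Math.max(0.0, load - capacity);
    }

    public static void main(String[] args) {
        double steady = 100.0;        // assumed steady-state load
        double peak = 10 * steady;    // Nick's "easily 10 times" peak

        // Option 1: size for the peak; never short, but mostly idle.
        System.out.printf("sized for peak: utilization %.0f%%, shortfall at peak %.0f%n",
                100 * utilization(steady, peak), shortfall(peak, peak));

        // Option 2: size modestly above steady state; busier, but swamped at peak.
        double modest = 2 * steady;
        System.out.printf("sized at 2x steady: utilization %.0f%%, shortfall at peak %.0f%n",
                100 * utilization(steady, modest), shortfall(peak, modest));
    }
}
```

With those numbers, capacity sized for the peak sits at 10 percent utilization most of the time, while capacity sized at twice the steady load (200 units) is still 800 units short when the spike arrives. Moving capacity in from elsewhere on demand, rather than by hand, is the gap Nick expects grid provisioning to close.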

Grid boosters say this kind of dynamic provisioning, if it does happen in the enterprise, will start with small projects, but they hope that eventually companies will be able to outsource storage and processing power in much the same way they outsource electricity and Internet bandwidth.

But caution remains the byword. "Distributed computing in general has been proven, but there are not a lot of mission-critical applications of it," says Delphi Group Chief Analyst Nathaniel Palmer. "There is not a lot of practical work that I've seen on the development side."

As for Sun, Grid Marketing Manager John Tollefsrud says his company tracks OGSA developments, but that "it isn't ready to build on yet." He admits Sun is "acutely interested" in the new Java working group formed at July's Global Grid Forum in Edinburgh, Scotland, and says that Sun would like to see the work being done on Java bindings to OGSA be "aligned with the Java Community Process."

"There's a bunch of hype going on here, and there's some good work," Tollefsrud says. "It's going to be a while before we have useful things that convey real value to users."

IBM's Nick sees things otherwise. He says that just as Linux had its skeptics in the early days, so too does grid computing. However, he adds, "I am convinced that grid is a dimension of distributed computing, and distributed computing is ubiquitous and pervasive...The applications that are there today may be niche, but the infrastructure model is not about niche; it's about distributed computing, and I think it's going to become much more important in commercial companies."

Robert McMillan is Linux Magazine's editor at large.
