During the 1990s, companies bought packaged software solutions such as SAP, Oracle ERP, PeopleSoft, JDEdwards, Siebel, Clarify, and so on. Although such packaged software solutions worked well individually, they created information islands. In most cases, each system produced redundant information (like customer information). As a result, when common data changed, employees manually updated the associated information in each system, a process that quickly became cumbersome. Eventually, some of the data across systems became inconsistent. When people noticed the resulting double data entry, inconsistent data, and data isolation problems, they decided to find ways to integrate the systems. From that search, enterprise application integration (EAI) was born.

Enterprise application integration

EAI combines separate applications into a cooperating federation of applications. Two logical architectures for integrating applications exist: direct point-to-point connections and middleware-based integration.

Point-to-point integration

EAI developers pursue point-to-point integration because they find it easy to understand and quick to implement when they have just a few systems to integrate. A point-to-point integration example: One application makes direct JDBC (Java Database Connectivity) calls to another application's database tables. Initially, when you integrate two applications, the point-to-point integration solution seems like the right choice; however, as you integrate additional applications, you end up with the situation shown in Figure 1.

Figure 1. The later stages of a point-to-point integration

As Figure 1 suggests, point-to-point integration's infrastructure proves brittle. Each application is tightly coupled with the other applications through their point-to-point links. Changes in one application may break the applications integrated with it. As another disadvantage, every integration point requires support, and the number of integration points grows quickly. If you have five applications integrated with each other, you will need 10 different integration points (n(n-1)/2 for n applications), as Figure 2 illustrates. As such, each additional application becomes harder to integrate and maintain.

Figure 2. Number of point-to-point connections

To avoid that problem, we need an intermediate layer to isolate changes in one application from the others.

Middleware-based integration

Middleware has stepped up to the task of providing a mediation point between applications. Middleware provides generic interfaces with which all integrated applications pass messages to each other. Each interface defines a business process provided by an application. Figure 3 shows a logical depiction of the service-oriented architecture using middleware.

Figure 3. Middleware-based integration

A service-oriented architecture lets you add and replace applications without affecting the other applications. If you have five applications to integrate, you'll have just five integration points. Compared to the point-to-point solution, middleware-based solutions easily support numerous integrated applications and require less maintenance. In addition, middleware can perform complex operations—transforming, aggregating, routing, separating, and converting messages—on the data passed from application to application(s). The only downside: the added initial complexity of setting up the middleware and converting existing applications to use the middleware APIs.
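The difference in integration points between the two logical architectures is simple arithmetic. The following is a minimal sketch, not part of any EAI product; the class name IntegrationPoints and its methods are hypothetical and exist only to illustrate the counts quoted above for five applications.

```java
// Hypothetical helper (illustration only): counts integration points
// for a given number of fully integrated applications.
final class IntegrationPoints {
    private IntegrationPoints() {}

    // Point-to-point: every pair of applications needs its own link, n * (n - 1) / 2.
    static int pointToPoint(int applications) {
        return applications * (applications - 1) / 2;
    }

    // Middleware-based: each application connects once to the middleware, n.
    static int middleware(int applications) {
        return applications;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 12; n += 2) {
            System.out.println(n + " applications: point-to-point="
                    + pointToPoint(n) + ", middleware=" + middleware(n));
        }
    }
}
```

The point-to-point count grows quadratically while the middleware count grows linearly, which is why the gap widens with every application you add.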

Integration methods

Once you have selected the logical EAI architecture, you next must choose the integration method. EAI has four common integration methods:

  • Data-level integration
  • User interface (UI)-level integration
  • Application-level integration
  • Method-level integration

Data-level integration

With data-level integration, you integrate the backend datastores that the integrated applications use. Data-level integration can be push- or pull-based. With push-based, one application makes SQL calls (through database links or stored procedures) on another application's database tables. Push-based data-level integration pushes data into another application's database. In contrast, pull-based data-level integration utilizes triggers and polling. Triggers capture changes to data and write the identifying information to interface tables. Adaptors can then poll the integrated application's interface tables and retrieve the pertinent data. You'd use pull-based data-level integration when an application requires passive notification of changes within another application's data.
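To make the pull-based flow concrete, here is a minimal in-memory sketch. It assumes a hypothetical InterfaceTable class standing in for the interface table that a database trigger would populate, and a hypothetical PollingAdapter; a real adapter would issue SQL over JDBC against the actual interface table instead.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in (assumption) for an interface table filled by a database trigger.
final class InterfaceTable {
    private final List<String> pendingKeys = new ArrayList<>();

    // Plays the role of the trigger: records the key of a changed row.
    synchronized void recordChange(String rowKey) {
        pendingKeys.add(rowKey);
    }

    // Plays the role of the adapter's query: returns and clears the pending keys.
    synchronized List<String> drain() {
        List<String> drained = new ArrayList<>(pendingKeys);
        pendingKeys.clear();
        return drained;
    }
}

// Hypothetical adapter that polls the interface table and forwards what it finds.
final class PollingAdapter {
    private final InterfaceTable table;
    private final List<String> forwarded = new ArrayList<>();

    PollingAdapter(InterfaceTable table) {
        this.table = table;
    }

    // One polling cycle: fetch the changed keys and forward them downstream.
    int pollOnce() {
        List<String> keys = table.drain();
        forwarded.addAll(keys);
        return keys.size();
    }

    List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        InterfaceTable table = new InterfaceTable();
        PollingAdapter adapter = new PollingAdapter(table);
        table.recordChange("CUST-1");
        table.recordChange("CUST-2");
        System.out.println("Forwarded in this cycle: " + adapter.pollOnce());
    }
}
```

In practice you would run pollOnce() on a timer; the polling interval then bounds how stale the dependent systems can become.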

Use data-level integration when the application up for integration does not provide any APIs or client interfaces, and you intimately understand how business operations affect your application's data model. Data-level integration typically represents the only option with custom applications lacking application APIs.

In data-level integration, changes propagated from dependent systems bypass the integrated application, so all inserts, updates, and deletes are done directly to the data that the integrated application accesses. Developers typically implement data-level integration with database gateways or triggers and stored procedures. The major downside: keeping the integrated application's data intact. For example, some ERP (enterprise resource planning) systems include thousands of tables. One table might have dependencies on others, and the integrated application may be the sole enforcer of those dependencies.

UI-level integration

UI-level integration ties integration logic to user interface code. UI-level integration is scripting- or proxy-based. Scripting-based UI-level integration embeds integration code into the UI component events, common with client/server applications such as PowerBuilder or Vantive. For example, when you click the Submit button of an Add Customer screen, data must be sent to the application's database and a JMS (Java Message Service) topic. Proxy-based UI-level integration uses the integrated application's interface (through screen scraping) to pass data to and from the legacy system.

Use UI-level integration when you cannot easily or directly access the database, or when your business logic is embedded in the user interface. Mainframe and client/server applications represent typical candidates for UI-level integration. Mainframe applications generally do not offer easily accessible data stores and usually do not provide public APIs. For their part, many client/server applications embed the business logic in the client. In these cases, UI-level integration represents the only way to access the data while maintaining its integrity.

In most cases, UI-level integration is your last resort. Adding scripting logic to catch events within client/server applications quickly becomes difficult to maintain as the integration level increases and changes occur. In either case, UI changes can break the integration triggers and logic. Moreover, tight coupling forever links UI maintenance with the integration code's maintenance.

Application-level integration

Application-level integration, probably the best way to integrate applications, uses the integrated application's integration frameworks and APIs. Application interfaces let you invoke business logic to preserve data integrity. Integration API examples include Siebel's Java DataBeans and SAP's JCA (J2EE Connector Architecture). Prefer application-level integration because it is transparent to the integrated application and preserves the application's data integrity.

Method-level integration

Method-level integration, a less frequently used superset of application-level integration, aggregates common operations on multiple applications into a single application that fronts the integrated applications.

Use method-level integration when each integrated application provides a similar set of API or functional methods. Typically, you'd create an aggregating (front) application, which fronts the aggregated applications using distributed components (CORBA, Enterprise JavaBeans (EJB), DCOM (Distributed Component Object Model), and so on). A front integration component may resemble:

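// Code from a front application component
public void addCustomer(CustomerInfo ci) {
  // Add the customer to the ERP system, which stores the name in a single field
  ERPComponent.addCustomer(ci.getName(), ci.getAddress(), ci.getEmail());
  // Add the same customer to the CRM system, which stores first and last names separately
  ECRMComponent.insertCustomer(ci.getFName(), ci.getLName(), ci.getAddress(), ci.getEmail());
}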

Method-level integration requires the integrated applications to support an RPC (remote procedure call) or distributed component technology. All applications that interact with the integrated applications do so through the front application. So, if a client application wants to add a customer, it would call the front component's addCustomer() method. The component would then add the customer to the ERP and CRM (customer relationship management) systems.

Method-level integration's major disadvantage stems from the tight application coupling in front components. Changes in the integrated application API break the front application components and the applications that rely on them. Pursue method-level integration when you have CORBA-based integration technology or a distributed component technology. Because method-level integration is a more complicated form of application-level integration, it usually makes more sense to pursue application-level integration using middleware.

How to choose an integration method

Selecting the proper integration method is usually an exercise in constraint-based modeling. To wit, look at each system and define the possible interfaces into that application. In some cases, the application does not have any API; therefore the backend data store represents the only option. In other cases, APIs and a CORBA infrastructure may exist; so employ application-level integration.

Core integration components

Now that we have looked at the different integration methods, let's look at the core features and services present within most EAI solutions. The following core features and services act as the building blocks of your integration solutions.

Common XML Schema

Once you have chosen an integration method for each target application, identify a common integration XML Schema to encompass all integration objects and their associated attributes. The integration schema you develop must consider each of the integrated applications' XML Schemas and future applications' XML Schemas. Most packaged ERP and CRM applications include their own XML Schema to describe their internal business objects. You will need to convert data extracted from these applications into the integration XML format and then into the target applications' XML format. The intermediate XML format isolates changes in one application's XML Schema from other applications. If you did not use an intermediate XML format, you would have to define mappings from each application to the other applications and back.

Data transformation

Transformation converts XML or non-XML data from one format into another. Transformation, a major part of integration, is categorized into datatype and semantic transformations.

A datatype transformation converts attribute values from one application's format to another. For example, one application might use "M" and "F" to represent gender while another application uses "Male" and "Female." As another example of a datatype transformation, consider changing an XML element from:

<EMPLOYEE_ID>111</EMPLOYEE_ID>

to:

<EMPLOYEE_NUMBER>111</EMPLOYEE_NUMBER>
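Both datatype examples can be sketched in a few lines of Java. This is only an illustration: the DatatypeTransform class and its methods are hypothetical, and the element rename uses plain string replacement for brevity, where a real integration would more likely rely on an XML parser or XSLT.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (illustration only) for the two datatype transformations described above.
final class DatatypeTransform {
    private static final Map<String, String> GENDER = new HashMap<>();

    static {
        GENDER.put("M", "Male");
        GENDER.put("F", "Female");
    }

    private DatatypeTransform() {}

    // Maps a source code to the target application's value.
    // Assumption: unknown codes pass through unchanged.
    static String gender(String code) {
        return GENDER.getOrDefault(code, code);
    }

    // Renames an element's opening and closing tags (attribute-free elements only).
    static String renameElement(String xml, String from, String to) {
        return xml.replace("<" + from + ">", "<" + to + ">")
                  .replace("</" + from + ">", "</" + to + ">");
    }

    public static void main(String[] args) {
        System.out.println(gender("F"));
        System.out.println(renameElement("<EMPLOYEE_ID>111</EMPLOYEE_ID>",
                "EMPLOYEE_ID", "EMPLOYEE_NUMBER"));
    }
}
```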

A semantic transformation, in contrast, restructures the data itself rather than mapping one value or name to another. For example, Oracle stores customer names in first, last, and middle initial fields. SAP, on the other hand, stores a customer name in a single field with a lastName, firstName format. Sending data from Oracle to SAP requires a semantic transformation to concatenate a last name with a comma and first name.
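The customer-name example might look like the sketch below. The SemanticTransform class is hypothetical, and dropping the middle initial is my assumption based on the lastName, firstName target format described above; check what your target system actually expects.

```java
// Hypothetical helper (illustration only) for the customer-name example.
final class SemanticTransform {
    private SemanticTransform() {}

    // Combines separate name fields into a single "lastName, firstName" value.
    // Assumption: the middle initial is not carried over to the target format.
    static String toSingleNameField(String firstName, String middleInitial, String lastName) {
        return lastName.trim() + ", " + firstName.trim();
    }

    public static void main(String[] args) {
        System.out.println(toSingleNameField("Grace", "B", "Hopper"));
    }
}
```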

Data transformations are typically part of the process definition.


Define processes

Every application has well-defined processes for completing business operations. The same is true when integrating processes across applications. You must understand the events, systems (source, intermediate, and targets), and routing requirements for the integrated applications. For example, a large company may own 12 applications that store redundant customer data. When a new customer is inserted (event) into one of the applications, you must define a process to propagate the information to the other 11 systems. Each of the 11 systems may represent customer information in different formats with different required fields. Sometimes you'll route messages through intermediate systems before ending at the target systems. You will eventually implement this information in the physical integration architecture.

Physical integration architectures

The physical integration architecture concerns the physical software components necessary to implement a middleware-based integration solution. Java-based EAI supports three architectures: message bus, centralized, and JCA.

Message bus architecture

In the message bus architecture, each application connects to a message bus (more of a glorified multicast network). Each integrated application associates with an integration node (also known as an adapter) connected to the multicast network. The integration node runs either on the same box as the integrated application or on a box with client- or API-level access to the integrated application (see Figure 4). Each integration node uses multicast to broadcast information to other integration nodes. When an application wants to share information with its peers, it sends a multicast message on the network. Each message has an associated subject. Participating integration nodes pick up the message and check whether they are interested in the subject. If they are, the integration node processes the message by invoking some business logic on the application it is integrating. The message bus architecture is common with Tibco Rendezvous.
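Subject-based filtering is easier to see in code. The sketch below is an in-memory stand-in, not Tibco Rendezvous or any real multicast API; the MessageBus and IntegrationNode classes are hypothetical. Note that every node sees every message and decides for itself whether to process it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in (assumption) for the multicast network: every node sees every message.
final class MessageBus {
    private final List<IntegrationNode> nodes = new ArrayList<>();

    void attach(IntegrationNode node) {
        nodes.add(node);
    }

    void broadcast(String subject, String payload) {
        for (IntegrationNode node : nodes) {
            node.onMessage(subject, payload);
        }
    }
}

// Hypothetical integration node that filters incoming messages by subject.
final class IntegrationNode {
    private final Set<String> subjects = new HashSet<>();
    private final List<String> processed = new ArrayList<>();
    private int seen = 0;

    IntegrationNode(String... interestingSubjects) {
        for (String subject : interestingSubjects) {
            subjects.add(subject);
        }
    }

    void onMessage(String subject, String payload) {
        seen++; // the node receives the message whether or not it cares about the subject
        if (subjects.contains(subject)) {
            processed.add(payload); // a real node would invoke business logic here
        }
    }

    int seen() {
        return seen;
    }

    List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        MessageBus bus = new MessageBus();
        IntegrationNode crm = new IntegrationNode("customer.new");
        IntegrationNode billing = new IntegrationNode("invoice.new");
        bus.attach(crm);
        bus.attach(billing);
        bus.broadcast("customer.new", "<customer/>");
        System.out.println("crm processed " + crm.processed().size()
                + ", billing saw " + billing.seen() + " and processed " + billing.processed().size());
    }
}
```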

Figure 4. Message bus architecture

Advantages:

With a message bus architecture, you can propagate a single message to multiple clients. When a multicast packet goes out, any interested application can pick up and process the packet. If multiple clients require exactly the same data, you can send the data once, and each interested application will pick up and process the information.

Disadvantages:

Message bus architectures often have security problems. It is difficult if not impossible to support authorization-based access to multicast packets. Multicast-based integration architectures drop packets on a network that any network member can see.

Message bus architectures also increase network traffic. Multicast packets negate the intelligent routing behavior of switches and bridges. Every client in the same network as the integration nodes processes multicast packets up to the application layer of the Open Systems Interconnection (OSI) model. Clients then waste CPU cycles processing and discarding packets they do not need.

The third disadvantage is a lack of centralized management. Each integration node must utilize some type of RAID (redundant array of inexpensive disks) configuration to support message log-file integrity—an expensive proposition.

As another disadvantage, consider the proprietary nature of message bus architectures. Currently, Tibco is the only vendor that implements the message bus architecture.

Finally, multicast packets cannot cross network boundaries without help from protocol converters. If another application to be integrated resides in a separate network or subnet, another process must pick up the multicast packets, convert them to unicast packets, then forward the packets to a process in the other network. That process in the remote network converts the unicast packets back to multicast packets for the integration nodes on the remote network to receive.

Centralized architecture

As with the message bus architecture, in a centralized architecture each integrated application associates with an integration node. Each integration node serves as a JMS event listener/notifier and interface into the integrated application. JMS provides message durability, message filtering, and transactions, and ensures that messages are delivered and routed to targeted applications (see Figure 5). When an application must talk to other applications, it publishes messages to JMS topics. Topic listeners (message-driven beans) send the messages to business processes applying business rules, transformation, routing logic, and workflow management.

In contrast to the message bus architecture, with a centralized JMS architecture, all integration nodes communicate with a JMS server instead of a multicast network. In addition, the centralized architecture tends to be unicast-based. Unicast packets allow applications in different networks and subnets to easily talk to each other without protocol converters. Because switches and routers can intelligently route unicast packets, clients receive only packets intended for them. That also makes it harder for hackers to sniff network segments and obtain integration-related information. Moreover, a centralized architecture improves overall security. Most J2EE servers support SSL (Secure Sockets Layer) and authentication to access JMS services. Finally, the centralized architecture eases management by letting you manage your persistent message and transaction logs in a single location.
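The following sketch illustrates the centralized idea: routing and transformation rules live in one place, and only subscribed targets receive a message. It is an in-memory stand-in and deliberately does not use the JMS API; the CentralBroker class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// In-memory stand-in (assumption) for a central message service with per-target rules.
final class CentralBroker {
    // One route = a target-specific transformation plus the target itself.
    private static final class Route {
        final UnaryOperator<String> transform;
        final Consumer<String> target;

        Route(UnaryOperator<String> transform, Consumer<String> target) {
            this.transform = transform;
            this.target = target;
        }
    }

    private final Map<String, List<Route>> routesByTopic = new HashMap<>();

    void subscribe(String topic, UnaryOperator<String> transform, Consumer<String> target) {
        routesByTopic.computeIfAbsent(topic, t -> new ArrayList<>())
                     .add(new Route(transform, target));
    }

    // Delivers the message only to targets subscribed to the topic; returns the delivery count.
    int publish(String topic, String message) {
        List<Route> routes = routesByTopic.getOrDefault(topic, new ArrayList<>());
        for (Route route : routes) {
            route.target.accept(route.transform.apply(message));
        }
        return routes.size();
    }

    public static void main(String[] args) {
        CentralBroker broker = new CentralBroker();
        List<String> erpInbox = new ArrayList<>();
        broker.subscribe("customer.new", m -> m.toUpperCase(), erpInbox::add);
        System.out.println("Deliveries: " + broker.publish("customer.new", "ada"));
        System.out.println("ERP inbox: " + erpInbox);
    }
}
```

Because each subscription carries its own transformation, every target can receive the same event in its own format.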

Figure 5. Centralized JMS architecture

Advantages:

First, a centralized architecture improves maintenance and management. In the message bus architecture, each integration node protects log files related to certified and guaranteed messaging. As a result, each message bus node ideally needs a RAID. However, the centralized architecture only needs a RAID on the JMS server.

The business rules associated with routing and transformation are also located within the centralized message service. You'll find debugging and problem resolution easier in a centralized deployment because you can review log files and look for errors in one place. The message bus architecture requires you to interrogate suspect integration nodes, coordinate information, and resolve problems in multiple locations.

The centralized architecture also better utilizes network resources. Single messages are rarely sent to multiple applications without transformation. That diminishes the benefits of the multicast protocol for integration-based applications. Switches and bridges can take advantage of intelligent routing provided by the MAC (media access control) address of a unicast frame. Network-friendly unicast packets do not require protocol converters to talk to applications in another network or subnet.

Finally, the centralized architecture supports standards-based messaging like JMS, allowing you to switch different vendor implementations and give developers a unified interface into messaging.

Disadvantages:

The centralized architecture lacks standardized integration nodes (adapters). Although an integration node standard does not exist, the popular openadaptor open source adaptor framework supports JMS, sockets, files, IBM WebSphere MQ, Tibco Rendezvous, and databases.

J2EE Connector Architecture

The JCA takes a radically different approach to integration by placing the integration components within the J2EE application server, giving you the centralized architecture's maintenance and management benefits. However, JCA adapters require remotable APIs (see Figure 6), that is, classes that can invoke business logic on a remote host. Examples of remotable APIs include CORBA, EJB, DCOM, JDBC, and RPC. JCA standardizes the interfaces into enterprise information systems (EIS) so you can use a single JCA-compliant adapter in any J2EE-compatible server. JCA 1.0 supports transactions, security, and resource management well. However, because the JCA spec is at version 1.0, it has many limitations. First, it lacks an asynchronous messaging mechanism. Second, all requests are unidirectional. Third, it doesn't support event-based processing. BEA WebLogic has extended JCA 1.0 by adding support for events, but that solution is proprietary.

Figure 6. The J2EE Connector Architecture

Advantages:

The JCA boasts the same maintenance and management advantages as centralized architecture, including centralized maintenance, management, and business rules. The network-friendly JCA directly supports J2EE services just like the centralized architecture. JCA has the added advantage of standardized adapter interfaces to support multiple application servers and EISs, as well as standardized semantics for secure identity propagation, transactions, and resource pooling.

Disadvantages:

JCA 1.0's disadvantages relate to the specification's immaturity. First, JCA does not support asynchronous calls typically required in EAI solutions. Second, JCA 1.0 only supports calls made from the application server to the EIS. Last, JCA 1.0 does not define any semantics for receiving application events from EISs. JCA seems targeted towards portal-based integration with the portal driving the integration process. The JCA 1.5 spec addresses most of these concerns by adding support for JMS plug-ability, EIS event notification, and asynchronous methods.

Web service-based architecture

Although Web service-based Simple Object Access Protocol (SOAP) integration is relatively new, some claim it will replace EAI as we know it. I do not totally agree with those claims, but I do see some places where SOAP can be a useful integration tool. SOAP defines an XML-based object-invocation protocol useable over any transport protocol, usually HTTP. One of SOAP's strengths is its language independence. Application clients written in any language can invoke methods on a SOAP-based Web service as long as the method passes the correct XML. The major drawbacks: SOAP does not define the semantics for transactions, reliable delivery, or guaranteed messaging.

Let's examine a situation where SOAP fails. Let's say you wanted to use SOAP instead of JDBC to interact with a database. (Don't laugh, this has come up in discussions.) You might think that by exposing the database as a SOAP-based Web service, clients in any language can access that database. But do you think the clients will generate the raw XML messages directly? No, they will probably use an XML-based API (probably XDBC (XML Database Connectivity)). Each language will have its own XDBC API you must learn; in the end you must ask yourself what you gained?

Figure 7 illustrates a typical business-to-business (B2B) problem.

Figure 7. Typical B2B problems with multivendor solutions

In Figure 7's B2B problem, each partner has a different B2B solution. Partner A uses WebLogic Integration (WLI). For its partners to interact with Partner A's hub, they must run a WLI spoke at their sites. Each spoke in turn must integrate with the partner site's backend applications. That process repeats for each partner's different B2B solution. As more partners are added, the integration points and the software to support them become unwieldy. Each partner now needs system administrators who understand Vitria, webMethods, and WLI.

Figure 8, in contrast, illustrates a situation in which Web services do make sense for integration.

Figure 8. A scenario in which Web services proves effective

The situations shown in Figures 7 and 8 differ most in that in Figure 8, each partner can now run its own software without running the other partners' software. All SOAP messages are first placed on persistent queues, then an acknowledgement immediately returns. That keeps each partner loosely coupled to its partners. Message-driven beans process the messages and handle transactions, duplicate message processing, reliable delivery, and guaranteed messaging with the backend systems. At the end of every day, a record of successfully processed messages goes to partners for reconciliation.
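The queue-then-acknowledge pattern with duplicate detection can be sketched as follows. This is an in-memory illustration only: the PartnerEndpoint class is hypothetical, the queue here is not persistent, and I assume each SOAP message carries a unique message ID that the receiver can use to spot duplicates.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical receiving endpoint (illustration only): queue first, acknowledge, process later.
final class PartnerEndpoint {
    private final Queue<String[]> queue = new ArrayDeque<>(); // entries are {messageId, body}
    private final Set<String> processedIds = new HashSet<>();
    private int backendCalls = 0;

    // Enqueues the message and acknowledges immediately; no backend work happens here.
    String receive(String messageId, String body) {
        queue.add(new String[] {messageId, body});
        return "ACK " + messageId;
    }

    // Plays the role of the message-driven bean: drains the queue and skips duplicates.
    int processPending() {
        int handled = 0;
        String[] entry;
        while ((entry = queue.poll()) != null) {
            if (processedIds.add(entry[0])) { // add() returns false for an ID already seen
                backendCalls++;               // a real bean would call the backend system here
                handled++;
            }
        }
        return handled;
    }

    int backendCalls() {
        return backendCalls;
    }

    // The IDs processed so far, as input for the end-of-day reconciliation record.
    Set<String> processedIds() {
        return processedIds;
    }

    public static void main(String[] args) {
        PartnerEndpoint endpoint = new PartnerEndpoint();
        System.out.println(endpoint.receive("m1", "<order/>"));
        endpoint.receive("m1", "<order/>"); // the partner resends the same message
        System.out.println("Handled: " + endpoint.processPending());
    }
}
```

Because the sender gets its acknowledgement before any backend processing starts, a slow or unavailable backend does not block the partner.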

Integration in practice

Now that you understand the different architectures, let's apply what you've learned to real-world integration problems I have faced. (The names have been changed to protect the innocent.)

Description of requirements

A major company has purchased WebLogic Integration to integrate customer information across a COBOL application, a custom in-house application, a packaged software application, and a data warehouse.

Actors and events

Any of the applications can generate events. XML will pass bidirectionally between all applications and the data warehouse:

  • The COBOL application generates a new customer file in a certain directory. A process picks up and converts the file to XML. That information must be sent to each application with the files transferred in an all-or-nothing manner. Customer information updates proceed through COBOL business interfaces accessible by JAM (Java Adapter for Mainframe).
  • The data warehouse, as an information receiver only, does not propagate information.
  • The custom and packaged applications notify the COBOL application of any customer information changes and receive new customer events.

Transformation and routing

Information from the COBOL application must convert from the COBOL file format to XML and back.

Security

All messages between systems require encryption.

Transactions

File transfers must support transactions (files must be sent in an all-or-nothing fashion) and be durable. After a file is sent to the server, it must be processed. If the server goes down, all persisted messages must be processed later.
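The all-or-nothing requirement boils down to staging messages and releasing them only on commit. The sketch below shows that idea in memory; the TransactedSender class is hypothetical and only mimics the behavior you would get from a transacted messaging session, without durability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sender (illustration only): messages become visible only on commit.
final class TransactedSender {
    private final List<String> staged = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    // Stages a message; nothing is delivered yet.
    void send(String message) {
        staged.add(message);
    }

    // Delivers every staged message as one unit.
    void commit() {
        delivered.addAll(staged);
        staged.clear();
    }

    // Discards every staged message, so a partial file never reaches the receiver.
    void rollback() {
        staged.clear();
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        TransactedSender sender = new TransactedSender();
        sender.send("record 1");
        sender.rollback(); // simulated failure midway through the file
        sender.send("record 1");
        sender.send("record 2");
        sender.commit();
        System.out.println("Delivered: " + sender.delivered());
    }
}
```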

Description of the applications:

  • Application 1: Custom COBOL CICS (Customer Information Control System) application running on OS/390. The file-based interface uses a COBOL-specific format.
  • Application 2: Distributed application providing APIs.
  • Application 3: Custom application built in-house without APIs uses a relational database.
  • Application 4: Data warehouse that aggregates information from all systems for DSS (decision support system) and OSS (operations support system) operations.

The solution architecture:

Figure 9 illustrates the solution architecture. The dashed arrows indicate directional messaging—either JMS- or mainframe-based—over a network interface. Solid lines indicate intra-process communication.


Figure 9. Integration in practice solution

Solution reasoning

  1. Because the customer decided to purchase WLI and build on its existing infrastructure, we cannot choose the message bus architecture.
  2. Events pass bidirectionally and the integrated applications do not support event notification through a network transport, so we cannot use JCA.
  3. Encryption supported by openadaptor pipes is required between nodes. Pipes act as filtering components between a source and sink component. Each adapter that utilizes the openadaptor framework consists of a source component, any number of filtering pipes, and a sink component. The source component listens for events from JMS topics/queues, files, sockets, database, RV messages, or custom events. Once triggered, the source component passes the information through any configured pipes. In our examples, we use pipes for encryption and decryption. When the pipes finish, the resulting information passes to the sink (destination). Again, sinks can be JMS topic/queues, files, sockets, database, RV messages, or a custom destination.
  4. JMS provides the all-or-none message transferring semantics.
  5. WLI's Data Integration Plug-In can convert non-XML formats to XML and back.
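The source, pipe, and sink structure described in item 3 can be sketched generically. The code below does not use the openadaptor API; the Adapter and Pipes classes are hypothetical, and Base64 encoding stands in for encryption purely as a placeholder (it provides no confidentiality).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical adapter (illustration only): a source event flows through pipes into a sink.
final class Adapter {
    private final List<UnaryOperator<String>> pipes = new ArrayList<>();
    private final Consumer<String> sink;

    Adapter(Consumer<String> sink) {
        this.sink = sink;
    }

    Adapter addPipe(UnaryOperator<String> pipe) {
        pipes.add(pipe);
        return this;
    }

    // Called when the source fires: run the message through each pipe in order, then the sink.
    void onSourceEvent(String message) {
        String current = message;
        for (UnaryOperator<String> pipe : pipes) {
            current = pipe.apply(current);
        }
        sink.accept(current);
    }
}

// Placeholder pipes: Base64 is an encoding, NOT encryption; swap in a real cipher in practice.
final class Pipes {
    private Pipes() {}

    static String encode(String plain) {
        return Base64.getEncoder().encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        Adapter receiver = new Adapter(delivered::add).addPipe(Pipes::decode);
        Adapter sender = new Adapter(receiver::onSourceEvent).addPipe(Pipes::encode);
        sender.onSourceEvent("<customer id=\"42\"/>");
        System.out.println(delivered);
    }
}
```

Chaining two adapters this way mirrors the encrypt-on-send, decrypt-on-receive arrangement between nodes.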

Integration made easy

In this article, I have given you a foundational understanding of EAI, EAI architectures, and EAI methods. You have examined the benefits and drawbacks of each EAI architecture, you saw the important EAI-related services, and you've seen how Java and J2EE aid integration. With this knowledge, you can now design EAI solutions with Java. However, that knowledge represents just the beginning of your journey. Good luck and stay positive.

Abraham Kang is a security architect with Jamcracker. Before joining Jamcracker, he worked in Infogain's EIS group as a J2EE and integration architect. He has helped companies like Cisco Systems define integration architectures. Kang has worked with Java for more than 5 years and with J2EE since its inception. He would like to thank his manager Steve Yu for his support and guidance, and Alexander Kuzmin and Rakesh Gupta for sharing their message bus architecture knowledge.
