In the previous article of this series, we started reviewing high availability and scalability enhancements in SQL Server 2005 (based on its recent Beta 2 release) by discussing database mirroring. In this article, we will continue exploring the same area of functionality, shifting our attention to failover clustering. We will also provide comparison of both features, which should help in recognizing the types of scenarios for which each of them is better suited.
Failover clustering has been supported since SQL Server 7.0 so it is not surprising that its latest implementation is a combination of already familiar (and stable) technology with a number of useful improvements. As before, the solution is based on Microsoft Cluster Services (MSCS), which, starting with Windows 2000 Server became an integral part of Advanced (or Enterprise, in case of Windows 2003 Server) and DataCenter editions. In essence, clustering provides high-availability by combining several physical servers (referred to as nodes) into a single virtual unit, typically by connecting them to a shared set of disk devices. Disks are attached to each server via Fibre Channel or SCSI links (although iSCSI-based configurations become increasingly popular), but access to them is arbitrated so no two servers are permitted to write to (or read from) the same disks simultaneously (in order to prevent data corruption). In the most common scenario, all servers in a cluster are configured the same way (from both a hardware and software perspective), which makes them capable of running the same applications. Clustered software must be installed by running the setup program in a special manner, which allows the application state to be easily transferred across cluster members. This is typically accomplished by separating application binaries, placed on local drives of each individual node, from shared data (such as clustered databases and their transaction logs) residing on shared disks (which implies, by the way, that both local and shared logical drives should be assigned the same drive letters on each node). There is only a single instance of a particular application running at any given time within a cluster (although you can have multiple, distinct instances of different applications operating in parallel). The burden of executing the processes of any of clustered application is handled only by one of the nodes (however, in the case of several clustered applications, they are typically distributed evenly across all nodes, in order to take full advantage of processing resources). If a server hosting the application instance fails, processing tasks are automatically transferred to one of remaining cluster members, which is also given exclusive access to the shared disk hosting this application data. Except for a short failover period, the application continues running, without data loss.
Application operational requirements are satisfied by creating virtual cluster resources, such as disks, IP addresses, network names, services, file shares, etc. (the complete list is much longer). Each set of related resources (typically the ones associated with the same application) is placed into its own resource group, which constitutes a failover unit; (all resources within it are typically interdependent and they all need to failover together in order for a corresponding application to remain functional). If a resource group is accessible as a network entity (which typically implies that it must have at least an IP address, a network name, and a disk among its resources), then it is known as a virtual server. An example of a virtual server could be a single SQL Server 2005 instance, with its associated name, IP address, disks where databases and transaction logs are stored, and services, such as SQL Server, SQL Server Agent, and Full-Text Search (FTS) Service.
The number of nodes supported in the clustered configuration of SQL Server 2005 is dependent on the operating system limits. You can have up to 8 cluster members when using 32-bit version of Windows 2003 Server Datacenter Edition (up to 4 with 64-bit version) and up to 4 cluster members with Windows 2003 Server Enterprise Edition (Windows 2000 Advanced and Datacenter Server SP4 support up to 2 and 4 nodes in a cluster, respectively). Note, that hardware requirements in this case are dictated largely by clustering technology (rather then by SQL Server 2005-specific demands). In order for the installation to be supported by Microsoft, underlying hardware must comply with Windows 2000 Hardware Compatibility List or with the Microsoft Windows 2003 Catalog, (you need to ensure that the hardware is listed in the cluster solution category, since compliance of its individual components does not suffice).
There are some caveats pertinent to setting up SQL Server 2005 on a cluster (besides the already described issues, which apply to operating any cluster-based application, such as satisfying physical and operating system requirements or creating and configuring resources and virtual servers). First, you need to install and configure the Microsoft Distributed Transaction Coordinator (MSDTC) component (responsible for managing distributed queries, two-phase commit transactions, and some of the replication tasks). The steps necessary to accomplish this task on Windows 2003 platform are documented in the Microsoft Knowledge Base Article Q301600 (and the Windows 2000-based procedure is described in the article Q290624. Note that it is recommended to place MSDTC and its associated resources in their own resource group. Before launching the clustered installation of SQL Server 2005, ensure that you are logged on with an account that is not only a member of the local Administrators group on each cluster node, but also has rights to log on as a service and act as a part of operating system. The setup program is able to detect that it is running on a clustered server and indicates this by enabling "Install as virtual server" options for SQL Server and Analysis Services components on the "Components to Install" page of the SQL Server Installation Wizard (make sure that you select these options when installing a clustered instance). When specifying target disks where database and transaction logs will be placed, choose them from the available cluster groups (you need to pick the ones where appropriate disk resources are located). Remember that each instance name must be unique across the entire cluster. Ensure also that the MSCS Cluster Service account is a member of SysAdmin role on clustered instances of SQL Server 2005 (this requirement is satisfied by default because SysAdmin includes BUILTIN/Administrators group). Finally, there are additional considerations that you should keep in mind when incorporating clustered SQL Server 2005 installations as part of your replication environment. In such cases, it is important (from the availability perspective) to create a clustered file share resource and designate it as the default snapshot folder for clustered Distributor or as an alternate snapshot location for individual publications (this is done as part of the publication definition from the Publication Properties dialog box). This provides Subscribers access to snapshots in case of individual server failures.
Scalability of clustered SQL Server is limited due to a couple of factors. First, there is only a single copy of each instance of SQL Server, with its data residing on shared disks (this also has availability implications - even though the disks are typically highly redundant and reliable, failure of a storage unit hosting them results in application downtime). In addition, in your design, you need to take into consideration the worst-case scenario, in which the majority of servers fail. In this case, the remaining ones must be capable of handling the increased level of utilization, which under normal conditions is evenly distributed across the entire cluster. These factors, combined with the failover strategy, affect the number of cluster nodes and their configuration (it is possible to designate specific nodes as potential or preferred owners of particular resource groups, while prohibiting others from participating in their failover).
While the goal of database mirroring and failover clustering is to provide high availability through automatic failover without data loss, their characteristics (and, consequently, the manner in which they accomplish this goal) are very different. When deciding on which one better suits your requirements, you should consider the following factors:
- hardware and software requirements - clustering has much more stringent requirements, since every piece of hardware needs to be included in the Windows 2000 Hardware Compatibility Listing or Microsoft Windows 2003 Catalog (in addition, it is recommended that each cluster member is set up identically). Clustered disk devices operate typically in either shared SCSI of Fibre Channel configuration (iSCSI-based storage emerges as an attractive alternative). Servers must be running Windows 2003 Enterprise (or Windows 2000 Advanced) or higher. While definite information regarding SQL Server 2005 versions that will support clustering is not available at this point, historically it was necessary to purchase Enterprise Edition in order to be able to use it for clustered environments. On the other hand, database mirroring does not introduce any additional hardware restrictions, beyond the ones expected for standard SQL Server 2005 installations (although it does increase storage needs - you need to approximately double the amount of space occupied by your highly-available databases, to store their mirrored copies).
- price - The increased level of hardware and software requirements (outlined above) of failover clustering introduces a significant increase in price comparing with database mirroring. While you might need additional disk space when using database mirroring, this should not have considerable impact on the price (disk space is relatively inexpensive), with with exception of very large databases.
- distance limitations - The maximum distance covered by failover clustering depends on the type of technology used for attaching shared storage devices. In extreme cases (shared SCSI channel), the limitations are very stringent, restricting distance between cluster nodes to no more than a few meters. While Fibre-Channel-based technology significantly increases this range (with distances being measured in miles), it still might not be suitable in geographically distributed environments, where high level of availability is required. Database mirroring, on the other hand, needs only fast and reliable network connectivity (note, however, that iSCSI-based clustering can match this level of efficiency).
- level of resiliency - in case of failover clustering, there is only one instance of each database and its transaction logs, residing on shared storage devices. While such devices are fault tolerant (typically through various levels of hardware-based RAID), only high-end solutions (which offer their own independent replication mechanisms) can deliver high-availability on the site level (where even a failure of the entire location still provides data access). Unfortunately, such solutions impose significant extra cost. Database mirroring, on the other hand, allows setting up a copy of a live database in either a local or remote location, which becomes available automatically in case of the failure of the primary, including site-wide disasters.
- implementation and management complexity - clustering technology is considerably more difficult to implement, manage, and troubleshoot.
- server-level vs. database-level failover - one of the benefits of clustering results from the fact that it fails over on the virtual server level. This eliminates some of the issues inherent to database mirroring, which we mentioned in the previous article (such as a need for synchronizing mappings between SQL Server logins and database users on the mirror or the lack of ability to protect system databases). Furthermore, failover is completely transparent to clients, regardless of the client version or content of connection string.
- uptime expectations - mirroring offers faster failover than failover clustering (since the failover takes place on the database level, rather than for entire the SQL Server instance). The actual values depend on a number of factors (such as status of transactions on all active databases), but they would typically be within a range of few seconds in case of database mirroring and about one order larger in case of virtual server failover.
In the next article of our series, we will continue our discussion on high availability and scalability features of SQL Server 2005, such as online restore and indexing.