So far in our series of articles, we have been discussing relatively generic SQL Server 2000 security related issues that every database administrator is likely to encounter on a daily basis. Now it is time to look into less common scenarios that deal with one specific area of management - database replication. We will start by presenting its architecture and implementation options, and follow with their security implications.
The main purpose of replication is data distribution. Placing multiple copies of the same data at different locations allows you to load balance or reduce network traffic generated by remote database clients, merge data from multple sources into one comprehensive target, or separate databases based on their purpose (e.g. OLTP access can be isolated from read only queries by creating two data sources, with the read only one being periodically updated). Copying critical data to a secure location also serves frequently as the basis for disaster recovery or high availability solutions.
There are different variations of replication, which differ in scope of data (from stored procedures or user defined functions to multiple tables), direction of data transfer (one or two directional, with merge functionality and conflict resolution provisions), propagation method (push or pull), and update mechanism (more about this shortly). In order to make principles of replication easier to understand, SQL Server designers decided to apply terminology derived from the publishing industry when naming its main components:
- publisher - SQL Server instance that serves as a source of replicated data,
- distributor -SQL Server instance which manages and monitors data flow from a publisher to subscribers, using the distribution database created specifically for this purpose,
- subscriber - SQL Server instance functioning as the target for replicated data,
- article of publication - the smallest unit of replicated data, consisting of a table, its subset formed by either selected columns (filtered vertically), rows (filtered horizontally), or combination of both, a view, user-defined function, or stored procedure.
- publication - collection of one or more articles grouped in order to be replicated together.
Keep in mind that the three architectural elements of replication - publisher, distributor and subscriber - are, in essence, roles that a particular SQL Server can serve. These roles are not mutually exclusive - there are situations where the same system functions as a publisher and a distributor or as a publisher and a subscriber within the same instance of replication. Similarly, with multiple replication instances, a system can be a subscriber for one, and a distributor for another.
There are three types of replication:
- snapshot - is the simplest type from the conceptual and implementation perspective. It involves capturing content of a publication at a point in time and copying it to subscribers. This content consists of scripts to create database objects included in a publication that are subsequently executed on the subscriber, as well as data residing in these objects. Its main benefit is simplified maintenance and troubleshooting as well as decreased load on replication participants, since they are not responsible for keeping track of database changes in between snapshots. On the other hand, lack of transactional support is also the main drawback of snapshot replication. Since incremental changes to a publisher are not reflected in the subscriber until the next snapshot, data between the two tend to be out of synch most of the time. In addition, size of replicated data is the same size as the size of the publication, every time the snapshot gets replicated - affecting network bandwidth and processor utilization across all replication participants. This is the main reason that snapshot replication is typically used with small and relatively static publications. There is no mechanism to propagate updates on subscribers back to the publisher.
- transactional - overcomes the limitations of snapshot replication by keeping track of incremental changes to a publication (taking advantage of database transaction logs), propagating them to distributor, which in turn distributes them to subscribers, shortly after they take effect (note that transactional replication starts with a single snapshot, which provides initial synchronization between publisher and subscribers). This ensures that subscribers represent an up-to-date state of publisher (degree of synchronization between the two is configurable). At the same rate, though, this places an additional load on publisher, distributor, and subscribers, and increases complexity of configuration and management. This type of replication is used in scenarios where data on subscribers needs to remain consistent with the original publication. Subscribers are typically treated as read only, although there are (relatively limited) configuration options (immediate updating subscribers and queued updating subscribers) allowing updates applied to subscribers to replicate back to the publisher. In general, though, it is recommended to handle updateable subscribers by configuring merge replication (presented next).
- merge - intended for scenarios where content of a publication can be updated at multiple locations. Initial snapshot is used to synchronize remote subscribers with a publisher, but from this point on, the distinction between them becomes less clear. Changes can occur at multiple locations (all subscribers and publisher), which introduces a number of issues that need to be properly addressed, such as mutual synchronization and conflict resolution. In addition, merge replication offers some unique enhancements, such as dynamic filtering (which filters replicated data based on parameters provided dynamically by subscribers). This type of replication is used in situations that involve remote offices or mobile users working with subsets of a publication, which are rolled back into the publisher.
Actual work involved in replication is performed by a number of automated processes, which are implemented as SQL Server Agent jobs, created automatically when you set up replication via SQL Server Enterprise Manager. This implies that they run in the security context of the SQL Server Agent account (on Windows NT and later operating systems) or a logged on user (on Widows 9x computers). These jobs fall into several distinct categories, known as replication agents, each based on a separate executable from Program Files\Microsoft SQL Server\80\COM folder or a stored procedure from the distribution database, and delivering different type of functionality. The primary ones are as follows:
- snapshot agent (SNAPSHOT.EXE) - running on distributor, it is responsible for creating a snapshot of a publication. The snapshot is then copied to a shared folder on distributor, which subsequently is copied to subscribers with the help of distribution agent (refer to the next item below). This agent is used by every type of replication, since each one involves the creation of a snapshot at least once.
- distribution agent (DISTRIB.EXE) - copies and applies the initial snapshot created by the snapshot agent, and, if appropriate, transactional changes, prepared by the log reader agent (discussed in the next item below), from distributor to subscribers. It is used by snapshot and transactional replication. You can configure either one shared agent serving all subscribers running on distributor (push subscription) or individual agents operating locally on each subscriber (pull subscription).
- log reader agent (LOGREAD.EXE) - running on distributor, log reader agent pulls entries from the transaction log on publisher (the ones marked for replication), making them available to the distribution agent. It is used only in transactional replication.
- merge agent (REPLMERG.EXE) - specific to merge replication, synchronizes publication content between publisher and its subscribers. Note that, in addition to snapshot agent, this is the only one that merge replication relies on (providing combined functionality of log reader and distribution agents used with transactional replication). Merge agent can reside either on a publisher (push subscriptions) or on subscribers (pull subscriptions).
- queue reader agent (QRDRSVC.EXE) - is optional in transactional replication (not used for the other two types), applicable only when queued updating subscribers in transactional replication are configured (in cases where transactional changes to subscribers need to be replicated back to the publisher).
There are also several other replication agents whose roles are secondary, performing mostly cleanup and monitoring tasks, such as agent history clean up agent, distribution clean up agent, expired subscription clean up agent, reinitialize subscriptions having data validation failures agent, or replication agent checkup agent, which are not relevant to our security-focused discussion, so we will not go beyond just listing them here.
This introductory information will serve as the basis for a discussion on replication security, in our next article.