Globule User Manual: 3 Server Configuration

3 Server Configuration

Just as Apache must be told which web-sites to serve, Globule, as an Apache module, must be told which parts of those sites are to be replicated. Instructions on security, configuration and special handling must likewise be given. Globule adds another dimension because it allows tuning of replication and redirection policies, and because it is a co-operative network: you explicitly select the partners with which to co-operate and to and from which documents are replicated.

Globule therefore requires configuration, as does Apache. Like that of other Apache modules, this configuration is embedded in the Apache configuration file httpd.conf. Without configuration, Apache with Globule may well start, but it will not function properly.

Apache configuration can be quite complex to get right. This documentation covers neither the configuration of Apache itself nor that of other modules that can be used inside Apache. Refer to the Apache documentation and follow the guidelines in the sample httpd.conf or httpd-std.conf to get a working web-site first, before integrating Globule. Globule also provides a sample httpd-globule.conf, which can be renamed to httpd.conf and used as a starting point for your configuration.

This section describes how to prepare a configuration in httpd.conf that performs a basic replication of a site to another host. Separate subsections handle individual subjects and enhancements such as:

Site replication;
DNS based redirection;
System Monitoring;
Dynamic Content.

The reference in section 5 describes the directives on an individual basis rather than per subject.

The Globule Broker

Setting up a configuration file httpd.conf can be quite a difficult process. The order in which directives are specified matters, their nesting must be precise, port numbers must be added in the right places, and there are many other details to get right. Globule adds another dimension to managing httpd.conf, since the configuration of one server, which is the origin of exported documents, is linked to the replica servers that import the documents. The locations, shared secrets and settings need to match between servers.

To aid users in setting up httpd.conf configuration files for their servers and set up relationships between origin sites and friendly replica servers, we have created a web-site which:

brokers between potential replica servers and your origin server;
generates a complete and working httpd.conf configuration file based on all your settings.

This web-site is the Globule Broker Service (GBS). Globule users are able to register their servers, to select on which server(s) their sites should be replicated, how redirection should be performed, etc. As an added feature, Globule will provide a set of servers ready to replicate its users' sites, as well as a public redirection service. The GBS can be found at http://www.globeworld.net/. Note however that its features are currently quite limited. A redesign of the GBS is on its way.

3.1 Basic Server Configuration

Globule is provided as a module for Apache, so you have to let Apache know that you need the Globule module. Such instructions, as well as other configuration directives, are written in the Apache configuration file httpd.conf. Where this file is located depends on the installation you have chosen. The directives that instruct Globule how to operate are placed in this file as well.

Apache is a highly configurable and flexible server. This also means that even the basic configuration without Globule is quite extensive and that many details matter. Be aware that small configuration changes can have large effects. Small omissions, the presence of other directives or the order in which directives are placed can result in Apache failing to start, misoperation, or other unexpected results. Some of these effects are silent: the server does not start, or seems to work but in a different fashion (for instance, without using replication).

Therefore, take care to follow instructions precisely and make changes at the proper location. Check which values you need to change, such as port numbers, the ServerName and directory names. Some values, like directory names, appear multiple times in configuration files; be sure they are consistent with each other.

This section describes how to add the most basic necessary directives to a functional Apache configuration file. Subsequent sections explain how to add further functionality on a per-subject basis. This manual cannot give an overview of configuring Apache, only of the extension Globule provides. Some knowledge of Apache configuration is needed, and we advise you to work from a template httpd.conf as provided by your installation method.

3.1.1 How to update your configuration

Configuring Apache and Globule involves making changes to the configuration file httpd.conf. Changes to the configuration do not take effect until you restart Apache. The location of the httpd.conf file and how to restart Apache depend on your installation method. Refer back to the chosen installation method for the location of httpd.conf and the preferred method of starting Apache.

In any case, you can check for certain errors in the configuration using the command apachectl configtest, or globulectl configtest if provided. However, not all configuration errors show up during startup. When Apache starts, it runs in the background. Any errors at this time are written to the error log as specified in the Apache configuration. Always check this error log for problems.
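
Where the error log ends up is controlled by Apache's own ErrorLog and LogLevel directives. As a minimal sketch, assuming the log is kept under logs/ relative to the ServerRoot (your installation may well use a different path):

```apache
# Assumed location; a relative path is resolved against ServerRoot.
ErrorLog logs/error_log
# Log warnings and anything more severe.
LogLevel warn
```

After each restart, inspect the end of this file for messages from Apache or Globule.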

3.1.2 Check your Apache configuration

The installed httpd.conf might already be adapted; however, this default configuration file is just a standard template and should be checked and, where needed, adapted for your system. Refer to the Apache documentation for a full explanation. The following settings are at least important for Globule to operate correctly, or vary considerably between systems. These settings should already be partially present in the httpd.conf.

Directive Listen

The Listen directive instructs Apache to listen on one or more ports. The Listen directive must always be specified, even if the default port 80 is used. At the time of release of version 1.3.1 of Globule, the use of multiple listen ports or of SSL/HTTPS may not be fully functional.
Make sure that the port specified here is in accordance with the ports given in the ServerName, NameVirtualHost and VirtualHost directives, as well as in the GlobuleReplicaIs, GlobuleReplicaFor and similar directives.

Example:

Listen 8333

Directives User and Group

When Apache is instructed to listen on port 80, it requires superuser privileges and thus needs to be started as root. Since this can cause security issues, Apache is always instructed to change its identity after startup to the Unix user and group specified by the directives User and Group. Standard Unix/Linux practice, as well as the recommended Apache setup, is to change to the Unix user nobody and group #-1. There are, however, Linux distributions which provide separate Unix users and groups such as apache, httpd, www or web. If you run a default distribution, you might need to use these users and groups in order for the web-server to access all files. The Unix user/group combination nobody and #-1 is always available.

Example:

User nobody
Group #-1
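
On distributions that ship a dedicated account for the web-server, the same two directives would name that account instead. A sketch, assuming a user and a group both called apache exist on your system (the names vary between distributions, so check yours first):

```apache
# Hypothetical account names; use the ones your distribution provides.
User apache
Group apache
```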

For Windows users

Windows users who use DNS redirection (that is, whose machine plays the role of the redirector) need to disable the AcceptEx Windows call. This Microsoft optimization breaks quite a lot of software, including ours and MySQL. Besides, enabling it provides only a limited performance increase. Since Windows serves pages very slowly compared to Linux servers, you can safely disable this feature in all cases:

<IfModule mpm_winnt.c>
  Win32DisableAcceptEx
  ...
</IfModule>

Locate the existing IfModule mpm_winnt section and add the Win32DisableAcceptEx directive.

Directive ServerName

The ServerName directive appears at least once in the httpd.conf at the global level, which means not inside a VirtualHost or other section. Only one such ServerName should exist at the global level, quite early in the configuration file. The single argument to the ServerName directive should be the hostname of your machine, which must always resolve to the public IP address of the machine.

Listen 80
...
ServerName world.cs.vu.nl

If your server does not use the default HTTP port (specified as Listen 80 earlier in the httpd.conf), then the ServerName should have a colon and the port number appended to it:

Listen 8333
... 
ServerName world.cs.vu.nl:8333

Using an IP number instead of a fully qualified hostname is discouraged, because VirtualHosts and DNS redirection are then not supported.

VirtualHost sections

The usage of VirtualHost is documented in the Apache documentation, but because of the many mistakes one can make with it and the effect it has on Globule, some remarks on its configuration are given below. Virtual hosting is used when URLs with different host names return a different set of pages. In most cases you must use name-based virtual hosting, even if you only want to host a single site.

Unless you have multiple IP addresses on your machine and know what you are doing, you want name-based virtual hosting rather than IP-based virtual hosting. In a name-based configuration you start by specifying a NameVirtualHost directive. Then, for each web-site with a different hostname to be served, define a VirtualHost environment. Each should contain at least a ServerName directive with the web-site name and a DocumentRoot directive which specifies where the documents for that web-site come from. Be sure that the ServerName directives within the VirtualHost environments are tagged with the port number in the same way as the global ServerName:

Listen 8333
...
ServerName world.cs.vu.nl:8333
...
DocumentRoot /var/www/html
...
NameVirtualHost *

<VirtualHost *>
  ServerName world.cs.vu.nl:8333
  DocumentRoot /var/www/html
  ...
</VirtualHost>

<VirtualHost *>
  ServerName www.revolutionware.net:8333
  DocumentRoot /var/www/www.revolutionware.net
  ...
</VirtualHost>

<VirtualHost *>
  ServerName _default_:8333
  DocumentRoot /var/www/html
  ...
</VirtualHost>

You must specify a VirtualHost section for the global ServerName too. Thus, in the example above, world.cs.vu.nl is the first, global ServerName specified, and it must also be present in one of the VirtualHost environments (here, the first one). Note that because the global ServerName and the ServerName of the first VirtualHost are the same, the DocumentRoot should be the same too.

The last VirtualHost section in the example catches all incoming requests that do not match any of the other VirtualHost sections. It is common for this section to have the same DocumentRoot as the global DocumentRoot, but this is possible only if this site is not (partially) replicated.

If you add ServerAlias directives, now or in the future, take note that you should not add the port number when specifying aliases for your hosts.
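
For instance, to let the second site from the example above also answer to another host name, a sketch could look as follows. The alias revolutionware.net is chosen here purely for illustration:

```apache
<VirtualHost *>
  # The ServerName carries the port number ...
  ServerName www.revolutionware.net:8333
  # ... but the alias does not.
  ServerAlias revolutionware.net
  DocumentRoot /var/www/www.revolutionware.net
</VirtualHost>
```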

For each VirtualHost with a new DocumentRoot you should also check whether the files are accessible, both because they have world-accessible permission bits (when running the server on a Unix machine) and because the server program is allowed to serve them through its configuration. Within the httpd.conf, access is allowed or denied through Directory directives; see the next paragraph and the Apache documentation.

Directory specifications

Whenever Apache serves a document, locating and authorizing the file to be served goes through several stages. The DocumentRoot specifies the initial location and Location directives specify how to treat individual paths, but whether an actual file may be accessed is controlled by a <Directory> directive environment. A default configuration will always deny access to all files by disallowing anything for ``/''. Therefore, if you add a VirtualHost with a DocumentRoot to which access is not yet allowed, you need to add a Directory section for it. Also, if you change a DocumentRoot or ServerRoot directory, remember to check all paths in Directory environments.

Taking the example from the previous paragraph and starting from a default configuration, access to the files served at http://www.revolutionware.net:8333/ will only be allowed if we add to the httpd.conf:

<Directory "/var/www/www.revolutionware.net">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

This configuration snippet should be stated just below a <Directory /> specification normally present in your configuration, but at least before any VirtualHost specification.
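
The resulting order in httpd.conf could be sketched as follows. The contents of the <Directory /> block are an assumption about what a typical default configuration contains; yours may differ:

```apache
# Default: deny access to the whole filesystem (assumed typical content).
<Directory />
    Options FollowSymLinks
    AllowOverride None
    Order deny,allow
    Deny from all
</Directory>

# Explicitly open up the DocumentRoot of the new virtual host.
<Directory "/var/www/www.revolutionware.net">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

# VirtualHost sections follow only after the Directory sections.
NameVirtualHost *
<VirtualHost *>
  ServerName www.revolutionware.net:8333
  DocumentRoot /var/www/www.revolutionware.net
</VirtualHost>
```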

3.1.3 Add Globule support

This subsection describes how to add Globule to a working non-Globule Apache configuration, without yet replicating a web-site or importing one from another origin server.

Add a LoadModule directive for Globule

First Apache must be instructed to use the Globule module by adding a line which loads the module:

LoadModule globule_module modules/mod_globule.so

This LoadModule directive should be placed below the other already present LoadModule directives. These normally occur early in the configuration after the MPM specific section.

Add Directive GlobuleAdminURL

Globule will not work unless it has some web address through which it can talk to itself. This seemingly odd requirement arises because Apache is not a single program: when started, Apache splits into multiple processes. A reserved URL lets Globule do its internal bookkeeping. Using the GlobuleAdminURL directive you can provide Globule with a URL into your web-server that Globule can use freely.

A good choice for the site name is the first, global ServerName that appears in your configuration, combined with a path like globuleadm. Following the earlier examples, this would result in the specification of:

GlobuleAdminURL http://world.cs.vu.nl:8333/globuleadm/

Note that:

The URL that you provide must be fully qualified, including the http://, hostname and port parts (for which the global ServerName is a good choice);
Any path you give, like /globuleadm/ in the example, will do;
The GlobuleAdminURL must end with a slash;
The address to which the URL points, and any sub-path of it, should not contain any actual content. It should also not be replicated.

The only exception is formed by the supporting files for the monitoring (see section 3.4). These files must actually be installed at the filesystem location pointed to by the GlobuleAdminURL.

The GlobuleAdminURL directive is normally placed directly after the global DocumentRoot, and at least below the first, global ServerName and Globule's LoadModule directive.
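
Putting these placement rules together, the relevant part of httpd.conf could be ordered roughly like this. The sketch reuses the host name, port and paths of the earlier examples and leaves out everything else:

```apache
Listen 8333
# Globule's LoadModule goes after the LoadModule lines already present.
LoadModule globule_module modules/mod_globule.so

# First, global ServerName.
ServerName world.cs.vu.nl:8333
DocumentRoot /var/www/html

# Directly after the global DocumentRoot; note the trailing slash.
GlobuleAdminURL http://world.cs.vu.nl:8333/globuleadm/
```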

Prevent unwanted entries in your access log

Globule relies on a number of periodic tasks executed roughly every second (e.g., to check if a given file was modified or if a replica server is still alive). These tasks usually perform an internal HTTP request to your own server. As a result, your logs/access_log file will quickly fill up with records of these internal requests. There are enough of them to fill up any hard drive within a matter of days or weeks.

All internal Globule requests use either the custom-created SIGNAL or the REPORT HTTP method. To filter these requests out of your log files, we recommend that you enter in your httpd.conf an equivalent of the following lines:

SetEnvIf Request_Method "SIGNAL" dontlog 
SetEnvIf Request_Method "REPORT" dontlog
CustomLog logs/access_log combined env=!dontlog

The order of these statements is relevant. In your httpd.conf there should already be one or more CustomLog directives, where the first should be defined at a global level (i.e. not inside an environment like VirtualHost) almost directly after several LogFormats are defined. The SetEnvIf entries should be defined in between these two. Then all occurrences of CustomLog should have env=!dontlog appended to them.¹
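
Laid out in order, the logging part of httpd.conf could then look like the sketch below. The LogFormat line is the usual definition of the combined format and should already be present in your file; the virtual host and its log file name are illustrative assumptions:

```apache
# 1. Existing format definitions.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# 2. Mark Globule's internal requests.
SetEnvIf Request_Method "SIGNAL" dontlog
SetEnvIf Request_Method "REPORT" dontlog

# 3. Global log, skipping the marked requests.
CustomLog logs/access_log combined env=!dontlog

<VirtualHost *>
  ServerName www.revolutionware.net
  # Hypothetical per-site log; it needs the same condition appended.
  CustomLog logs/revolutionware_access_log combined env=!dontlog
</VirtualHost>
```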

3.2 Site Replication

Globule's main feature is to replicate Web sites. This section explains how to configure Globule so that documents from a given web site are replicated (i.e., copied) across multiple servers and kept consistent (i.e., updated when the origin version is updated).

Each Web site must have one origin server, which holds the authoritative version of the documents. It can be replicated across any number of backup servers and replica servers. To establish replication from an origin server to a replica server, or from an origin server to a backup server, both servers need to be configured appropriately:

The origin server needs to know where its replica/backup server is. This is done using the GlobuleReplicaIs or GlobuleBackupIs directive.
The replica/backup server needs to know where its origin server is. This is done using the GlobuleReplicaFor or GlobuleBackupFor directive.
Both servers need to authenticate each other by using a shared password (i.e., they both need to know the same password).
If the same site has one or more backup servers and one or more replica servers at the same time, then the replica servers need to know where the backup servers are. This is done using the GlobuleBackupForIs directive.

Whenever a browsing user on the Internet surfs to the web-site being replicated, one of the replica servers or the origin server is selected to handle the request. If a replica server is selected, the browser is redirected to the replica server. The most accessible form of redirection is HTTP redirection. HTTP redirection is easier to understand and set up, but has some disadvantages over DNS based redirection. After you understand HTTP redirection you can turn to section 3.3 for DNS based redirection.

Replicating a site with HTTP redirection

We will go through the configuration of a web-site replicated across one origin and one replica server. Later we will add a backup server which acts as a fall-back when the origin isn't available for replica servers to fetch fresh copies of web pages.

- In this example we assume that you have a computer with hostname world.cs.vu.nl and that you have a web-site http://www.revolutionware.net being served from this computer.
- Your friend provides you with the ability to use his web-server on his machine wereld.cs.vu.nl as a replica. At this web-server, your pages will be replicated at the URL: http://wereld.cs.vu.nl:8080/worldpages/

Note that the web-servers run at different port numbers (yours on the default port 80, the server of your friend on port 8080). With HTTP redirection any combination of ports is possible.

As an example of a document being replicated consider the photo image file available at http://www.revolutionware.net/photo.jpg. This will be copied and made available at http://wereld.cs.vu.nl:8080/worldpages/photo.jpg

To replicate your site www.revolutionware.net you must modify your configuration to something like:

Listen 80
ServerName world.cs.vu.nl
...
LoadModule globule_module modules/mod_globule.so
GlobuleAdminURL http://world.cs.vu.nl/globuleadm/
...
NameVirtualHost *
...
<VirtualHost *>
  ServerName www.revolutionware.net
  DocumentRoot /var/www/html/pages
  <Location "/">
    GlobuleReplicate on
    GlobuleReplicaIs http://wereld.cs.vu.nl:8080/worldpages/  coffee
  </Location>
</VirtualHost>

This configuration shows the ServerName, GlobuleAdminURL, etcetera laid out in the manner described in section 3.1.2. It then continues by defining the www.revolutionware.net virtual host section; the documents for this web-site, which will be replicated, are to be placed in /var/www/html/pages.²

The actual replication is performed by two directives GlobuleReplicate and GlobuleReplicaIs. Both must be defined inside a Location environment which determines from which path the documents will be replicated. In this case the path is anything from / and all sub-paths, in other words: the entire web-site.

GlobuleReplicate on

The GlobuleReplicate directive declares that the web-site must be replicated and that this server will act in the role of origin for the web-site. Because the GlobuleReplicate directive is placed inside a Location directive, the URL path from which to start replicating is determined by this Location environment.

You can also turn replication off for part of a web-site. This is described in section 3.2.2.

GlobuleReplicaIs...

One or more GlobuleReplicaIs directives then declare the replica server(s) to which the web-site is replicated.

You and your friend need to agree upon the URL path you are exporting (assumed until now to be http://www.revolutionware.net/) and the URL path at which your friend will be importing your web-pages (assumed until now to be http://wereld.cs.vu.nl:8080/worldpages/).

You also need to agree upon a shared secret: a password known by both your origin server and your friend's replica server and used for inter-server authorization. In the above configuration the phrase ``coffee'' was chosen.

Now your server is configured, but your friend needs to update his configuration as well.

Listen 8080
ServerName wereld.cs.vu.nl:8080
...
DocumentRoot /var/www/html
...
LoadModule globule_module modules/mod_globule.so
GlobuleAdminURL http://wereld.cs.vu.nl:8080/globuleadm/
...
NameVirtualHost *
...
<VirtualHost *>
  ServerName wereld.cs.vu.nl:8080
  DocumentRoot /var/www/html
  <Location "/worldpages/">
    GlobuleReplicaFor http://www.revolutionware.net/  coffee
  </Location>
</VirtualHost>

This configuration has one Globule-specific replication directive, namely GlobuleReplicaFor, which specifies that your friend's server will act in the role of a replica server for your server (as specified in the argument of GlobuleReplicaFor). The GlobuleReplicaFor directive also needs to be located inside a Location directive, to indicate to Globule at which path your web-site should be made available.

Your friend's configuration mirrors yours. The ServerName and the Location in which your friend's GlobuleReplicaFor is placed together form the URL specified by your GlobuleReplicaIs. Vice versa, the ServerName and the Location in which your GlobuleReplicate/GlobuleReplicaIs are placed together form the URL specified in the argument to GlobuleReplicaFor.
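
This correspondence can be summarised in one annotated sketch. It reuses the values of the example and shows the relevant parts of two separate httpd.conf files; the port number is appended to the replica's ServerName as section 3.1.2 prescribes for non-default ports:

```apache
# --- httpd.conf on the origin (world.cs.vu.nl, port 80) ---
# Exports http://www.revolutionware.net/
<VirtualHost *>
  ServerName www.revolutionware.net
  DocumentRoot /var/www/html/pages
  <Location "/">
    GlobuleReplicate on
    # Must equal the replica's ServerName plus Location; same password.
    GlobuleReplicaIs http://wereld.cs.vu.nl:8080/worldpages/  coffee
  </Location>
</VirtualHost>

# --- httpd.conf on the replica (wereld.cs.vu.nl, port 8080) ---
# Imports the site at http://wereld.cs.vu.nl:8080/worldpages/
<VirtualHost *>
  ServerName wereld.cs.vu.nl:8080
  DocumentRoot /var/www/html
  <Location "/worldpages/">
    # Must equal the origin's ServerName plus Location; same password.
    GlobuleReplicaFor http://www.revolutionware.net/  coffee
  </Location>
</VirtualHost>
```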

3.2.1 Using a backup

Whenever a replica copy of a document is not available or no longer valid at a replica server, it will fetch a fresh copy of the page from the origin server. This way replica servers will keep up-to-date. However it can be that the origin server is not available at the time.

To this end, backup servers may be defined. The role of these servers is to maintain a complete set of documents for the replicated web-site. They obtain this set of pages from the origin server through the same method as normal replica servers, but make sure they keep a valid copy at all times. Replica servers can thus fetch a copy of a web-page from the origin server or, if it is unavailable, from a backup server.³

Since the operation of a backup server is largely the same as a replica server, the configuration follows the same line, with three exceptions:

instead of using GlobuleReplicaIs and GlobuleReplicaFor use the directives GlobuleBackupIs and GlobuleBackupFor;
the normal replicas need to be told which backup servers are available as an alternative when the regular origin is not, which is done by specifying a GlobuleBackupForIs directive;
finally the backup-servers need to be told to always keep the documents, by specifying a suitable replication policy with the GlobuleDefaultReplicationPolicy directive.

We will run through the modifications to the origin server and the replica server, and show how the backup server should be configured. Assume you have another friend whose machine monde.cs.vu.nl is offered as your backup server. In the configuration of your origin site, add the GlobuleBackupIs directive:

Listen 80
ServerName world.cs.vu.nl
...
<VirtualHost *>
  ServerName www.revolutionware.net
  DocumentRoot /var/www/html/pages
  <Location "/">
    GlobuleReplicate on
    GlobuleDefaultReplicationPolicy Invalidate
    GlobuleReplicaIs http://wereld.cs.vu.nl:8080/worldpages/  coffee
    GlobuleBackupIs  http://monde.cs.vu.nl:8333/worldpages/   tea
  </Location>
</VirtualHost>

From the origin server's point of view, backup servers are almost the same as regular replica servers. The main change is that all regular replica servers need to be told explicitly that a backup server is available for this site:

Listen 8080
ServerName wereld.cs.vu.nl:8080
...
<VirtualHost *>
  ServerName wereld.cs.vu.nl:8080
  DocumentRoot /var/www/html
  <Location "/worldpages/">
    GlobuleReplicaFor http://www.revolutionware.net/  coffee
    GlobuleBackupForIs http://www.revolutionware.net/ http://monde.cs.vu.nl:8333/worldpages/
  </Location>
</VirtualHost>

Note that GlobuleBackupForIs takes two arguments: the first argument specifies for which site we are defining a backup (the ``For'' in GlobuleBackupForIs), the second specifies who the backup server is (the ``Is'' in GlobuleBackupForIs). No password needs to be defined. The first argument must always be the same as the one specified in GlobuleReplicaFor.

Finally, your other friend needs to set up the configuration of the backup server. This is almost the same as setting up a replica, but a GlobuleDefaultReplicationPolicy should be added and GlobuleBackupFor used instead of GlobuleReplicaFor.⁴

Listen 8333
ServerName monde.cs.vu.nl:8333
...
DocumentRoot /var/www/html
...
LoadModule globule_module modules/mod_globule.so
GlobuleAdminURL http://monde.cs.vu.nl:8333/globuleadm/
...
NameVirtualHost *
...
<VirtualHost *>
  ServerName monde.cs.vu.nl:8333
  DocumentRoot /var/www/html
  <Location "/worldpages/">
    GlobuleDefaultReplicationPolicy Ttl
    GlobuleBackupFor http://www.revolutionware.net/  tea
  </Location>
</VirtualHost>

3.2.2 Replicating a partial site

Globule allows you to easily define parts of your site that should not be replicated. For the paths selected not to be replicated, the origin server will simply not redirect clients to replica servers, but serve them itself.

<VirtualHost *:8333>
  ServerName www.revolutionware.net:8333
  DocumentRoot ...
  <Location "/">
    GlobuleReplicate on
    GlobuleReplicaIs ...
    GlobuleBackupIs  ...
  </Location>
  <Location "/cgi-bin/">
    GlobuleReplicate off
  </Location>
</VirtualHost>

This instructs Globule to replicate the web-site with the URL http://www.revolutionware.net:8333/ except the pages that are in the sub-path http://www.revolutionware.net:8333/cgi-bin/.

When using HTTP redirection, another way to replicate only parts of a site is to insert the GlobuleReplicate, GlobuleReplicaIs and GlobuleBackupIs directives inside a <Location> container with a sub-path of /:

<VirtualHost *:8333>
  ServerName www.revolutionware.net:8333
  DocumentRoot ...
  <Location "/replicate_me/">
    GlobuleReplicate on
    GlobuleReplicaIs ...
    GlobuleBackupIs  ...
  </Location>
</VirtualHost>

3.3 Client Redirection using DNS

3.3.1 What is DNS redirection?

Until now, all configurations shown in this documentation use a redirection mechanism called HTTP redirection. This means that, when an origin Web server receives a request, it can reply by ordering the browser to re-issue the same request at a different server. This scheme is extremely simple, but it has two major drawbacks. First, as the browser is effectively returned a modified URL, it can decide to store that URL for future reference. As a consequence, removing or replacing a replica may render various cached URLs invalid. Second, each request is still initially posted to the origin server, so the success of the request depends on the availability of the origin.

DNS redirection addresses these problems by basing redirection on a web site's name. For example, when a browser queries ``http://www.revolutionware.net/'', it first resolves the server name ``www.revolutionware.net''. In a non-replicated setup, the browser would always receive the IP address of the server to contact. Using DNS redirection, the DNS redirector will check where the client is located and return the IP address of the most suitable server out of the available replica servers for the site. IP addresses are usually not shown to the users, so DNS redirection is invisible to them.

DNS redirection imposes a few restrictions:

Redirection can only be realized for a Web site as a whole, so everything from the location /. It is impossible to replicate only a part of a site.
All servers taking part in the replication of the Web site must run on the same port number.
Running a DNS redirector requires that Apache is started as root.
You must control the DNS domain inside which you want to run your web-site. For example, if you want to have your site available under the URL http://www.revolutionware.net/ then you must own the domain revolutionware.net. If you do not already own a domain, then any registrar will let you register one for a modest yearly fee for the .com, .net and .org and some more top-levels. Other top levels, such as .nl are available through local registrars.

Alternatively, if one of your friends already owns a DNS domain (for instance revolutionware.net), then she may delegate a sub-domain (for instance berry.revolutionware.net) to you, so that you can for example create a site called http://www.berry.revolutionware.net or even http://berry.revolutionware.net.

3.3.2 Required elements to setup DNS redirection in Globule

The Apache installation of the origin server must be compiled with the patch provided by Globule. This is done by default when using the automated installer, otherwise refer to section 2.3.
You must set up a DNS server that will contain all information about the domain. Installing a DNS server is unfortunately relatively complex and outside the scope of this document. We refer the reader to a good DNS tutorial or to a book on the topic. Alternatively, most good registrars offer a service where they run DNS servers for you and simply ask you to provide the information that must be kept there. We strongly recommend that readers select a registrar which provides this service, such as Gandi or GoDaddy, amongst many others.

3.3.3 Setting up DNS entries for redirection

Let's assume that you own the domain revolutionware.net and that you want to setup DNS redirection for the site http://www.revolutionware.net/. In a non-distributed setup, the name www.revolutionware.net would simply be an alias for the actual server's host name. In a Globule setup, www.revolutionware.net will point to different machines when being looked up by different clients. We call www.revolutionware.net the generic name of the site, which represents all machines collectively. Additionally, each server taking part in the replication needs a specific name of its own that will be used when Globule needs to contact one specific server within the replicated site⁵. It is not a problem to give multiple names to the same machine, so even if these machines already have names (e.g., ``wereld.cs.vu.nl''), you should create additional generic and specific names just for the sake of the Web site.

Imagine that you have two machines called ``wereld.cs.vu.nl'' and ``world.cs.vu.nl'', which you want to perform the role of origin server and replica server respectively. Let's assign them the specific names origin.revolutionware.net and replica.revolutionware.net respectively. The following lines should be inserted in your DNS zone⁶:

$ORIGIN revolutionware.net.
origin   IN  CNAME  wereld.cs.vu.nl.
replica  IN  CNAME  world.cs.vu.nl.

Do not forget the dots at the ends of the lines!

Alternatively, if you know the IP addresses of your servers (e.g., 130.37.198.252 and 130.37.193.70), then you may define your zone as follows to provide minor performance and reliability improvements:

$ORIGIN revolutionware.net.
origin   IN  A  130.37.198.252
replica  IN  A  130.37.193.70

Note that A records do not end with a dot.

You must now define the generic name www.revolutionware.net where your site will be located. We do not want to associate a specific IP address with this name, but instead let Globule's DNS redirector decide which IP address should be returned to clients who look up that name. In the setup we are creating, the origin server will also be the DNS redirector, so you need to insert this in the DNS (it is not possible to use an IP address here instead of the name origin.revolutionware.net):

www  IN  NS  origin.revolutionware.net.

Be warned that any change in the DNS records may take a few hours before being ready for use. If your DNS-redirected site does not work as expected and you see errors like ``www.revolutionware.net not found'', this probably means that you should be patient and wait for changes to be fully propagated.

3.3.4 Configuring Globule for DNS redirection

You must now configure the origin and the replica server so that they support DNS redirection.

Two modifications are needed compared to a non-replicated setup:

The origin server must be told to act as a DNS redirector.
The origin and replica servers must be configured to respond to the newly-defined generic and specific DNS names.

A normal origin server configuration without DNS redirection, based on the machine hostname wereld.cs.vu.nl and the site www.revolutionware.net, would look similar to:

  ...
  ServerName wereld.cs.vu.nl
  ...
  GlobuleAdminURL http://wereld.cs.vu.nl/globulectl
  ...
  NameVirtualHost *

  <VirtualHost *>
    ServerName www.revolutionware.net
    DocumentRoot ...
    <Location />
      GlobuleReplicate on
      GlobuleReplicaIs ...
  ...

Note that the sections separated by dots (...) appear at different points in the configuration file. Their order matters; in particular, the VirtualHost section needs to be at the end of the configuration file.

First, let's enable DNS redirection at the origin server. This is done using the GlobuleRedirectionMode directive. At the global level you need to add the directive GlobuleRedirectionMode BOTH (or modify an existing one), which enables both HTTP and DNS redirection for the server as a whole. Then, inside each VirtualHost section which specifies an origin of a Globule-replicated site, you must declare whether that site uses HTTP redirection or DNS redirection.

Having done that, you only need to specify that your site can be reached both as http://www.revolutionware.net/ and http://origin.revolutionware.net/.

Here is the resulting configuration file:

  ...
  ServerName wereld.cs.vu.nl
  ...
  GlobuleAdminURL http://wereld.cs.vu.nl/globulectl
  GlobuleRedirectionMode BOTH
  ...
  NameVirtualHost *

  <VirtualHost *>
    ServerName origin.revolutionware.net
    ServerAlias www.revolutionware.net
    GlobuleRedirectionMode DNS
    DocumentRoot ...
    <Location />
      GlobuleReplicate on
      GlobuleReplicaIs http://replica.revolutionware.net/  sharedpassword
  ...

It is important that the ServerName entry contains the specific server name (origin.revolutionware.net), and that the generic server name (www.revolutionware.net) appears as the first entry of the ServerAlias directive. Specific names should be used in other directives such as GlobuleReplicaIs and GlobuleBackupIs.

You must also update the replica server's configuration file to specify that the replica of the http://www.revolutionware.net/ site can also be reached using its location-specific address http://replica.revolutionware.net/.

  ServerName world.cs.vu.nl
  ...
  GlobuleAdminURL http://world.cs.vu.nl/globulectl/
  ...
  NameVirtualHost *
  
  <VirtualHost *>
    ServerName replica.revolutionware.net
    ServerAlias www.revolutionware.net
    DocumentRoot ...
    <Location />
      GlobuleReplicaFor  http://origin.revolutionware.net/  sharedpassword
    </Location>
  </VirtualHost>

You can now start the two servers. Do not forget to run them as root, as regular users normally cannot run DNS redirectors! Your site should now be available at URL http://www.revolutionware.net/.
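
Before starting a server it can help to confirm both preconditions: that you are root, and that httpd.conf parses. The following shell sketch is one way to do this. The function name preflight is hypothetical, and the sketch assumes that the apachectl script of your installation is on the PATH (with an installer-based setup it may live in the installation's own bin directory).

```shell
#!/bin/sh
# Refuse to continue unless running as root, then validate the configuration.
# Hypothetical helper; assumes "apachectl" is found on the PATH.
# Usage: preflight && apachectl start
preflight() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "must be run as root to bind the DNS port (53)" >&2
        return 1
    fi
    # configtest parses the configuration and reports syntax errors
    # without starting the server
    apachectl configtest
}
```

Run it on both the origin and the replica server; a syntax error reported here is much easier to trace than a server that silently fails to start.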

3.3.5 Testing DNS redirection

With DNS redirection, the identity of the server which served your requests will not be shown to you. You may then start wondering if redirection actually works, or if all requests will end up being served by a single server.

Most Linux distributions contain the utility ``dig'' which is used to query DNS servers by hand. If you do not find it, it is usually part of an RPM package called bind-utils.

Start by testing your DNS domain:

Type:

dig -t NS revolutionware.net

The result looks something like:

; <<>> DiG 9.2.4 <<>> -t NS revolutionware.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43750
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;revolutionware.net.         IN  NS

;; ANSWER SECTION:
revolutionware.net.   86400  IN  NS  NAME-OF-YOUR-DNS-SERVER1.com.
revolutionware.net.   86400  IN  NS  NAME-OF-YOUR-DNS-SERVER2.com.

;; Query time: 1 msec
;; SERVER: 130.37.20.3#53(130.37.20.3)
;; WHEN: Thu Nov 10 15:18:18 2005
;; MSG SIZE  rcvd: 66

In the ``answer section'' you should see at least two lines with the names of the DNS servers responsible for your domain. If you used the services of your registrar to hold information about your domain, then both servers should probably belong to it.

Now, test the names that you have created:

dig origin.revolutionware.net

; <<>> DiG 9.2.4 <<>> origin.revolutionware.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50422
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;origin.revolutionware.net.      IN   A

;; ANSWER SECTION:
origin.revolutionware.net.  430  IN   A    130.37.199.101

;; AUTHORITY SECTION:
revolutionware.net.         430  IN   NS   NAME-OF-YOUR-DNS-SERVER1.com. 

;; Query time: 3 msec
;; SERVER: 130.37.20.3#53(130.37.20.3)
;; WHEN: Thu Nov 10 15:31:30 2005
;; MSG SIZE  rcvd: 66

In the ``answer section'' you should see the IP address of your origin server. Do the same to test the name replica.revolutionware.net.

Now, let's test if the redirector is correctly registered:

dig -t NS www.revolutionware.net

; <<>> DiG 9.2.4 <<>> -t NS www.revolutionware.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55825
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.revolutionware.net.         IN   NS

;; AUTHORITY SECTION:
www.revolutionware.net.     600  IN   NS   origin.revolutionware.net.

;; Query time: 0 msec
;; SERVER: 130.37.193.66#53(goupil)
;; WHEN: Thu Nov 10 15:34:50 2005
;; MSG SIZE  rcvd: 62

The authority section should contain a line ending with:

NS origin.revolutionware.net.

Finally, let's test if the DNS redirector works:

dig @origin.revolutionware.net www.revolutionware.net

; <<>> DiG 9.2.4 <<>> @origin.revolutionware.net www.revolutionware.net
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61015
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.revolutionware.net.         IN   A

;; ANSWER SECTION:
www.revolutionware.net.     10   IN   A    130.37.199.101

;; AUTHORITY SECTION:
www.revolutionware.net.     0    IN   NS   origin.revolutionware.net.

;; Query time: 1 msec
;; SERVER: 130.37.198.252#53(origin.revolutionware.net)
;; WHEN: Thu Nov 10 15:38:04 2005
;; MSG SIZE  rcvd: 78

In the ``answer section'' you should see the IP address of one of your servers. Issue the same command several times; you should receive a different IP address each time.
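
Repeating the lookup by hand quickly becomes tedious, so it can be scripted. The following is a minimal sketch, assuming the example names used above and a version of dig that supports the +short option (which prints only the answer data). The function name sample_redirector is hypothetical.

```shell
#!/bin/sh
# Query the redirector several times and print how often each address
# was returned. Hypothetical helper built around "dig +short".
# Usage: sample_redirector <redirector-host> <generic-name> <count>
sample_redirector() {
    server=$1
    name=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # +short prints only the answer data (here: one IP address per line)
        dig +short "@$server" "$name"
        i=$((i + 1))
    done | sort | uniq -c
}
```

For example, `sample_redirector origin.revolutionware.net www.revolutionware.net 10` performs ten lookups. Each output line shows how many times an address was returned, so more than one line indicates that the redirector is handing out different servers.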

3.3.6 Advanced usage

Using a backup server

A backup server adds virtually no additional complexity to the setup. Just as replica.revolutionware.net is the DNS name of a plain replica, a separate name can be used for a replica which performs the role of a backup server. Suppose we add backup.revolutionware.net to the DNS as an alias for the server that will play the role of backup server. Then the origin of www.revolutionware.net will declare:

    ServerName origin.revolutionware.net
    ServerAlias www.revolutionware.net
    <Location />
      GlobuleReplicate on
      GlobuleReplicaIs http://replica.revolutionware.net/  sharedpassword
      GlobuleBackupIs  http://backup.revolutionware.net/   wachtwoord
    ...

The backup server will be the same as any other replica server, but instead of using GlobuleReplicaFor it will use the directive GlobuleBackupFor and use backup.revolutionware.net as ServerName and www.revolutionware.net as ServerAlias. Likewise the replica servers should use the name backup.revolutionware.net in their declaration of a GlobuleBackupForIs directive:

    ServerName replica.revolutionware.net
    ServerAlias www.revolutionware.net
    <Location />
      GlobuleReplicaFor  http://origin.revolutionware.net/  sharedpassword
      GlobuleBackupForIs http://origin.revolutionware.net/  http://backup.revolutionware.net/
    ...

Not running DNS redirection on port 53 for testing purposes

Globule will bind itself to port 53 for answering DNS queries. This is the only port normally used by browsers to resolve the hostnames in URLs. However, if you just want to test DNS redirection you can resolve hostnames using the dig program. The -p option instructs dig to contact the name server at a different port. You must also contact the machine serving the request directly, so you need to use the @hostname construct. For instance:

dig -p 5353 @wereld.cs.vu.nl www.revolutionware.net

This instructs dig to ask the name server running on the machine wereld.cs.vu.nl at port 5353 to resolve the name www.revolutionware.net.

Globule can be instructed to answer DNS queries on a port other than 53 using the GlobuleDNSRedirectionAddress directive:

GlobuleDNSRedirectionAddress :5353

The GlobuleDNSRedirectionAddress directive needs to be specified before any GlobuleRedirectionMode directive.

3.4 System Monitoring

Globule is more complex than a regular Apache server. As it is inherently distributed, information about it is spread over multiple machines which bear complex relationships to each other. One of the goals of Globule is to increase performance and reliability, but evaluating this is less straightforward in a distributed system, and the cause of unexpected behaviour is harder to trace. Globule therefore has a monitoring framework which gives more insight into the behaviour of a Globule-replicated web-site.

Typically an administrator wants to monitor a running service, which we define as the ability to:

Find the reason behind any current fault or apparent incorrect operation, such as the inability of Globule to use a replica server and redirect to it;
Detect impending failure, such as the server becoming overloaded, or other exceptional conditions;
Record resource usage for accounting purposes;
Use resource usage and visit rate to evaluate how well the web-server performs. Specifically, view the benefits Globule brings;
Interact with the tunable parameters of the site operation such that optimum performance can be reached;
Gather statistical information about the visitors of the web-site for external purposes such as generating a report for marketing;
Have fun watching the server doing its work, otherwise a background task like a web-server is a nearly invisible entity.

To address these needs, Globule has an interface for these forms of monitoring controls:

log a history of regular operations, web-page accesses in this case;
view and modify tunable parameters;
view the current state;
view a history of exceptional events (such as errors, warnings, but also for instance increases in resource usage).

Apache itself provides two logging files which provide some means of monitoring. One is the access-log, which contains a listing of all URLs which have been requested from the web-site. The other logging file is the error-log, which contains messages ranging in severity from critical errors, through normal warnings, to informational messages. The amount of current state that can be monitored is minimal: only the server-info and server-status modules provide some information, and they are rarely used.

The access and error logs contain only a little monitoring data, which is also unstructured and limited in information. Globule therefore also provides monitoring information which is better suited to a distributed setup and is extensible. It is nevertheless very useful to keep the standard error and access log interface, for two reasons:

The error log in certain cases is the only way in which errors can be reported back to the administrator of the web-server;
Standard utilities and analysis software reuse the default Apache access log (and to a lesser extent the error log) in their operation.

Globule therefore provides three main access points for monitoring. First, errors, warnings and some other messages are written to the default Apache error log. Second, an equivalent of the access log is produced. The third access point is specific to Globule: to make it as accessible as possible, detailed Globule information is made available through a web-interface.

Each of the three is discussed in the following subsections.

3.4.1 Error log

Each Apache server maintains one or more error-log file(s) where information, warnings and error messages are written.

The error log is not Globule specific, so other modules also write their messages to the same file. Its purpose is primarily to log conditions which hamper the correct or intended working of the web-server after it has been started.
Because Apache is a server program that runs in the background without ever contacting the user directly, such messages are written to the error log named in the httpd.conf configuration file.

A standard error log file is normally named either error_log or error.log and placed in the ServerRoot/logs directory.

Similar to what Apache itself does, Globule associates different levels of significance with the messages it generates. This allows the administrator to select which messages should be written into the log or processed otherwise. Globule error, warning and informational messages are not marked any differently from any other messages. Next to the LogLevel directive, however, there is another Globule-specific directive that controls how verbose Globule is in reporting events. This is because within a running Globule-enabled server you want to be able to increase the verbosity for certain types of events when tracking down faults. The directive GlobuleDebugProfile sets the initial verbosity of Globule.

Only one GlobuleDebugProfile directive can and should be used; it takes global effect over all web-sites. A common use is to set it to the default level using:

GlobuleDebugProfile default

This will keep any messages of level ``error'' or above passing through to the Apache logging method. The profiles available at this time are:

default    significant error messages are logged
defaults   same as default
extended   errors and exceptional situations are logged; this causes periodic logging even when the server is idle
verbose    more verbose logging of events

These levels relate to the LogLevel ``warn'' and ``info'', but Globule may provide specific filters to specific classes of events at runtime.

For a correctly running server, informational and warning messages generated by Globule may be accessed through the web interface discussed later too, but the error-log is the only means for Apache/Globule to report situations in which the server is failing. It therefore should be inspected by the administrator of a web-site in case of problems.

Note that when configuring Apache you may:

Denote separate error log files for separate VirtualHost definitions.
Use LogLevel to suppress messages having a severity below a certain level. Note that the LogLevel directive needs to be defined before the ErrorLog directive to take effect; this allows overriding the LogLevel for different ErrorLog definitions.
Not see any error messages when starting Apache, even though Apache fails to start. Therefore you should always inspect the error-log. There are even instances where Apache fails to start and no error messages are produced in the error-log. In these cases you want to check whether the Apache service daemon, named httpd, has started.
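
Both checks from the last item can be combined in a small shell helper. This is a sketch only: the function name apache_health is hypothetical, the error log path is whatever your ErrorLog directive names (for example error_log in the ServerRoot/logs directory), and the daemon is assumed to appear as httpd in the process list.

```shell
#!/bin/sh
# Report whether an httpd process exists and show the last lines of an
# error log. Hypothetical helper; the log path is supplied by the caller.
# Usage: apache_health <error-log-path> [number-of-lines]
apache_health() {
    log=$1
    lines=${2:-20}
    # "ps -e" lists all processes by command name
    if ps -e | grep -q httpd; then
        echo "httpd is running"
    else
        echo "httpd is NOT running"
    fi
    if [ -r "$log" ]; then
        tail -n "$lines" "$log"
    else
        echo "cannot read $log" >&2
    fi
}
```

Calling the helper right after starting the server shows at a glance whether the daemon survived start-up and what it last reported.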

3.4.2 Merged access log

A standard installation of Apache provides log files of all successful URL accesses to the server as defined by the CustomLog and/or TransferLog directives. The format written by TransferLog is by default the Common Log Format (CLF), which is a format shared between multiple types of web-servers. With the CustomLog directive you are free to specify the format to be used, but most likely you will use an extension to the CLF known as the combined log format. In any case these log files can be global, or you can specify a separate access log for individual VirtualHost specifications.

The default access log produced by Apache is, however, badly suited to a Globule setup, because it only logs accesses to this web-server. Accesses to the same web-site that are serviced by a replica web-server are logged at that other web-server. This is not what you want from an access log, as one is not interested in the accesses to this web-server but in the accesses to this web-site. Globule solves this by merging the logs of all requests to all replica web-servers serving the same web-site.

Each web-server collects data on a per-site basis regarding accesses and some other information. These partial logs are periodically shipped over HTTP to the origin server, at the interval specified by the GlobuleHeartBeatInterval directive, and the origin server appends them to its own information. Consequently the accumulated data is only partially sorted in time.⁷

This combined access log not only reports on the bare accesses being made, but also some information relevant for a distributed web-site setup, such as which replica server received the request. Because of this, a file format such as the CLF is not usable and Globule uses a different format (documented in appendix B.1). One can however convert merged access logs from Globule's format into standard common log format (see Section 3.4.3).

Apart from the format, also the location where this file is stored is different. If you replicate a web-site, then Globule creates a directory named .htglobule in the directory containing the web-documents being replicated. In this directory a file report.log is created which is a log of events accumulated from all replica servers. For instance if you have the following definition in your httpd.conf:

DocumentRoot /home/www/htdocs
<Location />
GlobuleReplicate on
</Location>

Then this report-log is stored as /home/www/htdocs/.htglobule/report.log.

As mentioned in the introduction of this section, there are utilities which depend on a CLF or combined log format access-log file to extract information about the usage of the web-site. Naturally you would want to be able to keep using such utilities. Therefore the Globule module is accompanied by a program which transforms a report.log file into a valid access-log file in combined or CLF format. The additional information stored by Globule is lost in this translation, but it would not make sense to any such software anyway.

3.4.3 Utility program globuleutil

The globuleutil program converts one or more report-log files into a file similar in structure to an Apache common or combined log file. The output is written to standard output and can either be fed directly through a pipe into a web log analyzer program such as webalizer, or be written to a file:

globuleutil /home/www/htdocs/.htglobule/report.log > access.log

When the utility program is given multiple arguments representing multiple report-log files, they will be merged based on the timestamps in each file. Input files need not be report-log files; regular Apache common or combined log files may be specified as well.

Since most of the time input files are not completely sorted in time, you need to either sort them beforehand, or indicate to globuleutil that the files are only partially sorted. The globuleutil utility will then allow entries to be out of place, as long as the time difference between where an entry should have appeared in the log file based on its timestamp and the place where it actually appears later in the log file is no more than n seconds. The maximum allowed slack n is the lookahead window in time. This time difference applies on a per input file basis.

If the window given is too small, an error message will be generated. When a large window is specified, the globuleutil program will execute much slower and consume more memory. This trade-off depends on the settings of your web-server, the outages of replica and origin servers, and the GlobuleHeartBeatInterval setting.

globuleutil usage

globuleutil [ -v ] [ -f combined | common ]
            [ -w seconds ] [ -p prefix ]
            file1...

`-h`

Output help information.

`-v`

Increases the verbosity of information such as the input file format detected, resources and interval window used, etcetera. Multiple options -v increase the verbosity level.

`-fformat` or `--format=format`

Where format is either common or combined, specifies in which Apache log style to output the result. Only the common a.k.a. CLF file format is standardized, but the combined log file is an often used Apache file format.

`-pprefix` or `--prefix=prefix`

Prepend the path prefix before each URL. The URIs in the report-log files are relative to the path imported or exported from. Full URLs are not used as the initial path can be different on the replica servers and origin server in case of HTTP redirection. Therefore you often want to prepend the path from which the documents are being exported, equal to the path in the Location directive in which the GlobuleReplicate on resides.

For DNS redirection, this would be /, which is the default.

`-w seconds` or `--lookahead-window=seconds`

Specifies the window of time by which items in any input file may be unsorted.
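
The options above can be combined in a single invocation. The following shell sketch wraps such a call in a function. The name convert_report is hypothetical, and the default window of 600 seconds is only an assumed starting point that you should adapt to your own GlobuleHeartBeatInterval setting.

```shell
#!/bin/sh
# Convert a Globule report.log into a combined-format access log.
# Hypothetical wrapper; the 600 second default window is an assumption.
# Usage: convert_report <report.log> <output-file> [window-seconds]
convert_report() {
    report=$1
    out=$2
    window=${3:-600}
    # -f selects the output style, -w the lookahead window in seconds,
    # -p the path prefix ("/" is what a DNS-redirected site uses)
    globuleutil -f combined -w "$window" -p / "$report" > "$out"
}
```

For example, `convert_report /home/www/htdocs/.htglobule/report.log access.log` produces a file that standard log analyzers can read. If globuleutil reports that the window is too small, rerun with a larger third argument.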

3.4.4 Webalizer monitoring and the installer setup

If you installed Globule using the installer, the installation includes the program webalizer to provide statistics about your web-site. The globuleutil program is then invoked automatically when you access the web-page with the webalizer report through the Globule administration URL. The administration URLs are discussed in the next section.

Your installation should include a script .../etc/run-webalizer.sh which tries to detect which origin site is to be updated, runs the report.log file through globuleutil, and feeds the result to the webalizer statistical program. If you have different needs then you would need to modify this script and the webalizer configuration file ...etc/webalizer.conf.

The webalizer reports are also kept up-to-date in this installation through a periodically run script if kept enabled in the crontab.

3.4.5 Globule monitoring web interface

Monitoring data specific to Globule can be accessed through a web-interface. A globule-enabled server provides a single address for all the web-sites within Globule's control hosted by the server, which is accessible at the URL specified by the GlobuleAdminURL directive.

A normal installation will have a default set of pages installed at this location when Globule has been compiled with the --enable-globuleadm argument. If you installed Globule using the automatic installer then the administration pages are always installed. They are not installed for RPM-based installations. These pages can be customized at will, as they are not embedded within the server but communicate with Globule to obtain the monitoring information.

The uncustomized pages will show a menu to the different subjects at the top of the pages. Since the pages evolve with each release this documentation does not strive to give a detailed walk-through. Rather, this documentation only explains the rough outline. The pages themselves describe their individual functionality.

What the administration pages provide is:

Generic data about which version of Globule is installed, what extensions are available (such as PHP) and how much global resources are in use.
A summary of error messages and diagnostics information.
A listing of all web-sites which are under the control of Globule at this server. This includes sites for which this server plays the role of origin, replica or redirector.

If only certain parts of a web-site are replicated rather than the full site, the listing shows from which path the site has been replicated. If multiple paths are exported within the same site (i.e. the same ServerName), they are shown individually. For this reason the web interface refers to these as sections of the server in which Globule plays a role. Additionally, a section can also be a Globule-replicated database, as discussed in the section on dynamic content.

For each section defined you can browse through details such as:

The other servers with which this web-server is connected for this web-site; these are called the peers. For example, if this server is the origin of this web-site, the peers are all the servers which play the role of replica server. Of interest here is mainly whether these servers are available to help your server host your web-content.
The recently accessed documents and their current status.
A report of the accesses as made by webalizer, if Globule has been installed through the automated installer.

3.5 Dynamically generated content

Dynamically-generated content allows the pages of a web site to be more functional by returning content specifically of interest to the browsing user, such as the results of a search function. Web-sites with dynamic content are therefore becoming more and more common.

Dynamic content is defined as documents which are not literally stored as files, but generated as the result of a program execution each time the page is being requested by a browser. Despite their names DHTML and flash content are not dynamic content, as the same content is served to every browser. It is just displayed by the browser differently.

For a web server, delivering dynamic content is different than static content because after locating the URL-related resource it needs to invoke a program to transform the plain resource to generate the actual content to be passed to the browser. An interpreter takes the URL-related resource and executes it. This can in turn result in accessing additional resources such as files and databases before the result is passed to the browser. Globule also provides solutions for executing these web-applications distributively.

Globule enables the replication of dynamic content based on PHP scripts without any structural changes of the content. It works in the following way:

It replicates the sources used to generate dynamic content rather than replicating the generated content;
It recursively fetches other resources required by the script being interpreted to make them available at replica servers.

This is a much more advanced method of replication than mirrors or caching proxies, and much easier to convert to than complicated distributed environments. However there are some limitations of the current implementation of dynamic content replication:

It only works for PHP scripts. Other dynamic document generation techniques such as Perl and servlets are not supported;
Scripts must undergo some small changes;
It does not support the usage of backup servers to replicate data at this time;
PHP must be configured in safe mode, and references to resources should be relative and within the exported URL path;
Changes to plain data files are currently not sent back to the origin server (this may be improved in future releases);
It only supports access to the most common functions of the MySQL-style database interface in PHP.

To get replication of dynamic content operational you need to:

compile and add PHP support to Apache;
instrument your PHP pages to inform Globule about the usage of sub-resources and databases;
instruct Globule on how to contact the database in the httpd.conf configuration file.

3.5.1 Adding PHP support to Apache

With PHP, the content is generated by an interpreter program, which is a separate piece of software that plugs into the Apache server and must therefore be installed and configured too.

If you used the automatic installer, PHP support should be present already and the httpd.conf configuration file should have PHP enabled.

If you need to add PHP support or want to check whether PHP is enabled in your configuration, this section provides some guidelines on the way Globule expects PHP to be installed. Since the addition of PHP support is not directly related to Globule, we refer to the official documentation for a full PHP installation reference.

Basic installation and configuration of PHP is relatively simple, but since PHP can be installed and configured in so many different ways, be aware that incompatibilities can arise when deviating from the expected installation. We therefore strongly suggest using the all-in-one installation, which provides a standard installation. The automatic installer and Globule Broker System also provide the right settings in the httpd.conf file for usage with Globule.

If you used the installer and answered ``Yes'' to include MySQL support, you already have dynamic content support and you can continue with section 3.5.2 on using Globule support in PHP. If you used the installer without MySQL support, then you will be able to use PHP scripts but database drivers will not be compiled. Contact us if you need to overcome this. If you installed Globule from source, read Section 2.3.2 on how to install PHP from source.

3.5.2 Using Globule support in PHP

Globule will take care of the replication of the PHP source files to replica servers. However, the PHP programs do have to be modified and provide some additional information to Globule.

The modifications to the original PHP pages for a Globule environment have to do with telling Globule that one PHP page actually requires another PHP page, data file or database entries to be present. Globule can then also make sure these are present on the local server and point the PHP page to the right location for the specific replica server.

The modifications to your PHP pages are:

You must add the following line as the first line of all your PHP pages:
```
<?PHP eval(stripslashes($_SERVER["GLOBULE_PHPSCRIPT"])); ?>
```
For all instances of the statements require, require_once, include, include_once, etcetera wrap the argument in a call to the globule(...) function. For example:
```
require "includedpage.php";
```
must become:
```
require globule("includedpage.php");
```
If you open data files read-only, you should also wrap the first argument, which represents the filename, in a call to the globule() function. However, do this only if this is a local file, not if the open is called with a URL.
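
When converting an existing site it is easy to overlook a statement. The following shell sketch lists require and include statements whose argument does not yet appear to be wrapped in globule(...). The function name find_unwrapped_includes is hypothetical, and the pattern is a simple text heuristic: it does not parse PHP, so it may report matches inside comments or strings.

```shell
#!/bin/sh
# List require/include statements in PHP files that are not yet wrapped
# in globule(...). Prints file:line:text for each candidate.
# Hypothetical helper; a text heuristic, not a PHP parser.
# Usage: find_unwrapped_includes <file.php>...
find_unwrapped_includes() {
    # /dev/null is added so that grep always prints the file name
    grep -nE '(require|include)(_once)?[ (]' "$@" /dev/null | grep -v 'globule('
}
```

For example, `find_unwrapped_includes *.php` run in the directory being exported prints one line per statement that still needs attention, and prints nothing once all of them have been converted.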

3.5.3 MySQL query caching with Globule

In many cases, PHP pages must access a database to produce a result. In such setups, the simplest setup is to let Globule replicate the PHP code, but keep the database centralized. This setup, often called edge-side computing, may however prove quite inefficient if the performance bottleneck lies in the database. One of Globule's most innovative features allows programmers to design their PHP/MySQL applications such that database query results are cached at the replica servers. This system can greatly improve the overall system's performance [3].

Configuring Globule to cache MySQL query results requires:

to update the database-related statements in the PHP code;
and to update the Apache configuration file of the origin and replica servers.

Note that this setup currently works only for MySQL databases; also, the use of backup servers is not supported so no page can be delivered while the central database is unreachable.

Updating PHP pages

To make use of database query caching, PHP pages must be edited in the following way:

All PHP calls to the MySQL driver of the form mysql_... must be rewritten as globule_mysql_.... For example:
```
mysql_connect("localhost","master","");
```
must become:
```
globule_mysql_connect("localhost","master","");
```
After any call that determines the database being used (i.e., globule_mysql_connect and/or globule_mysql_select_db), you must insert a call to globule_mysql_reattach. The argument to this call is described below under "Configuring Globule for Database Query Caching" and represents a Globule-specific URL for the database. A good name might be db-database, where database is the name of the database being connected to. For example:
```
globule_mysql_connect("localhost","master","");
globule_mysql_select_db("globecbc");
globule_mysql_reattach("db-globecbc");
```
Furthermore, you must replace every use of mysql_query with globule_mysql_execute and declare the queries first, as described next.

Usage of query templates

For Globule to handle cached database queries correctly, all queries must be declared before they can be used by your PHP scripts. Using mysql_query directly is therefore not possible. Instead, any query you want to execute first needs to be stored under a name. This procedure is similar to the prepared statement interface in the improved PHP MySQL interface (mysqli) and in many other modern database interfaces.

Instead of building the string representing the query and executing it, such as in:

```
for($i=0; $i<10; $i++) {
  $query = "select * from t where t.id > " . $i . " and t.rel = 4";
  mysql_query($query);
  ...
```

Instead, we first declare a template of the query:

```
globule_mysql_declare("myquery","select * from t where t.id > ? and t.rel = 4");
```

These declare statements should be inserted after the relevant call to globule_mysql_reattach. The above statement declares a named statement "myquery", in which certain parts are filled in when the query is later executed. These as yet unspecified formal arguments are denoted with a question mark (?).

The query can then be executed. Where there used to be a call to mysql_query, use a call to globule_mysql_execute, which takes the query name instead of the full query:

```
globule_mysql_execute("myquery", array($i));
```

The first argument is the query name, and the second argument is an array of the values to substitute for the formal arguments in the query template, which are denoted with question marks.
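
Putting the pieces above together, a converted page might look like the following sketch. The stand-in functions at the top are hypothetical: they only record calls, so that the sketch can be tried outside a Globule server, and they are skipped when the real functions exist. The $globule_stub_log variable is likewise an assumption made for illustration. Return values of the globule_mysql_* calls are not used because they are not described here.

```php
<?php
// A real page must start with the Globule line from section 3.5.2.

// Hypothetical stand-ins that only record calls. Under Globule the real
// functions exist and this whole block is skipped.
if (!function_exists('globule_mysql_connect')) {
    $GLOBALS['globule_stub_log'] = array();
    function globule_mysql_connect($host, $user, $pass) { $GLOBALS['globule_stub_log'][] = 'connect'; }
    function globule_mysql_select_db($db) { $GLOBALS['globule_stub_log'][] = 'select_db'; }
    function globule_mysql_reattach($name) { $GLOBALS['globule_stub_log'][] = 'reattach'; }
    function globule_mysql_declare($name, $sql) { $GLOBALS['globule_stub_log'][] = "declare $name"; }
    function globule_mysql_execute($name, $args) {
        $GLOBALS['globule_stub_log'][] = "execute $name " . implode(',', $args);
    }
}

// Connection sequence, followed by the reattach call.
globule_mysql_connect("localhost", "master", "");
globule_mysql_select_db("globecbc");
globule_mysql_reattach("db-globecbc");

// Declare the query template once, after the reattach call.
globule_mysql_declare("myquery", "select * from t where t.id > ? and t.rel = 4");

// Execute it by name; only the value for the "?" changes per iteration.
for ($i = 0; $i < 10; $i++) {
    globule_mysql_execute("myquery", array($i));
}
```

Declaring the template outside the loop keeps the query text in one place, so each iteration only supplies the value that differs.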

Configuring Globule for Database Query Caching

Now, you also need to update the httpd.conf configuration files of your origin and replica servers.

Suppose that, before updating your PHP scripts, you had the following MySQL connection sequence:

```
mysql_connect("localhost","master","");
mysql_select_db("globecbc");
```

This connects to the database server running on localhost, using the username "master" with an empty password, and selects the database "globecbc".

To make this database reachable from the replica servers, we need to update the configuration of the origin server such that it provides an HTTP-based interface for queries to the database:

```
  <VirtualHost *>
    ServerName origin.revolutionware.net
  ...
    <Location />
      GlobuleReplicate on
      GlobuleReplicaIs http://replica.revolutionware.net/  sharedpassword
  ...
    </Location>
    <Location /db-globecbc>
      GlobuleDatabase mysql://master@localhost/globecbc dbsharedpassword
    </Location>
  ...
```

The URL mysql://master@localhost/globecbc identifies the database with the same details as used in the mysql_connect and mysql_select_db calls. If the database password is not empty, place it after the username, separated by a hash sign (e.g., mysql://master#password@localhost/globecbc).

The password dbsharedpassword is not the database password, but a password that each replica server must know in order to be allowed to issue requests to the database through the origin server.

Now, replica servers can access your database via the URL http://origin.revolutionware.net/db-globecbc/. The path db-globecbc must be the same as specified in the globule_mysql_reattach statements of your PHP scripts.

If your scripts use multiple databases, then you can repeat this with different names. Make sure the same name is not used twice for different databases!
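
As a sketch of the multiple-database case on the origin server, the fragment below exports two databases under two different paths. The second database name globeforum and its path /db-globeforum are assumptions made up for illustration; the directive syntax simply mirrors the example above.

```apache
    <Location /db-globecbc>
      GlobuleDatabase mysql://master@localhost/globecbc dbsharedpassword
    </Location>
    # Hypothetical second database, exported under its own distinct path.
    <Location /db-globeforum>
      GlobuleDatabase mysql://master@localhost/globeforum dbsharedpassword
    </Location>
```

PHP scripts that use the second database would then pass db-globeforum to globule_mysql_reattach.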

Replica servers should define a similar connection under the same path. However, instead of the URL of the actual MySQL database, they specify the URL of the HTTP interface of the origin server, as follows:

```
  <VirtualHost *>
    ServerName replica.revolutionware.net
  ...
    <Location />
      GlobuleReplicaFor http://origin.revolutionware.net/ sharedpassword
    </Location>
    <Location /db-globecbc>
      GlobuleDatabase http://origin.revolutionware.net/db-globecbc dbsharedpassword
    </Location>
  ...
```

In the current implementation there is just a single shared password among all replica servers. The /db-globecbc location path can be chosen freely, but must match in the origin definition, the replica definition and the PHP scripts.

globule@globule.org
February 27, 2006