3 Server Configuration

Much like Apache needs to be told which web sites to serve, Globule, as an Apache module, needs to be told which parts of the sites served by Apache should be replicated. Likewise, settings for security, configuration and special handling need to be selected. Globule adds another dimension because it allows tuning of replication and redirection policies, and because it forms a cooperative network: you explicitly select the partners with which to cooperate, and to and from which documents are replicated.
Globule therefore requires configuration, as does Apache. Like that of other
Apache modules, this configuration is embedded in the Apache configuration file.
Apache configuration can be quite complex to get right. This documentation
does not cover the configuration of Apache itself, nor of any other modules that
can be used inside Apache. Refer to the
Apache documentation and
follow the guidelines in the sample
This section describes how to prepare a configuration in
The Globule Broker
Setting up a configuration file
To aid users in setting up
3.1 Basic Server Configuration
Globule is provided as a module for Apache, so you have to tell Apache that
it needs to load the Globule module. Such instructions, as well as other
configuration directives, are written in the Apache configuration file.

Apache is a highly configurable and flexible server. This also means that even the basic configuration without Globule is quite extensive and that many details matter. Be aware that small configuration changes can have large effects. Small omissions, the presence of other directives, or the order in which directives are placed can cause Apache to fail to start, to misbehave, or to produce other unexpected results. Some of these effects are even silent: the server either does not start, or seems to work but behaves differently (for instance, without replication). Therefore, follow the instructions precisely and make changes at the proper location. Check which values you need to change, such as port numbers, the ServerName, and directory names. Some values, like directory names, appear multiple times in the configuration file; make sure they are consistent with each other.
This section describes how to add the most basic necessary directives to a
functional Apache configuration file. Subsequent sections explain how to add
further functionality, one subject at a time. This manual cannot give an
overview of configuring Apache, only of the extensions Globule provides. Some
knowledge of Apache configuration is needed, and we advise working from a
template.
3.1.1 How to update your configuration
Configuring Apache and Globule involves making changes to the configuration
file httpd.conf. Changes to the configuration do not take effect until you
restart Apache. The location of the
In any case, you can also check for certain errors in the configuration
using the command

3.1.2 Check your Apache configuration
The installed httpd.conf might already be adapted; however, this default
configuration file is just a standard template and should be checked and/or
adapted for your system. Refer to the Apache documentation for a full
explanation. The following settings are at least important for a correct
Globule setup, or vary a lot between systems. These settings should already be
partially present in the

Directive Listen
The Listen directive instructs Apache to listen on one or more ports. The
Listen directive must always be specified, even if the default port 80 is
used. At the time of release of version 1.3.1 of Globule, the use of
multiple listen ports, or of SSL/HTTPS, may not be fully functional. Example:

    Listen 8333

Directives User and Group
When Apache is instructed to listen on port 80, it requires superuser
privileges and thus needs to be started as root. Since this can cause
security issues, Apache is always instructed to try to change its identity
after startup to the Unix user and group specified by the directives
User and Group. Standard Unix/Linux operation, as well as
the recommended Apache setup, is to change to the Unix user Example:

    User nobody
    Group #-1

For Windows users

Windows users who use DNS redirection (their machine plays the role of the redirector) need to disable the AcceptEx Windows call. This Microsoft optimization breaks quite a lot of software, including ours and MySQL. Besides, enabling it provides only a limited performance increase. Since Windows serves pages very slowly compared to Linux servers, you can safely always disable this feature:

    <IfModule mpm_winnt.c>
        Win32DisableAcceptEx
        ...
    </IfModule>

Locate the existing IfModule mpm_winnt section and add the Win32DisableAcceptEx directive.

Directive ServerName
The ServerName directive appears at least once in the

    Listen 80
    ...
    ServerName world.cs.vu.nl

If your server does not use the default HTTP port (as specified by
Listen 80 earlier in the

    Listen 8333
    ...
    ServerName world.cs.vu.nl:8333

Using an IP address instead of a fully qualified hostname is discouraged, because neither VirtualHosts nor DNS redirection are then supported.

VirtualHost sections

The use of VirtualHost is documented in the Apache documentation, but because of the many mistakes one can make with it, and its effect on Globule, some remarks on the configuration follow. Virtual hosts are needed when URLs with different host names return different sets of pages. You must use name-based virtual hosting in most cases, even if you only want to host a single site: unless you have multiple IP addresses on your machine and know what you are doing, you want name-based virtual hosting rather than IP-based virtual hosting. In a name-based configuration, start with a NameVirtualHost directive. Then, for each web site with a different hostname, define a VirtualHost section. Each should at least contain a ServerName directive with the web-site name, and a DocumentRoot directive specifying where the documents for that web site come from. Make sure that the ServerName directives within the VirtualHost sections are tagged with the port number in the same way as the global ServerName:

    Listen 8333
    ...
    ServerName world.cs.vu.nl:8333
    ...
    DocumentRoot /var/www/html
    ...
    NameVirtualHost *
    <VirtualHost *>
        ServerName world.cs.vu.nl:8333
        DocumentRoot /var/www/html
        ...
    </VirtualHost>
    <VirtualHost *>
        ServerName www.revolutionware.net:8333
        DocumentRoot /var/www/www.revolutionware.net
        ...
    </VirtualHost>
    <VirtualHost *>
        ServerName _default_:8333
        DocumentRoot /var/www/html
        ...
    </VirtualHost>

You must specify a VirtualHost section for the global ServerName too. Thus, in the example above, world.cs.vu.nl is the global ServerName and must also be present in one of the VirtualHost sections (here, the first one).
Note that because the global ServerName and the ServerName of the first VirtualHost are the same, their DocumentRoot should be the same too. The last VirtualHost section in the example catches all incoming requests that do not resolve to any of the other VirtualHost sections. It is common for this section to have the same DocumentRoot as the global DocumentRoot, but this is possible only if that site is not (partially) replicated. If you add ServerAlias directives, now or in the future, note that you should not add the port number when specifying aliases for your hosts.
For each VirtualHost with a new DocumentRoot you should also check whether the
files are accessible, both through world-readable permission bits when
running the server on a Unix machine, and through the server's own
configuration.

Directory specifications

Whenever Apache serves a document, locating and authorizing the file to be served goes through several stages. The DocumentRoot specifies the initial location and Location directives specify how to treat individual paths, but whether an actual file may be accessed is controlled by a <Directory> section. A default configuration always denies access to all files by disallowing everything for ``/''. Therefore, if you add a VirtualHost with a DocumentRoot that is not yet allowed, you need to add a Directory section for it. Also, if you change a DocumentRoot or ServerRoot directory, remember to check all paths in Directory sections.
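A rough way to spot a DocumentRoot that no <Directory> section opens up is sketched below in Python. It is only a heuristic under stated assumptions: it treats every <Directory> section other than <Directory /> as granting access, ignores Include files and DirectoryMatch sections, and the function name is hypothetical.

```python
import posixpath
import re


def uncovered_document_roots(config: str) -> list[str]:
    """Return DocumentRoot paths not covered by any <Directory> section except "/"."""
    roots = re.findall(r'^[ \t]*DocumentRoot[ \t]+"?([^"\s]+)"?', config, re.M | re.I)
    dirs = re.findall(r'^[ \t]*<Directory[ \t]+"?([^">\s]+)"?[ \t]*>', config, re.M | re.I)
    # <Directory /> is the deny-everything default, so it does not count as allowing access.
    allowed = [posixpath.normpath(d) for d in dirs if posixpath.normpath(d) != "/"]
    missing = []
    for root in map(posixpath.normpath, roots):
        # A root is covered by a Directory section for itself or for a parent directory.
        covered = any(root == d or root.startswith(d + "/") for d in allowed)
        if not covered and root not in missing:
            missing.append(root)
    return missing
```

An empty result does not prove that access works, since the contents of each Directory section are not inspected; a non-empty result points at a DocumentRoot worth looking at.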
Taking the VirtualHost example above, access will only be allowed from a
default location for the files being served at
http://www.revolutionware.net:8333/ if we add to the

    <Directory "/var/www/www.revolutionware.net">
        Options Indexes FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

This configuration snippet should be placed just below the <Directory /> section normally present in your configuration, and at least before any VirtualHost section.

3.1.3 Add Globule support

This subsection describes how to add Globule to a working non-Globule Apache configuration, without any web site yet being replicated or imported from another origin server.

Add a LoadModule directive for Globule

First, Apache must be instructed to use the Globule module by adding a line that loads the module:

    LoadModule globule_module modules/mod_globule.so

This LoadModule directive should be placed below the other LoadModule directives already present. These normally occur early in the configuration, after the MPM-specific section.

Add Directive GlobuleAdminURL

Globule will not work unless it has a web address through which it can talk to itself. This seemingly schizophrenic notion is necessary because Apache is not a single program: when started, Apache splits into multiple processes. A reserved URL lets Globule do its internal bookkeeping. Using the GlobuleAdminURL directive you provide Globule with a URL into your web server that Globule can use freely.
A good choice for the site name is the first, global ServerName that appears
in your configuration, combined with a path like

    GlobuleAdminURL http://world.cs.vu.nl:8333/globuleadm/

Note that:
The GlobuleAdminURL directive is normally placed directly after the global DocumentRoot, and at least below the first, global ServerName and Globule's LoadModule directive.

Prevent unwanted entries in your access log

Globule relies on a number of periodic tasks executed roughly every second (e.g., to check whether a given file was modified or whether a replica server is still alive). These tasks usually perform an internal HTTP request to your own server. As a result, your logs/access_log file will quickly fill up with records of these internal requests; there are enough of them to fill up any hard drive within days or weeks. All internal Globule requests use either the custom SIGNAL or the REPORT HTTP method. To filter these requests out of your log files, we recommend adding an equivalent of the following lines to your httpd.conf:

    SetEnvIf Request_Method "SIGNAL" dontlog
    SetEnvIf Request_Method "REPORT" dontlog
    CustomLog logs/access_log combined env=!dontlog
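If your access log has already grown before you added these lines, the same filtering can be applied after the fact. The Python sketch below is a hypothetical helper, not part of Globule: it drops entries whose request method is SIGNAL or REPORT, assuming the request line is the first double-quoted field, as it is in Apache's common and combined log formats.

```python
import re

# Start of the quoted request line, e.g. "GET /photo.jpg HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) ')

# Methods used by Globule's internal requests, per the text above.
INTERNAL_METHODS = {"SIGNAL", "REPORT"}


def drop_internal_requests(lines):
    """Yield only the log lines whose request method is not SIGNAL or REPORT.

    This mimics, after the fact, what SetEnvIf ... dontlog together with
    CustomLog ... env=!dontlog does while Apache writes the log.
    """
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group("method") in INTERNAL_METHODS:
            continue
        yield line
```

For example, iterating over `drop_internal_requests(open("logs/access_log"))` and writing the result to a new file yields a copy of the log without the internal entries.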
The order of the SetEnvIf and CustomLog statements above is relevant. In your

3.2 Site Replication

Globule's main feature is replicating web sites. This section explains how to configure Globule so that documents from a given web site are replicated (i.e., copied) across multiple servers and kept consistent (i.e., updated when the origin version is updated). Each web site must have one origin server, which holds the authoritative version of the documents. It can be replicated across any number of backup servers and replica servers. To establish replication from an origin server to a replica server, or from an origin server to a backup server, both servers need to be configured appropriately:
Whenever a user on the Internet browses to the replicated web site, one of the replica servers or the origin server is selected to handle the request. If a replica server is selected, the browser is redirected to it. The most accessible form of redirection is HTTP redirection: it is easier to understand and set up, but has some disadvantages compared to DNS-based redirection. Once you understand HTTP redirection, you can turn to section 3.3 for DNS-based redirection.

Replicating a site with HTTP redirection

We will go through the configuration of a web site replicated across one origin and one replica server. Later we will add a backup server, which acts as a fall-back when the origin is not available for replica servers to fetch fresh copies of web pages.
Note that the web servers run on different port numbers (yours on the default port 80, your friend's on port 8080). With HTTP redirection any combination of ports is possible. As an example of a document being replicated, consider the photo available at http://www.revolutionware.net/photo.jpg. It will be copied and made available at http://wereld.cs.vu.nl:8080/worldpages/photo.jpg. To replicate your site www.revolutionware.net you must modify your configuration to something like:

    Listen 80
    ServerName world.cs.vu.nl
    ...
    LoadModule globule_module modules/mod_globule.so
    GlobuleAdminURL http://world.cs.vu.nl/globuleadm/
    ...
    NameVirtualHost *
    ...
    <VirtualHost *>
        ServerName www.revolutionware.net
        DocumentRoot /var/www/html/pages
        <Location "/">
            GlobuleReplicate on
            GlobuleReplicaIs http://wereld.cs.vu.nl:8080/worldpages/ coffee
        </Location>
    </VirtualHost>
This configuration shows the ServerName, GlobuleAdminURL, etc. laid out in
the manner described in sections 3.1.2 and 3.1.3. It then continues by
defining the www.revolutionware.net virtual host section; the documents for
this web site, which will be replicated, are to be placed in
The actual replication is performed by two directives,
GlobuleReplicate and GlobuleReplicaIs. Both must be
defined inside a Location section, which determines from which
path the documents will be replicated. In this case the path is anything from

GlobuleReplicate on

The GlobuleReplicate directive declares that the web site must be replicated and that this server acts as the origin for the web site. Because the GlobuleReplicate directive is placed inside a Location section, the URL path from which replication starts is determined by this Location section. You can also turn replication off for part of a web site; this is described in 3.2.2.

GlobuleReplicaIs ...

One or more GlobuleReplicaIs directives then declare the replica server(s) to which the web site is replicated. You and your friend need to agree upon the URL path you are exporting (so far assumed to be http://www.revolutionware.net/) and the URL path on which your friend will import your web pages (so far assumed to be http://wereld.cs.vu.nl:8080/worldpages/). You also need to agree upon a shared secret: a password known by both your origin server and your friend's replica server, used for inter-server authorization. In the above configuration the phrase ``coffee'' was chosen. Now your server is configured, but your friend needs to update his configuration as well:

    Listen 8080
    ServerName wereld.cs.vu.nl
    ...
    DocumentRoot /var/www/html
    ...
    LoadModule globule_module modules/mod_globule.so
    GlobuleAdminURL http://wereld.cs.vu.nl:8080/globuleadm/
    ...
    NameVirtualHost *
    ...
    <VirtualHost *>
        ServerName wereld.cs.vu.nl
        DocumentRoot /var/www/html
        <Location "/worldpages/">
            GlobuleReplicaFor http://www.revolutionware.net/ coffee
        </Location>
    </VirtualHost>

This configuration has one Globule-specific directive, namely GlobuleReplicaFor, which specifies that your friend's server acts as a replica server for your server (as given in the argument of GlobuleReplicaFor). GlobuleReplicaFor also needs to be placed inside a Location section, to tell Globule at which path your web site should be available. Your friend's configuration mirrors yours.
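Since your configuration and your friend's must match each other, a small consistency check can catch typos before either server is restarted. The Python sketch below is purely illustrative: the Side record and the function names are hypothetical, it assumes the port comes from the Listen directive (port 80 is omitted from URLs), and replica_url only shows the URL correspondence from the photo example, not how Globule performs redirection.

```python
from dataclasses import dataclass


@dataclass
class Side:
    """One server's half of a replication pair (hypothetical record, not a Globule API)."""
    host: str      # ServerName of the VirtualHost, without a port
    port: int      # port from the Listen directive; 80 is left out of URLs
    location: str  # path of the <Location> holding the Globule directive
    peer_url: str  # argument of GlobuleReplicaIs (origin) or GlobuleReplicaFor (replica)
    secret: str    # shared password following the URL


def url_of(side: Side) -> str:
    """URL formed by a server's ServerName, port and Location."""
    port = "" if side.port == 80 else f":{side.port}"
    path = side.location if side.location.endswith("/") else side.location + "/"
    return f"http://{side.host}{port}{path}"


def check_pairing(origin: Side, replica: Side) -> list[str]:
    """Compare both halves; an empty list means they agree."""
    problems = []
    if origin.peer_url != url_of(replica):
        problems.append(f"GlobuleReplicaIs {origin.peer_url} != {url_of(replica)}")
    if replica.peer_url != url_of(origin):
        problems.append(f"GlobuleReplicaFor {replica.peer_url} != {url_of(origin)}")
    if origin.secret != replica.secret:
        problems.append("shared secrets differ")
    return problems


def replica_url(origin: Side, replica: Side, url: str) -> str:
    """Map a URL under the exported path to the matching URL on the replica."""
    prefix = url_of(origin)
    if not url.startswith(prefix):
        raise ValueError(f"{url} is not under {prefix}")
    return url_of(replica) + url[len(prefix):]
```

With the two example configurations, `check_pairing` returns an empty list, and `replica_url` maps http://www.revolutionware.net/photo.jpg to http://wereld.cs.vu.nl:8080/worldpages/photo.jpg.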
The ServerName and Location in which your friend's GlobuleReplicaFor is placed form the URL specified in your GlobuleReplicaIs. Vice versa, the ServerName and Location in which your GlobuleReplicate/GlobuleReplicaIs are placed form the URL specified in the argument to GlobuleReplicaFor.

3.2.1 Using a backup

Whenever a replica copy of a document is not available or no longer valid at a replica server, the replica fetches a fresh copy of the page from the origin server. This way replica servers stay up to date. However, the origin server may be unavailable at that moment. For this purpose, backup servers may be defined. The role of these servers is to maintain a complete set of documents for the replicated web site. They obtain this set of pages from the origin server through the same method as normal replica servers, but make sure they keep a valid copy at all times. Replica servers can thus fetch a copy of a web page from the origin server or, if it is unavailable, from a backup server. Since the operation of a backup server is largely the same as that of a replica server, its configuration follows the same lines, with three exceptions:
We will run through the modifications to the origin server and the replica server, and then show how the backup server should be configured. Assume that another friend, with the machine monde.cs.vu.nl, offers to be your backup server. In the configuration of your origin site, add the GlobuleBackupIs directive:

    Listen 80
    ServerName world.cs.vu.nl
    ...
    <VirtualHost *>
        ServerName www.revolutionware.net
        DocumentRoot /var/www/html/pages
        <Location "/">
            GlobuleReplicate on
            GlobuleDefaultReplicationPolicy Invalidate
            GlobuleReplicaIs http://wereld.cs.vu.nl:8080/worldpages/ coffee
            GlobuleBackupIs http://monde.cs.vu.nl:8333/worldpages/ tea
        </Location>
    </VirtualHost>

From the redirector's point of view, backup servers are almost the same as regular replica servers. The main change is that all regular replica servers need to be told explicitly that a backup server is available for this site:

    Listen 8080
    ServerName wereld.cs.vu.nl
    ...
    <VirtualHost *>
        ServerName wereld.cs.vu.nl
        DocumentRoot /var/www/html
        <Location "/worldpages/">
            GlobuleReplicaFor http://www.revolutionware.net/ coffee
            GlobuleBackupForIs http://www.revolutionware.net/ http://monde.cs.vu.nl:8333/worldpages/
        </Location>
    </VirtualHost>

Note that GlobuleBackupForIs takes two arguments: the first specifies the site for which a backup is being defined, and the second specifies which server is the backup. No password needs to be given, and the first argument must always be the same as the one in GlobuleReplicaFor. Finally, your other friend needs to set up the backup server's configuration. This is almost the same as setting up a replica, but it also adds a GlobuleDefaultReplicationPolicy and uses GlobuleBackupFor instead of GlobuleReplicaFor:

    Listen 8333
    ServerName monde.cs.vu.nl
    ...
    DocumentRoot /var/www/html
    ...
    LoadModule globule_module modules/mod_globule.so
    GlobuleAdminURL http://monde.cs.vu.nl:8333/globuleadm/
    ...
    NameVirtualHost *
    ...
    <VirtualHost *>
        ServerName monde.cs.vu.nl
        DocumentRoot /var/www/html
        <Location "/worldpages/">
            GlobuleDefaultReplicationPolicy Ttl
            GlobuleBackupFor http://www.revolutionware.net/ tea
        </Location>
    </VirtualHost>

3.2.2 Replicating a partial site

Globule allows you to easily define parts of your site that should not be replicated. For the paths selected not to be replicated, the origin server simply does not redirect clients to replica servers, but serves the requests itself.

    <VirtualHost *:8333>
        ServerName www.revolutionware.net:8333
        DocumentRoot ...
        <Location "/">
            GlobuleReplicate on
            GlobuleReplicaIs ...
            GlobuleBackupIs ...
        </Location>
        <Location "/cgi-bin/">
            GlobuleReplicate off
        </Location>
    </VirtualHost>

This instructs Globule to replicate the web site at http://www.revolutionware.net:8333/ except the pages in the sub-path http://www.revolutionware.net:8333/cgi-bin/.
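The effect of the two Location sections can be pictured as a lookup on the request path. The Python function below is a hypothetical illustration, not Globule's implementation, and it simplifies Apache's rules: it lets the longest matching prefix win, which gives the intended result for the example above, whereas Apache actually merges <Location> sections in the order in which they appear in the configuration file.

```python
def is_replicated(path: str, locations: dict[str, bool]) -> bool:
    """Return whether requests for `path` may be redirected to replicas.

    `locations` maps each <Location> prefix to its GlobuleReplicate setting
    (True for "on", False for "off"). Paths matched by no prefix are served
    by the origin only.
    """
    matches = [prefix for prefix in locations if path.startswith(prefix)]
    if not matches:
        return False
    # Simplification: the most specific (longest) matching prefix decides.
    return locations[max(matches, key=len)]
```

As in Apache, a prefix with a trailing slash such as /cgi-bin/ does not match the bare path /cgi-bin, so in this sketch that path falls under the "/" section.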
When using HTTP redirection, another way to replicate only parts of a site is
to put the GlobuleReplicate, GlobuleReplicaIs
and GlobuleBackupIs directives inside a <Location>
container with a sub-path of

    <VirtualHost *:8333>
        ServerName www.revolutionware.net:8333
        DocumentRoot ...
        <Location "/replicate_me/">
            GlobuleReplicate on
            GlobuleReplicaIs ...
            GlobuleBackupIs ...
        </Location>
    </VirtualHost>

3.3 Client Redirection using DNS

3.3.1 What is DNS redirection?

Until now, all configurations shown in this documentation use a redirection mechanism called HTTP redirection. This means that, when an origin web server receives a request, it can reply by ordering the browser to re-issue the same request at a different server. This scheme is extremely simple, but it has two major drawbacks. First, as the browser is effectively given a modified URL, it can decide to store that URL for future reference. As a consequence, removing or replacing a replica may invalidate various cached URLs. Second, each request is still initially sent to the origin server, so the success of the request depends on the availability of the origin.

DNS redirection addresses these problems by basing redirection on a web site's name. For example, when a browser requests ``http://www.revolutionware.net/'', it first resolves the server name ``www.revolutionware.net''. In a non-replicated setup, the browser would always receive the IP address of the same server. Using DNS redirection, the DNS redirector checks where the client is located and returns the IP address of the most suitable server among the available replica servers for the site. IP addresses are usually not shown to users, so DNS redirection is invisible to them. DNS redirection imposes a few restrictions:
3.3.2 Required elements to setup DNS redirection in Globule
3.3.3 Setting up DNS entries for redirection
Let's assume that you own the domain
Imagine that you have two machines called ``wereld.cs.vu.nl'' and
``world.cs.vu.nl'', which you want to act as origin server and
replica server respectively. Let's assign them the specific names

    $ORIGIN revolutionware.net.
    origin    IN CNAME wereld.cs.vu.nl.
    replica   IN CNAME world.cs.vu.nl.

Do not forget the dots at the ends of the lines! Alternatively, if you know the IP addresses of your servers (e.g., 130.37.198.252 and 130.37.193.70), you may define your zone as follows, for minor performance and reliability improvements:

    $ORIGIN revolutionware.net.
    origin    IN A 130.37.198.252
    replica   IN A 130.37.193.70

Note that A records do not end with a dot.
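The dot rule follows from zone-file syntax: a name without a trailing dot is relative, so $ORIGIN is appended to it, and wereld.cs.vu.nl would silently become wereld.cs.vu.nl.revolutionware.net. The Python sketch below is a hypothetical helper, not part of Globule. It renders either variant of the zone from a mapping of specific names to targets, using the standard ipaddress module to tell addresses from host names.

```python
import ipaddress


def is_ipv4(value: str) -> bool:
    """True if value is a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False


def zone_records(origin: str, targets: dict[str, str]) -> str:
    """Render an A or CNAME record for each specific name under $ORIGIN."""
    lines = [f"$ORIGIN {origin.rstrip('.')}."]
    for name, target in targets.items():
        if is_ipv4(target):
            # An address is not a domain name, so it gets no trailing dot.
            lines.append(f"{name} IN A {target}")
        else:
            # The trailing dot marks the CNAME target as fully qualified.
            lines.append(f"{name} IN CNAME {target.rstrip('.')}.")
    return "\n".join(lines)
```

Calling `zone_records("revolutionware.net", {"origin": "wereld.cs.vu.nl", "replica": "world.cs.vu.nl"})` produces the CNAME variant shown above; passing the two IP addresses instead produces the A-record variant.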
You must now define the generic name

    www       IN NS origin.revolutionware.net.

Be warned that any change in the DNS records may take a few hours to take effect. If your DNS-redirected site does not work as expected and you see errors like ``www.revolutionware.net not found'', this probably means that you should be patient and wait for the changes to propagate fully.

3.3.4 Configuring Globule for DNS redirection

You must now configure the origin and the replica server so that they support DNS redirection. Two modifications are needed compared to a non-replicated setup:
A normal origin server configuration without DNS redirection, based on the machine hostname wereld.cs.vu.nl and the site www.revolutionware.net, would look similar to:

    ...
    ServerName wereld.cs.vu.nl
    ...
    GlobuleAdminURL http://wereld.cs.vu.nl/globulectl
    ...
    NameVirtualHost *
    <VirtualHost *>
        ServerName www.revolutionware.net
        DocumentRoot ...
        <Location />
            GlobuleReplicate on
            GlobuleReplicaIs ...
            ...

Note that the parts separated by ``...'' lines appear at different points in the configuration file. This order matters, especially for the VirtualHost section, which needs to be at the end of the configuration file.
First, let's enable DNS redirection at the origin server. This is done using
the GlobuleRedirectionMode directive. At the global level you need to
add or modify the redirection mode into
Having done that, you only need to specify that your site can be reached both
as

Here is the resulting configuration file:

    ...
    ServerName wereld.cs.vu.nl
    ...
    GlobuleAdminURL http://wereld.cs.vu.nl/globulectl
    GlobuleRedirectionMode BOTH
    ...
    NameVirtualHost *
    <VirtualHost *>
        ServerName origin.revolutionware.net
        ServerAlias www.revolutionware.net
        GlobuleRedirectionMode DNS
        DocumentRoot ...
        <Location />
            GlobuleReplicate on
            GlobuleReplicaIs http://replica.revolutionware.net/ sharedpassword
            ...

It is important that the ServerName entry contains the specific server name (origin.revolutionware.net), and that the generic server name (www.revolutionware.net) appears as the first entry of the ServerAlias directive. Specific names should be used in other directives such as GlobuleReplicaIs and GlobuleBackupIs.
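The three naming rules just stated can be checked mechanically. The following Python sketch is hypothetical (neither the function nor its parameters come from Globule). It takes the values written in a VirtualHost section, the generic name, and the peer URLs from directives such as GlobuleReplicaIs or GlobuleBackupIs.

```python
from urllib.parse import urlsplit


def check_dns_vhost(server_name: str, aliases: list[str], generic: str,
                    peer_urls: list[str]) -> list[str]:
    """Check a DNS-redirected VirtualHost against the naming rules above."""
    problems = []
    if server_name == generic:
        problems.append("ServerName must be the specific name, not the generic one")
    if not aliases or aliases[0] != generic:
        problems.append(f"{generic} must be the first ServerAlias entry")
    for url in peer_urls:
        # Peer directives must point at a specific name, never at the generic one.
        if urlsplit(url).hostname == generic:
            problems.append(f"{url} uses the generic name {generic}")
    return problems
```

The replica's section, with ServerName replica.revolutionware.net and the GlobuleReplicaFor URL as its peer, can be checked the same way.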
You must also update the replica server's configuration file to specify that
the replica of the

    ServerName world.cs.vu.nl
    ...
    GlobuleAdminURL http://world.cs.vu.nl/globulectl/
    ...
    NameVirtualHost *
    <VirtualHost *>
        ServerName replica.revolutionware.net
        ServerAlias www.revolutionware.net
        DocumentRoot ...
        <Location />
            GlobuleReplicaFor http://origin.revolutionware.net/ sharedpassword
        </Location>
    </VirtualHost>
You can now start the two servers. Do not forget to run them as root, as
regular users normally cannot run DNS redirectors! Your site should now be
available at URL

3.3.5 Testing DNS redirection

With DNS redirection, the identity of the server that served your requests is not shown to you. You may then start wondering whether redirection actually works, or whether all requests end up being served by a single server.
Most Linux distributions contain the utility ``dig'', which is used to query
DNS servers by hand. If you do not find it, it is usually part of an RPM
package called

Start by testing your DNS domain. Type:

    dig -t NS revolutionware.net

The result looks something like:

    ; <<>> DiG 9.2.4 <<>> -t NS revolutionware.net
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43750
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

    ;; QUESTION SECTION:
    ;revolutionware.net.        IN  NS

    ;; ANSWER SECTION:
    revolutionware.net.  86400  IN  NS  NAME-OF-YOUR-DNS-SERVER1.com.
    revolutionware.net.  86400  IN  NS  NAME-OF-YOUR-DNS-SERVER2.com.

    ;; Query time: 1 msec
    ;; SERVER: 130.37.20.3#53(130.37.20.3)
    ;; WHEN: Thu Nov 10 15:18:18 2005
    ;; MSG SIZE rcvd: 66

In the ``answer section'' you should see at least two lines with the names of the DNS servers responsible for your domain. If you used the services of your registrar to hold information about your domain, then both servers probably belong to it. Now, test the names that you have created:

    dig origin.revolutionware.net

    ; <<>> DiG 9.2.4 <<>> origin.revolutionware.net
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50422
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;origin.revolutionware.net.  IN  A

    ;; ANSWER SECTION:
    origin.revolutionware.net.  430  IN  A  130.37.199.101

    ;; AUTHORITY SECTION:
    revolutionware.net.  430  IN  NS  NAME-OF-YOUR-DNS-SERVER1.com.

    ;; Query time: 3 msec
    ;; SERVER: 130.37.20.3#53(130.37.20.3)
    ;; WHEN: Thu Nov 10 15:31:30 2005
    ;; MSG SIZE rcvd: 66
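When you repeat such queries many times, for instance to see whether the redirector hands out different addresses, reading the output by eye becomes tedious. The Python sketch below uses hypothetical helper names: it extracts the records of the ANSWER SECTION from dig's default output and, optionally, runs dig through subprocess (which assumes dig is installed and on your PATH).

```python
import subprocess


def answer_records(dig_output: str) -> list[tuple[str, str, str]]:
    """Return (name, type, data) for every record in dig's ANSWER SECTION."""
    records = []
    in_answer = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
        elif in_answer:
            if not line or line.startswith(";"):
                break  # a blank line or the next ";;" header ends the section
            # Record layout: name TTL class type data...
            fields = line.split()
            records.append((fields[0], fields[3], " ".join(fields[4:])))
    return records


def run_dig(*args: str) -> str:
    """Run dig with the given arguments and return its standard output."""
    return subprocess.run(["dig", *args], capture_output=True, text=True, check=True).stdout


def distinct_addresses(outputs: list[str]) -> set[str]:
    """Collect the A-record addresses seen across several dig runs."""
    return {data for out in outputs for _, rtype, data in answer_records(out) if rtype == "A"}
```

For example, `distinct_addresses([run_dig("@origin.revolutionware.net", "www.revolutionware.net") for _ in range(10)])` collects the addresses handed out over ten queries; once redirection works, it is expected to contain more than one address.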
In the ``answer section'' you should see the IP address of your origin
server. Do the same to test the name

Now, let's test whether the redirector is correctly registered:

    dig -t NS www.revolutionware.net

    ; <<>> DiG 9.2.4 <<>> -t NS www.revolutionware.net
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55825
    ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.revolutionware.net.  IN  NS

    ;; AUTHORITY SECTION:
    www.revolutionware.net.  600  IN  NS  origin.revolutionware.net.

    ;; Query time: 0 msec
    ;; SERVER: 130.37.193.66#53(goupil)
    ;; WHEN: Thu Nov 10 15:34:50 2005
    ;; MSG SIZE rcvd: 62
The authority section should contain a line ending with

    dig @origin.revolutionware.net www.revolutionware.net

    ; <<>> DiG 9.2.4 <<>> @origin.revolutionware.net www.revolutionware.net
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61015
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.revolutionware.net.  IN  A

    ;; ANSWER SECTION:
    www.revolutionware.net.  10  IN  A  130.37.199.101

    ;; AUTHORITY SECTION:
    www.revolutionware.net.  0  IN  NS  origin.revolutionware.net.

    ;; Query time: 1 msec
    ;; SERVER: 130.37.198.252#53(origin.revolutionware.net)
    ;; WHEN: Thu Nov 10 15:38:04 2005
    ;; MSG SIZE rcvd: 78

In the ``answer section'' you should see the IP address of one of your servers. If you issue the same command several times, you should receive a different IP address each time.

3.3.6 Advanced usage

Using a backup server
A backup server adds virtually no additional complexity to the setup. Like
using

    ServerName origin.revolutionware.net
    ServerAlias www.revolutionware.net
    <Location />
        GlobuleReplicate on
        GlobuleReplicaIs http://replica.revolutionware.net/ sharedpassword
        GlobuleBackupIs http://backup.revolutionware.net/ wachtwoord
        ...
The backup server is the same as any other replica server, but instead of
GlobuleReplicaFor it uses the directive GlobuleBackupFor, with
backup.revolutionware.net as ServerName and www.revolutionware.net as
ServerAlias. Likewise the replica servers should use the name

    ServerName replica.revolutionware.net
    ServerAlias www.revolutionware.net
    <Location />
        GlobuleReplicaFor http://origin.revolutionware.net/ sharedpassword
        GlobuleBackupForIs http://origin.revolutionware.net/ http://backup.revolutionware.net/
        ...

Not running DNS redirection on port 53 for testing purposes
Globule binds itself to port 53 to answer DNS queries. This is the only port
normally used by browsers to resolve the hostnames in URLs. However, if you
just want to test DNS redirection, you can resolve
hostnames using the

    dig -p 5353 @wereld.cs.vu.nl www.revolutionware.net

This instructs dig to ask the name server running on the machine wereld.cs.vu.nl at port 5353 to resolve the name www.revolutionware.net. Globule can be instructed to answer DNS queries on a port other than 53 using the GlobuleDNSRedirectionAddress directive:

    GlobuleDNSRedirectionAddress :5353

The GlobuleDNSRedirectionAddress directive needs to be specified before any GlobuleRedirectionMode directive.

3.4 System Monitoring

Globule is more complex than a regular Apache server. As it is inherently distributed, information about it is spread over multiple machines that bear complex relationships to each other. One of the goals of Globule is increased performance and reliability, but evaluating these is less straightforward in a distributed system, and in case of unexpected behaviour the cause is harder to trace. Globule has a monitoring framework that gives more insight into the behaviour of a Globule-replicated web site. Typically an administrator wants to monitor a running service, which we define as the ability to:
To address these needs, Globule provides interfaces for the following forms of monitoring:
Apache itself provides two log files that offer some means of monitoring. One is the access log, which lists all URLs requested from the web site. The other is the error log, which contains messages ranging in severity from critical errors, through warnings, to informational messages. The amount of current state that can be monitored is very limited: only the server-info and server-status modules provide some information, and they are rarely used. The access and error logs contain only a little monitoring data, which is also unstructured. Therefore Globule also provides monitoring information that is better suited to a distributed setup and is extensible, among other advantages. It is, however, very useful to keep the standard error and access log interfaces, for two reasons:
Globule therefore provides three main access points for monitoring:

- Errors, warnings and some other messages are written to the default Apache error log.
- An equivalent of the access log is produced.
- Detailed Globule information is made available through a web interface. This third access point is specific to Globule, and the web interface is used to make it as accessible as possible.

Each of these three is discussed individually in the following subsections.

3.4.1 Error log

Each Apache server maintains one or more error-log files to which information, warnings and error messages are written.
The error log is not Globule-specific, so other modules also write their messages
to the same error log file. Its purpose is primarily to log messages about
problems that hamper the correct or intended working of the web-server after
it has been started.
A standard error log file is normally defined naming either

Like Apache itself, Globule associates different levels of significance with the messages it generates. This allows the administrator to select which messages should be written to the log or processed otherwise. Globule error, warning and informational messages are not marked any differently from other messages.

Besides the LogLevel directive, however, there is a Globule-specific directive that controls how verbose Globule is in reporting events. The reason is that, in a running Globule-enabled server, you may want to increase the verbosity for certain types of events while tracking down faults. The GlobuleDebugProfile directive sets the initial verbosity of Globule. Only one GlobuleDebugProfile directive can and should be used. It takes effect globally, for all web-sites. A common use is to set it to the default level:

    GlobuleDebugProfile default

This lets any messages of level ``error'' or above pass through to the Apache logging method. Other profiles available at this time are:
For a correctly running server, informational and warning messages generated by Globule can also be accessed through the web interface discussed later. However, the error-log is the only means for Apache/Globule to report situations in which the server is failing. The administrator of a web-site should therefore inspect it when problems occur. Note that when configuring Apache you may:
3.4.2 Merged access log

A standard installation of Apache provides log files of all successful URL accesses to the server, as defined by the CustomLog and/or AccessLog directives. The format of the file written by AccessLog is known as the Common Log Format (CLF), which is shared between multiple types of web-servers. With the CustomLog directive you are free to specify the format to be used. Most likely, though, you will use an extension of the CLF known as the combined log format. In either case these log files can be global, or you can specify a separate access log for individual VirtualHost specifications.

The default access log produced by Apache is, however, badly suited to a Globule setup, because it only logs accesses to this particular web-server. Accesses to the same web-site that are serviced by a replica web-server are logged at that other web-server. This is not what you want from an access log. You are interested in the accesses to the web-site, not in the accesses to one of its web-servers.

Globule solves this by merging the logs of all requests to all replica web-servers serving the same web-site. Each web-server collects data on accesses, plus some other information, on a per-site basis. These partial logs are periodically shipped back to the origin server over HTTP, at the interval specified by the GlobuleHeartBeatInterval directive. The origin server appends them to its own information. Consequently, the accumulated data is only partially sorted in time.7

This merged access log does more than report the bare accesses. It also contains information relevant to a distributed web-site setup, such as which replica server received the request. Because of this, a file format such as the CLF is not usable, and Globule uses a different format (documented in appendix B.1). Merged access logs can, however, be converted from Globule's format into the standard common log format (see Section 3.4.3).
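Because replicas ship their partial logs in batches, an entry can show up in the merged log after entries with later timestamps. One way to restore full time order with bounded memory is to buffer entries until no valid later arrival could still precede them. This is the idea behind the lookahead window of globuleutil (option -w, Section 3.4.3). The following is a minimal Python sketch of that technique, not Globule's code. The functions sort_with_window and merge_logs and the (timestamp, line) entry shape are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical entry shape: (timestamp in seconds, raw log line).
Entry = Tuple[float, str]


def sort_with_window(entries: Iterable[Entry], window: float) -> Iterator[Entry]:
    """Yield the entries of one partially sorted log in timestamp order.

    Entries are buffered until `window` seconds of log time have passed.
    ValueError is raised if an entry arrives too late to be placed correctly.
    """
    heap: List[Entry] = []
    latest = float("-inf")    # largest timestamp read so far
    last_out = float("-inf")  # timestamp of the most recently emitted entry
    for ts, line in entries:
        if ts < last_out:
            # A newer entry was already emitted: this one can no longer be placed in order.
            raise ValueError(f"entry at t={ts} is more than {window}s out of place")
        heapq.heappush(heap, (ts, line))
        latest = max(latest, ts)
        # An earlier-stamped entry still to come would violate the window,
        # so entries at or before latest - window are safe to emit.
        while heap and heap[0][0] <= latest - window:
            last_out, out = heapq.heappop(heap)
            yield last_out, out
    while heap:  # end of input: flush the remaining buffer in order
        yield heapq.heappop(heap)


def merge_logs(streams: Iterable[Iterable[Entry]], window: float) -> Iterator[Entry]:
    """Merge several partially sorted logs; the window applies to each log separately."""
    return heapq.merge(*(sort_with_window(s, window) for s in streams))


if __name__ == "__main__":
    replica_log = [(1, "a"), (3, "c"), (2, "b"), (5, "e")]
    origin_log = [(0, "x"), (4, "y")]
    for entry in merge_logs([replica_log, origin_log], window=2):
        print(entry)
```

A larger window tolerates more disorder but keeps more entries in memory. This mirrors the speed and memory trade-off that the globuleutil description mentions.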
Apart from the format, the location where this file is stored also differs.
If you replicate a web-site, Globule creates a directory in which this
report-log is stored. For example, given the configuration

    DocumentRoot /home/www/htdocs
    <Location />
      GlobuleReplicate on
    </Location>

the report-log is stored as /home/www/htdocs/.htglobule/report.log .
As mentioned in the introduction of this section, some utilities depend on an access-log file in CLF or combined log format to extract information about the usage of the web-site. You will naturally want to keep using such existing utilities. The Globule module is therefore accompanied by a program that transforms a report.log file into a valid access-log file in combined or CLF format. The additional information stored by Globule is lost in this translation, but such software could not make use of it anyway.

3.4.3 Utility program globuleutil
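The output of globuleutil must follow the standard common log format. CLF has the fields host, ident, authuser, [date], "request", status and bytes. The combined format appends a quoted Referer and User-Agent. The following is a minimal Python sketch of emitting one such line, including a path prefix prepended to the relative URI in the way the -p option below is described. It is an illustration only, not globuleutil's code. The Globule report-log format (appendix B.1) is not reproduced here, so format_entry and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed English month abbreviations, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def clf_timestamp(when: datetime) -> str:
    """Format a timezone-aware datetime as dd/Mon/yyyy:HH:MM:SS +zzzz."""
    clock = when.strftime("%H:%M:%S %z")
    return f"{when.day:02d}/{MONTHS[when.month - 1]}/{when.year}:{clock}"


def format_entry(host: str, user: Optional[str], when: datetime, method: str,
                 path: str, protocol: str, status: int, size: int,
                 prefix: str = "/", referer: Optional[str] = None,
                 agent: Optional[str] = None) -> str:
    """Render one request in common format, or combined if referer/agent are given."""
    # Prepend the export path to the relative URI (hypothetical -p behaviour).
    url = prefix.rstrip("/") + "/" + path.lstrip("/")
    body = "-" if size == 0 else str(size)  # CLF writes '-' when no body was sent
    # The ident field is written as '-' (not logged).
    line = (f'{host} - {user or "-"} [{clf_timestamp(when)}] '
            f'"{method} {url} {protocol}" {status} {body}')
    if referer is not None or agent is not None:
        # Combined format: append the quoted Referer and User-Agent values.
        line += f' "{referer or "-"}" "{agent or "-"}"'
    return line


if __name__ == "__main__":
    tz = timezone(timedelta(hours=-7))
    print(format_entry("127.0.0.1", "frank", datetime(2000, 10, 10, 13, 55, 36, tzinfo=tz),
                       "GET", "/apache_pb.gif", "HTTP/1.0", 200, 2326))
```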
The globuleutil utility is invoked as follows:

    globuleutil /home/www/htdocs/.htglobule/report.log > access.log

When globuleutil is given multiple report-log files as arguments, it merges them based on the timestamps in each file. The input files do not have to be report-logs. Regular Apache common or combined log files may be specified as well.

Input files are usually not completely sorted in time. You therefore need to either sort them beforehand or tell globuleutil that the files are only partially sorted. globuleutil will then allow entries to be out of place, within a limit of n seconds. The limit applies to the time difference between where an entry should have appeared in the log file (based on its timestamp) and the later position where it actually appears. This maximum allowed slack n is the lookahead window, and it applies to each input file separately.

If the window given is too small, an error message is generated. A large window makes globuleutil run much slower and consume more memory. The right trade-off depends on the settings of your web-server, on outages of replica and origin servers, and on the GlobuleHeartBeatInterval interval.

globuleutil usage

    globuleutil [ -v ] [ -f combined | common ] [ -w seconds ] [ -p prefix ] file1...

-h
    Output help information.
-v
    Increase the verbosity of the information reported, such as the input file format detected and the resources and interval window used. Repeating -v increases the verbosity level further.
-f format or --format=format
    Specifies in which Apache log style to output the result, where format is either common or combined. Only the common format, also known as CLF, is standardized, but the combined format is a commonly used Apache log file format.
-p prefix or --prefix=prefix
    Prepend the path prefix to each URL. The URIs in the report-log files are relative to the path imported or exported from. Full URLs are not used, because with HTTP redirection the initial path can differ between the replica servers and the origin server. You therefore often want to prepend the path from which the documents are being exported, which equals the path of the corresponding Location directive. For DNS redirection this would be /, which is the default.
-w seconds or --lookahead-window=seconds
    Specifies the window of time within which entries in any input file may be unsorted.

3.4.4 Webalizer monitoring and the installer setup

If you used the installer to install Globule, it includes the webalizer program to provide statistics about your web-site. The globuleutil program is invoked automatically when you access the web-page with the webalizer report through the Globule administration URL. The administration URLs are described in the next section.
Your installation should include a script ... In this installation, a periodically run script also keeps the webalizer reports up-to-date, as long as that script remains enabled in the crontab.

3.4.5 Globule monitoring web interface

Monitoring data specific to Globule can be accessed through a web interface. A Globule-enabled server provides a single address for all the web-sites under Globule's control hosted by that server. This address is the URL specified by the GlobuleAdminURL directive.

A normal installation will have a default set of pages installed at this location if Globule was compiled with the --enable-globuleadm option. If you installed Globule using the automatic installer, the administration pages are always installed. They are not installed for RPM-based installations. These pages can be customized at will, because they are not embedded within the server but communicate with Globule to obtain the monitoring information.

The default pages show a menu of the different subjects at the top. Since the pages evolve with each release, this documentation does not attempt a detailed walk-through. It only explains the rough outline, and the pages themselves describe their individual functionality. The administration pages provide:
For each section defined you can browse through details such as:
3.5 Dynamically generated content

Dynamically-generated content makes the pages of a web site more functional by returning content of specific interest to the browsing user, such as the results of a search function. Web-sites with dynamic content are therefore becoming increasingly predominant. Dynamic content consists of documents that are not literally stored as files. Instead, they are generated by executing a program each time a browser requests the page. Despite their names, DHTML and Flash content are not dynamic content, because the same content is served to every browser. The browser merely displays it differently.

For a web server, delivering dynamic content differs from delivering static content. After locating the resource associated with the URL, the server must invoke a program that transforms this plain resource into the actual content passed to the browser. An interpreter takes the URL-related resource and executes it. Before the result is passed to the browser, this execution can in turn access additional resources such as files and databases.

Globule also provides solutions for executing these web-applications in a distributed fashion. Globule enables the replication of dynamic content based on PHP scripts without any structural changes to the content. It works in the following way:
This is a much more advanced method of replication than mirrors or caching proxies, and much easier to convert to than complicated distributed environments. However, the current implementation of dynamic content replication has some limitations:
To get replication of dynamic content operational you need to:
3.5.1 Adding PHP support to Apache

With PHP, the content is generated by an interpreter. The interpreter is a separate piece of software that plugs into the Apache server and must therefore be installed and configured too.
If you used the automatic installer, PHP support should be present already and
the

If you need to add PHP support, or want to check whether PHP is enabled in your configuration, this section provides some guidelines on how Globule expects PHP to be installed. Since adding PHP support is not directly related to Globule, we refer to the official PHP documentation for a full installation reference.
Basic installation and configuration of PHP is relatively simple. However, PHP
can be installed and configured in so many different ways that incompatibilities
can arise when you deviate from the expected installation. We therefore
strongly suggest using the all-in-one installation, which provides a standard
installation. The automatic installer and Globule Broker System also provide
the right settings in the

If you used the installer and answered ``Yes'' to include MySQL support, you already have dynamic content support and can continue with Section 3.5.2 on using Globule support in PHP. If you used the installer without MySQL support, you will be able to use PHP scripts, but the database drivers will not be compiled. Contact us if you need to overcome this. If you installed Globule from source, read Section 2.3.2 on how to install PHP from source.

3.5.2 Using Globule support in PHP

Globule takes care of replicating the PHP source files to replica servers. However, the PHP programs do have to be modified to provide some additional information to Globule. In a Globule environment, the original PHP pages must tell Globule when one PHP page requires another PHP page, a data file or database entries to be present. Globule can then make sure these are present on the local server and point the PHP page to the right location on the specific replica server. The modifications to your PHP pages are:
3.5.3 MySQL query caching with Globule

In many cases, PHP pages must access a database to produce a result. The simplest setup is then to let Globule replicate the PHP code but keep the database centralized. This setup, often called edge-side computing, may however prove quite inefficient if the performance bottleneck lies in the database. One of Globule's most innovative features allows programmers to design their PHP/MySQL applications so that database query results are cached at the replica servers. This can greatly improve the overall performance of the system [3]. Configuring Globule to cache MySQL query results requires:
Note that this setup currently works only for MySQL databases. Also, the use of backup servers is not supported, so no page can be delivered while the central database is unreachable.

Updating PHP pages

To make use of database query caching, PHP pages must be edited in the following way:
Usage of query templates

For Globule to handle cached database queries correctly, your PHP scripts must declare every query before using it. Calling mysql_query directly is therefore not possible. This procedure is similar to the prepared statement interface in the improved PHP MySQL extension (mysqli) and in many other modern database interfaces. Without Globule, you would build the string representing the query and execute it, as in:

    for ($i = 0; $i < 10; $i++) {
      $query = "select * from t where t.id > " . $i . " and t.rel = 4";
      mysql_query($query);
      ...
    }

With Globule, you instead first declare a template of the query:

    globule_mysql_declare("myquery", "select * from t where t.id > ? and t.rel = 4");
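A declared template is stored once under a name. Each ? in it is a formal argument that is later filled in with a concrete value. The following minimal Python sketch assumes positional substitution. Its functions declare, quote and instantiate are hypothetical stand-ins that only mirror the shape of the Globule PHP calls. They merely build the SQL text and are not Globule's implementation. Splitting on "?" also ignores question marks inside quoted literals. Production code should let the database driver bind parameters instead of quoting by hand.

```python
from typing import Dict, Sequence

# Hypothetical registry of declared query templates, keyed by name.
_templates: Dict[str, str] = {}


def declare(name: str, template: str) -> None:
    """Store a query template whose formal arguments are written as '?'."""
    _templates[name] = template


def quote(value) -> str:
    """Render a Python value as a MySQL literal (illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return str(int(value))
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Escape backslashes and double single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "''")
        return f"'{escaped}'"
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def instantiate(name: str, args: Sequence) -> str:
    """Fill each '?' in the named template with the next argument, in order."""
    pieces = _templates[name].split("?")
    if len(pieces) - 1 != len(args):
        raise ValueError(f"{name}: expected {len(pieces) - 1} arguments, got {len(args)}")
    out = [pieces[0]]
    for value, rest in zip(args, pieces[1:]):
        out.append(quote(value))
        out.append(rest)
    return "".join(out)


if __name__ == "__main__":
    declare("myquery", "select * from t where t.id > ? and t.rel = 4")
    for i in range(3):
        print(instantiate("myquery", [i]))
```

Keeping the template fixed and varying only the arguments gives every query a stable name plus an argument list. Naming queries in this way is what distinguishes this style from building ad-hoc query strings.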
These globule_mysql_declare statements should be inserted after any call to the relevant
globule_mysql_attach statement. The globule_mysql_declare statement shown above declares
a named statement ``myquery'', in which certain parts may be filled in when the
query is later executed. These formal arguments, not yet specified, are denoted
with a question mark (?). Wherever there used to be a call to mysql_query, the query is now executed with a call to globule_mysql_execute. This function takes just the query name instead of the full query:

    globule_mysql_execute("myquery", array($i));

The first argument is the query name. The second argument is an array of all values to be substituted for the formal arguments, denoted by question marks, in the query template.

Configuring Globule for Database Query Caching
Now, you also need to update the

Suppose that, before you updated your PHP scripts, they used the following MySQL connection sequence:

    mysql_connect("localhost", "master", "");
    mysql_select_db("globecbc");

This connects to the database server running on localhost, using username ``master'' with an empty password, and selects the database ``globecbc''. To make this database reachable from the replica servers, the configuration of the origin server must provide an HTTP-based interface for queries to the database:

    <VirtualHost *>
      ServerName origin.revolutionware.net
      ...
      <Location />
        GlobuleReplicate on
        GlobuleReplicaIs http://replica.revolutionware.net/ sharedpassword
        ...
      </Location>
      <Location /db-globecbc>
        GlobuleDatabase mysql://master@localhost/globecbc dbsharedpassword
      </Location>
      ...
    </VirtualHost>
The database identified by the URL

The password dbsharedpassword is not the database password. It is a password that each replica server must know in order to be allowed to issue requests to the database through the origin server.
Now, replica servers can access your database via the URL
If your scripts use multiple databases, you can repeat this with different names. Make sure the same name is not used twice for different databases!

Replica servers should define a similar connection under the same path. However, instead of the URL of the actual MySQL database, they specify the URL of the HTTP interface of the origin server:

    <VirtualHost *>
      ServerName replica.revolutionware.net
      ...
      <Location />
        GlobuleReplicaFor http://origin.revolutionware.net/ sharedpassword
      </Location>
      <Location /db-globecbc>
        GlobuleDatabase http://origin.revolutionware.net/db-globecbc dbsharedpassword
      </Location>
      ...
    </VirtualHost>

In the current implementation, there is just a single shared password for all replica servers. The /db-globecbc location path can be chosen freely, but it must match in the origin definition, the replica definition and the PHP script.

globule@globule.org, February 27, 2006