B File structures and protocols
B.1 report.log structure
Each section of a web site that is exported or imported has a .htglobule
directory, which contains accounting information for that section. Among
other files, this directory holds the report.log. This log contains
information which should be collected at the origin server to make decisions,
compute statistics and build a merged log of requests (which can be converted
into an access.log).
The report.log format is completely different from Apache-style log files,
because the report.log contains much more information than just requests and
has fields which are better suited to a distributed environment than the
traditional access log formats, such as the common and combined log formats.
To aid future development, the report.log does not follow a strict format.
Instead, it is a free-format file with a limited set of rules on how to
separate records (of requests and other relevant data) and the fields of data
inside a record. The format does not prescribe which fields should be present,
nor in which order.
The report.log is a series of unstructured records of events. Each record is
contained on a single line. Lines which start with a hash sign (#)
should be ignored and can be used for comments. Each line contains one or
more fields with data. Fields are in principle separated by one or more
spaces or tabs.
A field is either a single letter, used in the report.log to identify the
type of event, or a key--value pair. Key and value are separated by either an
equal sign (=), a colon (:) or a semi-colon (;). The different separators
serve different purposes:
- =
Used to separate a key from a value, where the value can only be a number.
These numbers should bear some relation to each other. Identifiers, for
instance, in principle bear no relation to each other: persons with ID 3 and
ID 5 have no logical bond with each other, nor can you infer that there
should also be a person with ID 4. Identifiers are therefore not suited to
this separator.
A timestamp, however, is suitable, as there is a logical ordering of time.
- ;
The semi-colon is a general key--value pair separator, where the value field
should not be interpreted as a number, but as some identifier. Normally,
there is a limited number of possible values for a certain key in the
report.log. In other words, you should not expect to see generic text, but
only keywords or identifiers as values in a semi-colon field.
- :
The colon serves the same purpose as the semi-colon separator, but the
colon can only be used in the last key--value pair of a record, and the value
that comes after the colon may contain spaces and/or tabs.
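The separator rules above can be sketched as a small parser. This is a minimal illustration, not the parser used by Globule: the function name parse_record, the assumption that keys consist of letters, digits and underscores, the assumption that no whitespace follows a separator, and the sample values used in the usage note are all hypothetical.

```python
import re

# Assumption: keys are made of letters, digits and underscores.
_KEY_AND_SEPARATOR = re.compile(r"(\w+)([=;:])")


def _next_blank(line, start):
    """Return the index of the first space or tab at or after start."""
    end = start
    while end < len(line) and line[end] not in " \t":
        end += 1
    return end


def parse_record(line):
    """Parse one report.log line into (event_letter, fields).

    Returns None for blank lines and for comment lines starting with '#'.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    event, fields, pos = None, {}, 0
    while pos < len(line):
        if line[pos] in " \t":          # fields are separated by blanks
            pos += 1
            continue
        match = _KEY_AND_SEPARATOR.match(line, pos)
        if match is None:
            # Not a key--value pair: a single letter names the event type.
            end = _next_blank(line, pos)
            if end - pos == 1 and event is None:
                event = line[pos]
            pos = end
            continue
        key, separator = match.groups()
        if separator == ":":
            # The colon field is the last one; its value may contain blanks.
            fields[key] = line[match.end():]
            break
        end = _next_blank(line, match.end())
        value = line[match.end():end]
        if separator == "=":
            # '=' values are numbers; keep the raw text if conversion fails.
            try:
                value = int(value)
            except ValueError:
                pass
        fields[key] = value
        pos = end
    return event, fields
```

For example, parse_record("E t=5 path:/my file.html") yields the event letter "E", the number 5 for t, and the path "/my file.html" including its space. Taking the first separator after the key keeps colons inside a semi-colon value (such as a URL) from being mistaken for the start of the colon field.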
The following event types can be in the report.log:
- R a document has been requested by some browsing user;
- U a document update has been detected;
- I the document has been invalidated;
- A the policy of a document has changed;
- E a document has been evicted from the cache.
The following fields can be expected:
- t= The timestamp when the event occurred.
- path: The path component of the URL, without the initial location from
which the section was exported (or imported).
- old; The previous (replication) policy that has been used.
- new; The new (replication) policy to be used on a document.
- lastmod= A timestamp with the last modification time of the
document.
- docsize= The (new) document size.
- client; The IP address of the peer (e.g. the browsing user doing
the request).
- elapsed= The amount of time needed to do something (serve a
request for instance).
- sndsize= The number of bytes reported to be shipped.
- browser; The User-Agent reported from the browsing user (very
optional information).
- referer The Referer field in the request reported by the
browsing user (very optional information).
Timestamps and durations are in apr_time_t precision, normally
microseconds. Sizes are in bytes.
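As an illustration of these units, the sketch below converts a timestamp and a duration into more familiar forms. It assumes the usual APR convention that an apr_time_t counts microseconds since 00:00:00 UTC on January 1, 1970; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: apr_time_t counts microseconds since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def apr_time_to_datetime(t):
    """Convert a microsecond timestamp (e.g. the t= field) to a UTC datetime."""
    return EPOCH + timedelta(microseconds=t)


def apr_interval_to_seconds(elapsed):
    """Convert a microsecond duration (e.g. the elapsed= field) to seconds."""
    return elapsed / 1_000_000
```

Adding a timedelta to the epoch, rather than dividing by a million and passing a float to a timestamp conversion, keeps the full microsecond precision of the logged value.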
Normally, though this is not guaranteed, the following fields are present for
each event type:
Event | Fields
R | t, client, elapsed, sndsize, browser, referer, path
U | t, lastmod, docsize, path
A | t, old, new, path
E | t, path
I | t, path
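Because these fields are usual but not guaranteed, a consumer of the log may want to check which of them a record lacks before relying on them. The sketch below assumes a record is already available as an event letter plus a dictionary of fields; that representation and the names EXPECTED_FIELDS and missing_fields are hypothetical.

```python
# Fields normally present per event type, as listed in the table above.
EXPECTED_FIELDS = {
    "R": ("t", "client", "elapsed", "sndsize", "browser", "referer", "path"),
    "U": ("t", "lastmod", "docsize", "path"),
    "A": ("t", "old", "new", "path"),
    "E": ("t", "path"),
    "I": ("t", "path"),
}


def missing_fields(event, fields):
    """Return the usually-present fields that this record does not contain.

    Unknown event types have no expectations and yield an empty list,
    since the format allows new record types to be added.
    """
    return [name for name in EXPECTED_FIELDS.get(event, ()) if name not in fields]
```

A record with missing fields is still valid under the free-format rules, so the result is best treated as a hint for the consumer rather than as an error.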
globule@globule.org
February 27, 2006