StatEye User Guide

StatEye Website Statistics


Getting Started
Tuning StatEye (Settings)
Tips & Tricks
How StatEye Works

1. Getting Started

1.1. System Requirements

In short: UNIX, Bash, Perl, C, CGI and crontab.

For more details read the System Requirements section in the introduction.

Any feedback, suggestions and fixes that would enhance portability will be most welcome.

1.2. Installing StatEye

Download the latest release from the download page, then follow the instructions on the Setup & Install page.

1.3. Getting your website's traffic reports

The software is usually set up to send one e-mail after the end of each day and one after the end of each month (read Setting up the crontab on the Setup & Install page).

To read the reports, we suggest using a spreadsheet program (such as OpenOffice Calc or Microsoft Excel). To do so, follow these steps:

Open the E-mail message.
Download the attached file (it is sent as a tab-delimited .tsv file).
Open the file with your spreadsheet program, selecting tab as the delimiter.

You can then use all the spreadsheet features (including building your own graphs and making your own macros).

StatEye can also report on a subsite or a specific part (even a single page) of your website. To do this, simply add an argument when calling statcron.pl. The argument should be a segment of the URL path that you want to match. For example, if you have a few pages about Komodo dragons under http://www.7is7.com/otto/komodo/, you could make a subsite report on the section about these lizards as follows:

./statcron.pl day -1 "otto/komodo"

Any visited URL that matches "otto/komodo" is then taken into account for the report, while log lines that do not match the string are ignored. You can also do the exact opposite, i.e. exclude all pages that match the given string, by preceding it with an exclamation point. For example, to report on all pages except those whose URL contains "otto/komodo":

./statcron.pl day -1 \! "otto/komodo"

You can use regular expressions as well; in fact, you can use any regular expression that Perl can handle. For example, to report on two sub-directories you could write something like this:

./statcron.pl day -1 "otto/(dir1|dir2)"

Warning! It is vital to precede the exclamation point with a backslash for this to work correctly; otherwise the shell will interpret the exclamation point, and this may cause unintended things to happen.
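The include and exclude rules above can be illustrated with a small Python sketch. This is not StatEye's Perl code: the function name subsite_filter, its exclude flag (standing in for the \! argument) and the sample URLs are hypothetical, and the sketch assumes the pattern is searched for anywhere in the visited URL, as a Perl =~ match does.

```python
import re

def subsite_filter(urls, pattern, exclude=False):
    """Keep visited URLs matching `pattern`; with exclude=True keep the rest.

    Hypothetical sketch: the regular expression is searched for anywhere
    in the URL (unanchored), like a Perl =~ match.
    """
    regex = re.compile(pattern)
    return [url for url in urls if bool(regex.search(url)) != exclude]

# Hypothetical sample of visited URLs.
urls = [
    "www.7is7.com/otto/komodo/",
    "www.7is7.com/otto/komodo/food.html",
    "www.7is7.com/otto/estonia/",
]

print(subsite_filter(urls, "otto/komodo"))                # the two Komodo pages
print(subsite_filter(urls, "otto/komodo", exclude=True))  # only the Estonia page
print(subsite_filter(urls, "otto/(komodo|estonia)"))      # all three pages
```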

2. Tuning StatEye (Settings)

2.2. Force IP Session Measurement

This option simply forces StatEye to measure sessions by counting IP numbers and ignoring cookie sessions.

Cookie session measurement can distinguish between different users behind the same firewall and, conversely, detect that a user behind a firewall with multiple IP addresses is one and the same person. Also, a visitor who visits in the morning, then closes the browser and visits again in the evening, will be counted as two separate sessions with cookie session measurement but as one session with IP measurement (in the monthly reports the same is true for visits on different days).

However, although cookie session measurement seems to yield more accurate results, it has some problems. Not all browsers accept cookies, in which case StatEye reverts to IP sessions for that visitor. Worse, some versions of Internet Explorer pretend to accept a cookie but then seem to forget it on the second pageview (though not after that). This makes one visitor look like two different visitors with cookie session measurement, but only one with IP session measurement.

This setting, like all such settings in StatEye, is an analysis time setting, so you can always rerun reports over past periods using either cookie sessions or IP sessions.

You can choose to force IP session measurement by (re-)running the setup script in advanced mode and answering 'yes' to the question on forcing the use of IP sessions, or by manually editing the statconf.pl file in the directory where StatEye was installed and setting the variable $use_ip_session to 1.
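To make the difference between the two measurements concrete, here is a simplified Python sketch of how a session could be keyed. The names session_key and count_sessions and the (ip, cookie) pageview format are hypothetical; the use_ip_session argument plays the role of $use_ip_session, and session timeouts and day boundaries are ignored.

```python
def session_key(ip, cookie, use_ip_session=False):
    """Pick what identifies a session for one pageview.

    Falls back to the IP number when IP sessions are forced or when the
    visitor sent no cookie (for example because cookies are refused).
    """
    if use_ip_session or not cookie:
        return ("ip", ip)
    return ("cookie", cookie)

def count_sessions(pageviews, use_ip_session=False):
    """Count distinct sessions in an iterable of (ip, cookie) pairs."""
    return len({session_key(ip, cookie, use_ip_session) for ip, cookie in pageviews})

# Hypothetical pageviews: two people behind one firewall (10.0.0.1)
# and one visitor who refuses cookies (10.0.0.2).
pageviews = [
    ("10.0.0.1", "A"),
    ("10.0.0.1", "A"),
    ("10.0.0.1", "B"),
    ("10.0.0.2", None),
]
print(count_sessions(pageviews))                       # cookie sessions: 3
print(count_sessions(pageviews, use_ip_session=True))  # IP sessions: 2
```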

2.3. DNS Resolution

DNS resolution is the process of converting IP numbers into DNS names. By default StatEye tries to resolve DNS names during the analysis phase. However, DNS resolution is a lengthy process: it is the most time-consuming activity of the analysis phase, because every IP number has to be resolved by making a gethostbyaddr request for it.

DNS names are used by StatEye for showing details of the top-10 visitors and more importantly for determining the top level domains of visitors (an imperfect approach to seeing which countries your visitors are from).

If you do not care much about these things, you can choose not to have StatEye resolve DNS names by (re-)running the setup script in advanced mode and answering 'no' to the question on resolving DNS names, or by manually editing the statconf.pl file in the directory where StatEye was installed and setting the variable $resolve_dns to 0.
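A minimal Python sketch of this step, for illustration: it uses the standard socket.gethostbyaddr call, caches results so that each distinct IP number is looked up only once (an assumption, not necessarily what StatEye does), and derives the top level domain from the resolved name. The function names are hypothetical and resolve_dns mirrors $resolve_dns.

```python
import socket
from functools import lru_cache

@lru_cache(maxsize=None)
def resolve(ip):
    """Reverse-resolve an IP number; each distinct IP is looked up only once."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return ip    # unresolved: keep the bare IP number

def top_level_domain(host):
    """Return the last label of a DNS name, or None for a bare IPv4 number."""
    labels = host.rstrip(".").split(".")
    if all(label.isdigit() for label in labels):
        return None
    return labels[-1].lower()

def visitor_tld(ip, resolve_dns=True):
    """Top level domain of a visitor; no DNS lookup at all when resolve_dns is False."""
    if not resolve_dns:
        return None
    return top_level_domain(resolve(ip))

print(top_level_domain("host.example.nl"))          # nl
print(visitor_tld("12.34.56.1", resolve_dns=False))  # None, no lookup made
```

Note that an unresolved IP number yields no top level domain, which is one reason this country estimate is imperfect.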

2.4. Visited URL Corrections

This feature allows StatEye to treat URLs like /otto/estonia/ and /otto/estonia/index.html as identical and similarly also to treat www.7is7.com and 7is7.com as identical.

StatEye does this by removing certain trailing and leading patterns from the URLs of pages visited on your website. These patterns are regular expressions so you can have several patterns removed. By default we have:

Leading pattern to remove: 'www.'

Trailing patterns to remove: '(index.html|index.htm|index.php)'

You can change these patterns by (re-)running the setup script and entering your removal patterns at the relevant questions or by manually editing the statconf.pl file in the directory where StatEye was installed and setting the variables $remove_leading and $remove_trailing to whatever you want.

Note: These URL corrections are not applied to referrers.
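Here is a minimal Python sketch of these corrections, assuming the visited URL is recorded as host name plus path without the http:// prefix (otherwise the leading 'www.' pattern could never match). correct_url is a hypothetical name; its defaults are the patterns quoted above. Note that the dot in 'www.' is unescaped, so strictly speaking it matches any character; 'www\.' would be the precise pattern.

```python
import re

# Defaults quoted from the settings above ($remove_leading, $remove_trailing).
REMOVE_LEADING = r"www."
REMOVE_TRAILING = r"(index.html|index.htm|index.php)"

def correct_url(url, leading=REMOVE_LEADING, trailing=REMOVE_TRAILING):
    """Strip the leading pattern from the start and the trailing one from the end."""
    url = re.sub("^(?:" + leading + ")", "", url, count=1)
    url = re.sub("(?:" + trailing + ")$", "", url, count=1)
    return url

for visited in ["www.7is7.com/otto/estonia/index.html", "7is7.com/otto/estonia/"]:
    print(correct_url(visited))  # both print 7is7.com/otto/estonia/
```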

2.5. Site Search Analysis

If you have a search function on your site that uses the GET method (as opposed to the POST method), and you have included stateye.js in your search result page, StatEye can extract the search terms your visitors used and the result pages they consulted.

For this to work you need to tell StatEye what the URL of the search result page is, what portion of the query string refers to the search term and what portion of the query string refers to the search results page number.

For example, for a search that logs the URL www.7is7.com/search/result.pl?q=countdown&page=2, set the following:

Search URL: /search/result.pl

Search Term: q

Search Result Page: page

Search Result Start: none

Alternatively, for a search that logs the URL www.7is7.com/search/result.pl?q=countdown&start=20, where start indicates the position of the first result shown on the page, and supposing that 10 results are shown per page, set the following:

Search URL: /search/result.pl

Search Term: q

Search Result Page: none

Search Result Start: start

Search Result Start Factor: 10

Note: All these settings (except the start factor) can be regular expressions.

You can set your settings by (re-)running setup in advanced mode and answering the appropriate questions, or by manually editing the statconf.pl file in the directory where StatEye was installed and setting the following variables appropriately:

$site_search_url

$site_search_pat

$site_search_page_pat

$site_search_start_pat

$site_search_start_fac
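A rough Python sketch of how such settings can be applied to a logged URL follows. The function site_search is hypothetical; its parameters correspond to the variables above, with None standing for the setting "none". How StatEye turns a start value into a page number is not spelled out here, so the sketch assumes start is a zero-based offset (start=20 with 10 results per page is then page 3).

```python
import re
from urllib.parse import urlsplit, parse_qs

def site_search(url, search_url, term_pat, page_pat=None,
                start_pat=None, start_fac=10):
    """Return (search term, result page) for a site search URL, else None.

    Parameters mirror $site_search_url, $site_search_pat,
    $site_search_page_pat, $site_search_start_pat and $site_search_start_fac.
    """
    parts = urlsplit(url if "://" in url else "//" + url)
    if not re.search(search_url, parts.path):
        return None
    params = parse_qs(parts.query)

    def value(pat):
        # Settings may be regular expressions, so match the parameter names.
        for name, values in params.items():
            if re.fullmatch(pat, name):
                return values[0]
        return None

    term = value(term_pat)
    page_val = value(page_pat) if page_pat else None
    start_val = value(start_pat) if start_pat else None
    page = 1
    if page_val is not None and page_val.isdigit():
        page = int(page_val)
    elif start_val is not None and start_val.isdigit():
        # Assumption: start is the zero-based offset of the first result shown.
        page = int(start_val) // start_fac + 1
    return term, page

print(site_search("www.7is7.com/search/result.pl?q=countdown&page=2",
                  "/search/result.pl", "q", page_pat="page"))
print(site_search("www.7is7.com/search/result.pl?q=countdown&start=20",
                  "/search/result.pl", "q", start_pat="start", start_fac=10))
```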

2.6. Counting Shopping Cart Sessions

It is possible to count the number of shopping cart sessions that your visitors have created. A session exists when your website stops being stateless and instead shows visitor-specific content (such as their shopping cart).

This usually works by setting a cookie so that the webserver knows what content to show a particular visitor. As StatEye logs cookies all you need to do is tell StatEye which cookie(s) indicate(s) the existence of a virtual shopping cart and StatEye will count the occurrences for you.

Multiple cookies can be matched by defining the cookie session identifier as a regular expression.

You can set this by (re-)running setup in advanced mode and entering the name of the cookie when setup asks for the cookie session identifier for shopping carts. Or you can manually edit the statconf.pl file in the directory where StatEye was installed and change the value of $session_id.
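The counting can be pictured with this Python sketch. It assumes the logged cookies are available as Cookie header strings, that the $session_id regular expression is matched against the whole cookie name, and that each distinct cookie value is one cart session; the function name and the cookie names cart and basket are hypothetical.

```python
import re

def count_cart_sessions(cookie_headers, session_id):
    """Count distinct cookies whose name matches the regex session_id.

    cookie_headers holds logged Cookie strings such as "cart=abc; lang=en".
    """
    name_re = re.compile(session_id)
    carts = set()
    for header in cookie_headers:
        for pair in header.split(";"):
            name, sep, value = pair.strip().partition("=")
            if sep and name_re.fullmatch(name):
                carts.add((name, value))
    return len(carts)

# Hypothetical logged cookies; "cart" and "basket" mark shopping carts.
logged = ["cart=abc; lang=en", "cart=abc", "basket=xyz", "lang=nl"]
print(count_cart_sessions(logged, "cart|basket"))  # 2
```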

2.7. Sales Path Tracking

By tracking up to 3 URLs separately under the key figures, or totals, section at the beginning of a StatEye report, we can focus on the sales path. This may be the path of your ordering process. You may want to know for example, how many people started the check-out process, how many completed it and how many made a payment.

StatEye uses 3 regular expressions to monitor this: step1_url, step2_url and trigger_url. You need to find the URLs that correspond to the different steps that you want to monitor. If several URLs correspond to the same step in the ordering process, define that step (or the trigger) as a regular expression matching all of them, for example:

/cgi-bin/shop/(creditcard|transfer).pl

To set sales path tracking run setup in advanced mode and answer the appropriate questions or edit the following variables in statconf.pl: $step1_url, $step2_url and $trigger_url.
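As a rough illustration, this Python sketch counts pageviews that match each of the three patterns. Whether StatEye counts pageviews or sessions per step is not stated here, so treat that as an assumption; the function and the shop URLs are hypothetical, apart from the trigger pattern quoted above (with its dot escaped).

```python
import re

def sales_path(visited_urls, step1_url, step2_url, trigger_url):
    """Count pageviews matching each of the three sales path patterns."""
    patterns = {"step1": step1_url, "step2": step2_url, "trigger": trigger_url}
    counts = dict.fromkeys(patterns, 0)
    for url in visited_urls:
        for step, pattern in patterns.items():
            if re.search(pattern, url):
                counts[step] += 1
    return counts

# Hypothetical shop URLs.
visited = (["/cgi-bin/shop/checkout.pl"] * 3
           + ["/cgi-bin/shop/address.pl"] * 2
           + ["/cgi-bin/shop/creditcard.pl", "/index.html"])
result = sales_path(visited,
                    r"/cgi-bin/shop/checkout\.pl",
                    r"/cgi-bin/shop/address\.pl",
                    r"/cgi-bin/shop/(creditcard|transfer)\.pl")
print(result)  # {'step1': 3, 'step2': 2, 'trigger': 1}
```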

2.8. Excluding your own pageviews

In order not to pollute your statistics with data about your own visits to your website, you can specify IP numbers to be excluded. To exclude several IP addresses at once, use a regular expression. Some examples:

1: ^12\.34\.56\.1$ - correct

2: ^12\.34\.56\.[0-9]+$ - correct

3: 12.34.56.1 - wrong!

4: 12.2.2 - wrong!

Example 1 will only match 12.34.56.1, and example 2 only 12.34.56.0 to 12.34.56.255, which is correct. However, example 3 would also match 112.34.56.1 and IP numbers ending in .10 to .19 and .100 to .199. Example 4 is even worse, as it would match seemingly unrelated IP numbers such as 200.12.202.78 (an unescaped dot matches any character).

The bottom line is: escape the dots using '\.' and use the ^ and $ anchors.

As with most StatEye settings, this is an analysis time setting, meaning that your pageviews will still be logged; they will just be filtered out before analysis. To exclude your IP numbers, run setup in advanced mode and answer the appropriate question, or edit statconf.pl and set $ignore_ipnums accordingly.

Note: You can exclude any IP number, so you can also exclude those of people who you think are trying to manipulate or pollute your statistics.
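You can check a pattern before putting it in $ignore_ipnums. This Python sketch runs the four examples above against a few sample IP numbers, assuming the pattern is applied as an unanchored search, which is what the explanation above implies; the function name and sample addresses are illustrative only.

```python
import re

ignore_examples = {
    1: r"^12\.34\.56\.1$",       # correct
    2: r"^12\.34\.56\.[0-9]+$",  # correct
    3: r"12.34.56.1",            # wrong: unanchored, dots unescaped
    4: r"12.2.2",                # wrong: dots match any character
}
test_ips = ["12.34.56.1", "12.34.56.17", "112.34.56.1", "200.12.202.78"]

def excluded(pattern, ips):
    """IP numbers that an unanchored (Perl-style) search of pattern would exclude."""
    return [ip for ip in ips if re.search(pattern, ip)]

for number, pattern in ignore_examples.items():
    print(number, excluded(pattern, test_ips))
# 1 ['12.34.56.1']
# 2 ['12.34.56.1', '12.34.56.17']
# 3 ['12.34.56.1', '12.34.56.17', '112.34.56.1']
# 4 ['200.12.202.78']
```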

2.9. CGI Extension

Some web servers require that all CGI scripts have the extension .cgi. Therefore StatEye links the filename stateye.cgi to stateye.gif, which makes those two files equivalent.

If your server requires that you use the .cgi extension, then there is no need to put the CGI programs in a separate cgi-bin directory, and StatEye will put all the files in the stateye directory. During setup you will be asked whether your CGI programs must be in a separate cgi-bin directory or must have the .cgi extension.

The setup script automatically updates everything that is required, whatever the CGI requirement on your site. In particular the files stateye.js, stateye.incl and related files will be updated.

Note: Usually one has to activate the possibility to run .cgi scripts in a particular directory by adding the following directive to the .htaccess file in that directory or one of its parent directories:

Options +ExecCGI

Additionally we recommend that you turn off viewing of indexes in the stateye directory:

Options -Indexes

2.10. Setuid, a.k.a. the s-bit

Most web servers these days have suExec installed and running. If so, there is no need for you to change anything in the default settings. However, if you are not running under suExec it is advisable to "set the s-bit" (turn setuid on). This means that StatEye will run as the user that owns the program (which is what suExec would otherwise do for you).

You can turn the s-bit on by (re-)running setup.pl and answering yes to the following question:

Do you want to turn on setuid - a.k.a. the s-bit? (y/n):

Warning! If your webserver is running suExec it will refuse to run a script with the s-bit set.

Since the logs and archive are to be manipulated by the program owner it is important that the logs belong to the owner and that permissions are set correctly. Using either suExec or setting the s-bit will resolve any permission problem that you may encounter.

One advantage of this setup is that, should somebody break into your server through the HTTP server, they will only gain the rights of the user ID under which the HTTP server runs, which on a well-configured server should not be much. Because StatEye runs as a different user and by default writes its logs so that only the program owner can read them, such a cracker should not even be able to read the StatEye logs.

Although it is possible to change the permissions manually, it is not advisable to set them to anything other than what is described above.

2.11. Software Update Check

By default the software checks for updates every 7 days. You can change this, or turn it off, by running setup.pl in advanced mode and entering your desired period when asked for the number of days between checks for software updates.

To turn this feature off, enter a period of 0 days.

3. Tips & Tricks

3.1. Counting Downloads

Sometimes you may want to measure the actual downloads instead of just the pageviews to the page prior to the download. This can be done by inserting a go-between page that automatically starts the download for the visitor. Not only does it allow you to count how many times visitors actually click on a link to start a download, but you can also convey a message to the user that the download has started or bring something else to his or her attention.

Here is a sample go-between page:

<html>
<head>
  <meta http-equiv="Refresh" content="1; url=http://www.domain.name/full/path/downloadfilename">
  <title>Start Downloading</title>
</head>
<body>
  <p>If the download does not start automatically within a few seconds, click <a href="downloadfilename">here</a>.</p>
  <p>After the download has completed, go <a href="previouspage.html">back</a>.</p>
  <script type="text/javascript" src="/stateye/stateye.js"></script>
  <noscript>
    <img src="/cgi-bin/stateye/stateye.gif?docloc=referer&amp;docref=noscript" width="1" height="1" alt="" style="margin:0;border:0;padding:0;display:block;">
  </noscript>
</body>
</html>

The same go-between page principle can be applied to logging click-throughs.

3.2. Excluding Robots

Robots, short for web robots, are also often referred to as web crawlers (or simply crawlers), web spiders (or simply spiders) or bots.

StatEye is not a tool that is intended to track robot activity (such as that of Googlebot) on your site. StatEye is intended to measure human activity and the success of your site with human beings (your potential customers). Robot visits would pollute your data with fake, meaningless pageviews.

Many robots will simply not bother to fetch a 1-by-1 GIF, but a few do, and it is better to exclude them. The best way to do so is to use the robots.txt file for its intended purpose, adding lines like the ones below:

User-agent: *
Disallow: /cgi-bin/stateye/
Disallow: /stateye/

Many sites will already exclude /cgi-bin/ which implies the exclusion of all sub-directories. For more details about robot exclusion see: www.robotstxt.org.

Note: Robots do not have to respect the robots.txt directive, but such behaviour is frowned upon by the Internet community.
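To check how a rule-abiding robot reads these lines, you can use the robots.txt parser in Python's standard library (urllib.robotparser). ExampleBot is a made-up user agent name; this only tests the rules, not whether a given robot actually obeys them.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/stateye/
Disallow: /stateye/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/cgi-bin/stateye/stateye.gif", "/stateye/stateye.js", "/otto/komodo/"]:
    allowed = parser.can_fetch("ExampleBot", "http://www.7is7.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```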

3.3. Including the StatEye JavaScript

Included in the software is a small script called statincl, which inserts the StatEye JavaScript code into HTML files for you.

If you chose to install the optional command line utilities during setup they should be available on the command line. If you did not, re-run setup and install the command line utilities.

Warning! Before using statincl make a full backup of your site. This script changes your pages and may do things you did not want to do. It is also wise to first test it on copies of your files, before placing the updated files onto your real site.

Once you have statincl ready, you can run it on one or more pages at the same time:

statincl page1.html page2.html etc...

This utility looks for the <body> tag and adds the JavaScript code for StatEye just after it. If the script cannot add the JavaScript code, it returns an error message. On success it displays the difference between the updated and original versions of the file (unless run in silent mode).

The following options are available when running statincl:

-h : Shows some help on how to use the script.
-u : Undo last update of a given file.
-s : Run the script in silent mode. Ex. statincl -s page1.html. The difference between the updated page and the original file will not be shown.
-m (html|xhtml) : Optionally indicate which markup to use, HTML or XHTML. The script currently defaults to HTML markup.
-p var : Append the prefix var, such as "404", to logged URLs. Previous prefixes are preserved; use "none" to clear a previous prefix. This is useful for logging error pages, see: Logging Error Pages.

In some situations you may not want to ever add the StatEye code to a certain page. To prevent statincl from adding the code add  to the page.

You can use the Unix utility find with the -exec option to quickly update an entire site. But take note of the warning mentioned above. To update all the .html pages you could go to your document root and enter the following instruction:

find . -name '*.html' -exec statincl {} \;
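For illustration only, here is a Python sketch of the core step statincl performs: inserting a snippet just after the first <body> tag and reporting an error if there is none. The snippet is an assumption based on the script tag in the go-between page under Counting Downloads, and add_stateye_code is a hypothetical name; the real statincl also handles undo, prefixes, XHTML markup and showing a diff, which this sketch does not.

```python
import re

# Assumed snippet, copied from the go-between page example.
STATEYE_JS = '<script type="text/javascript" src="/stateye/stateye.js"></script>'

BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)

def add_stateye_code(html, snippet=STATEYE_JS):
    """Return html with snippet inserted right after the first <body> tag.

    Raises ValueError when no <body> tag is found.
    """
    match = BODY_TAG.search(html)
    if match is None:
        raise ValueError("no <body> tag found; StatEye code not added")
    end = match.end()
    return html[:end] + "\n" + snippet + html[end:]

print(add_stateye_code('<html><BODY class="home"><p>Hello</p></body></html>'))
```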

3.4. Logging Error Pages

Prefixing logged URLs

It is possible to add a prefix to the logged URL of certain pages. This can be useful if you want to distinguish views of error pages, such as 404s, from the rest.

If a page on your site is not found, chances are that your server will serve a notfound.html or 404.html instead, but it will be logged as a pageview of the non-existent URL that was requested. Including the StatEye JavaScript in the error pages with a "404" prefix overcomes this problem, as the non-existent URL will be preceded by "404" in the log. The same can and should be done for 401, 403 and 500 error pages. To set up the StatEye code in a 404 error page named notfound.html, simply type:

statincl -p 404 notfound.html

Reporting on error pages

By running a subsite report with a pattern that matches these prefixes, it is possible to get a report covering only the error pages on your site. For example:

./statcron.pl mail day -1 "^[45]0"

3.5. Viewing your StatEye log live

The utility stattail is a little tool that runs a tail on your current log file. It passes on switches to tail, so you can use the same switches as for tail. For example, to follow your log live, type:

stattail -f

Note: The log switches after midnight, so you will need to restart your tail at that time.

3.6. Viewing your StatEye reports

You will receive your StatEye web traffic reports as an attachment to your e-mail. The attachment is a tab-delimited data file with the .tsv extension (tab-separated values).

Open the .tsv file with a spreadsheet program and make sure you select tab as the delimiter. With OpenOffice.org you can go via the File ⇒ Open menu and, when opening the file, select Text CSV as the File Type (this option is about halfway down the list).

You can also try to associate .tsv files with your spreadsheet program so that it opens them by default. There have been issues with this; one workaround is to rename the file to .csv and select tab as the delimiter when opening it. There is a request for enhancement asking that double-clicking a .tsv file should open it in the OpenOffice spreadsheet. If you are having problems opening .tsv files and would like that enhancement, you can vote for this enhancement request.
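If you would rather process a report in a script than in a spreadsheet, Python's standard csv module reads tab-separated files when given delimiter="\t". The file name and the sample rows below are made up; they do not reflect the actual layout of a StatEye report.

```python
import csv
import io

def read_report(lines):
    """Parse tab-separated report lines into a list of rows (lists of strings)."""
    return list(csv.reader(lines, delimiter="\t"))

# Made-up sample text standing in for an attached .tsv report.
sample = "Pageviews\t1234\nSessions\t567\n"
rows = read_report(io.StringIO(sample))
print(rows)  # [['Pageviews', '1234'], ['Sessions', '567']]

# For a saved attachment (hypothetical file name):
# with open("report.tsv", newline="") as f:
#     rows = read_report(f)
```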

4. How StatEye works

4.1. Separate logging and analysis

StatEye uses two fundamentally separate steps, the first is to log raw data and the second to analyze that data.

We log the raw data with the least possible amount of processing. At most we determine where to get the data that we want, for example whether to get the visited URL from an environment variable or from the query string.

The second part is the complex task of analyzing the raw data that we logged. It is during this step only that we try to determine such things as the country of origin of a visitor, try to analyze the query strings of referring search engines to find the search terms the visitor used and analyze the user agent string to determine which browser and operating system a visitor used and which versions of these browsers and operating systems.

This approach has two major advantages. First, we do not make our visitors wait while we analyze things (they could not care less about it). Second, if we have made a mistake in our analysis, we can correct it and rerun the analysis, so no data is ever lost. This works as far back as we choose to keep the original logs.

4.2. StatEye Components

The logger, stateye.gif, has to be quick and is therefore written in a language which compiles once into a compact executable. In our case we chose to use C.

The analyzer, by contrast, has to be written in a language suited to the analysis of logs and the manipulation of strings. In our case we chose to use Perl, even though it does not execute as fast as C; the analyzer runs only a few times a day.

4.3. The advantages and disadvantages over server logs

First of all server logs do not log pageviews but hits, although it is possible to extract pageviews from this data. A server log also contains hits from any source, including robots. StatEye's focus is purely on pageviews made by human beings.

Determining user sessions

Server logs generally do not log cookies, so server logs leave us with only IP numbers to determine user sessions. The problem with this is that if several people are behind the same proxy they all seem to have the same IP address. In other instances several proxies may be used by the same user, making it look like several visits from different IP addresses. Setting a cookie therefore allows us to discern those different users.

Our method is to store the time of the first visit, in milliseconds, in the cookie. At every subsequent visit this original time is sent along with the page request to our server. We do not use this time for anything else, since it comes from the client computer's clock, which depends on the time zone, the correctness of that clock, etc., and is therefore basically useless as data. What it does do is distinguish that browser from another browser that initiated a visit to our website from the same IP number in a different millisecond.

Theoretically, two or more people could start to browse our site in the same millisecond, so we extend this cookie with the first 3 parts of the IPv4 address. We leave out the last part since it may vary if someone is behind a proxy, even during a single session.

It then becomes highly unlikely that two people would have the exact same session identifier.
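The construction of the identifier can be sketched in Python as follows. The separator, the order of the two parts and the function name are hypothetical, and where the IP prefix is actually combined with the cookie value (in the browser or in the logger) is not specified here; the point is only that the last octet is dropped.

```python
import ipaddress

def session_identifier(first_visit_ms, ip):
    """Combine the first-visit time (ms) with the first three octets of an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return f"{first_visit_ms}-{'.'.join(octets[:3])}"

a = session_identifier(1151234567890, "12.34.56.78")
b = session_identifier(1151234567890, "12.34.56.99")  # same user, proxy changed last octet
c = session_identifier(1151234567891, "12.34.56.78")  # another browser, next millisecond
print(a, a == b, a == c)  # 1151234567890-12.34.56 True False
```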

Pageviews from a cache

If someone views a cached version of your page, either from their browser's cache, a proxy cache or from a web cache such as the Google cache, chances are you will not see the visit to this particular page in your server logs. StatEye however deploys anti-caching strategies and will usually log pageviews to cached pages if the visitor is connected to the Internet.

The anti-caching strategies involve anti-caching instructions in the header and the simple strategy of passing along a timestamp which makes each request for stateye.gif look unique. The timestamp works with JavaScript enabled browsers while the anti-caching headers should be respected by all browsers including non-JavaScript enabled browsers.
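The two strategies can be sketched in Python. The header values are common anti-caching headers, not necessarily the exact ones StatEye sends; the docloc parameter appears in the noscript example earlier in this guide, while the t timestamp parameter and the function name are hypothetical. In StatEye itself the timestamp is added by stateye.js in the browser, which is why it only helps JavaScript-enabled visitors.

```python
import time
from urllib.parse import urlencode

# Common HTTP headers asking browsers and proxies not to cache the response.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}

def logger_url(base, docloc, now_ms=None):
    """Build a logger request URL made unique by a millisecond timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return base + "?" + urlencode({"docloc": docloc, "t": now_ms})

first = logger_url("/cgi-bin/stateye/stateye.gif", "/otto/komodo/", now_ms=1000)
second = logger_url("/cgi-bin/stateye/stateye.gif", "/otto/komodo/", now_ms=1001)
print(first)            # /cgi-bin/stateye/stateye.gif?docloc=%2Fotto%2Fkomodo%2F&t=1000
print(first != second)  # True: the URLs differ, so a cached response cannot be reused
```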

Indirect Pageviews

Some pageviews are made indirectly, for example through a translation service. Server logs will log a hit from the translation server and not from the end user. However page elements such as pictures are usually retrieved directly by the visitor and hence StatEye will also be able to determine the user's actual IP address and, depending on the user's browser settings, it may even be able to set or retrieve a cookie.

In the cases mentioned above, the cookie session measurement method could log two distinct sessions only if the user moves from the cached or indirectly viewed page to a page on your site, and even then this depends on the user's browser settings.

4.4. Why set a StatEye cookie with JavaScript?

The alternatives are to have stateye.gif, the StatEye logger, set the cookie, or not to set any cookie at all. If the logger set the cookie, we would not know for sure whether the browser accepted it, but the StatEye logger would log the cookie that it tried to set anyway. Suppose the browser does not allow a cookie to be set: when the user visits again, it would appear to be a new visitor, and StatEye would attempt to set yet another cookie and count another session.

If JavaScript manages to set the cookie, we log it. If the browser did not allow a cookie to be set, we log the absence of a cookie, so we know for sure that cookies are not accepted and can use the IP number to determine the session instead. (Only some versions of Microsoft Internet Explorer seem to pretend to accept the cookie but then don't pass it on with the next pageview.)

Furthermore, some browsers may be set not to accept cookies from sites other than the original server of the page (I set my browser like that). This problem can also be circumvented by setting the cookie with JavaScript and passing it on as an argument when calling the StatEye logger.

This can be particularly useful if your CGI scripts and HTML pages are hosted on different HTTP servers.