
Don’t Tell Me You’re Still Using Webalizer?

Wednesday, April 4th, 2007

One of the most powerful tools in your SEO arsenal is your web server’s log files, along with the software or reporting tool you use to convert that raw data into useful, intelligible information. For most, this will be some type of web log analysis tool such as Webalizer.

Unfortunately, Webalizer is the default tool provided by most hosting providers (Any CPanel users out there? This means you!), not because it is particularly useful to their customers, but because it has several advantages for them over much better and more current tools. Namely, it’s free, fast and light on your host’s system resources. Meaning, they can include it with your hosting package and it scales nicely as they add more customers, all with no expensive licensing costs.

What you probably aren’t aware of is that there hasn’t been a new version of, or any update to, Webalizer since 2002. That’s right: 5 years. On the Internet, that’s a generation. Think about what has changed in the last 5 years: blogs, podcasting, vodcasting, social networking, RSS/ATOM feeds - too much to list here. Have you had a good close look at your Webalizer stats lately? Have you ever asked yourself, “wouldn’t it be nice to know X” or “why can’t I see Y” in your reports? Or are you just content to settle for hits, pages, visits and keywords? Well if you are, you are doing yourself a disservice and are missing valuable data about your visitors and their behaviour, data you could be using to increase sales, conversions, traffic and retention.

So what then are you missing? Pretty much everything. In truth, the only valuable info in Webalizer reports is the search keywords and possibly the referers. Here are some of the features and reporting you are missing:

  • Click trails / path analysis
  • Conversions / action tracking
  • Daily view / date range / zoom / drill down
  • Screen resolution / widescreen vs std
  • Newsletter / Email marketing
  • Affiliates / Partners
  • Robots / spiders
  • Compression / response time / performance
  • Visit duration / initial vs repeat
  • HTTP status codes / error report
  • Split testing

So what are these and why should you have them?

  1. Click trails - To determine the path visitors take through your site from start to finish - where they enter, where they exit, where they abandon, etc. If you know the top paths through your site, you can make improvements to those pages, add relevant info, remove irrelevant or distracting elements, ads, etc. For some sites this is less important than for others, but you won’t know until you actually see it.
  2. Daily view - What is happening today? Did your traffic spike or drop over a specific time period? Wouldn’t you like to know why? Did you get Slashdotted or send out a recent press release and want to isolate that date range? Did a recent Google Dance affect your traffic? Did a strategic partner remove a link? There are endless reasons to want a more granular ability to drill down or zoom to more specific time periods.
  3. Conversions - Essential for anyone selling anything online or needing the ability to track ‘actions’, such as newsletter signups, software downloads, online orders, etc. Perhaps you already have tracking through third-party vendors or affiliates? Great - use your own conversion stats to check against theirs (ensure you aren’t getting ripped off). Can tie into Click trails also.
  4. Screen resolutions - Good god, why the hell are developers still creating 800 pixel wide web sites? My stats show that less than 5% of my visitors are running resolutions BELOW 1024×768 and that’s three years running. LCDs have dropped to near CRT prices and widescreen LCDs now dominate the market. You don’t need to spend $4500 on an IDC report to tell you this. Just look at your visitors’ screen resolutions and adjust your site width and layout accordingly. Imagine what you could do with some extra screen real estate. I’d even venture your visit duration would increase. On a related note, also take into consideration the ageing population of first-world countries and how that changes your visitor demographic (read: font size). Don’t rely on the visitor’s browser to do this. Build it right into your site. It’s so easy to do with CSS there’s no reason not to.
  5. Newsletters and Email marketing - a powerful way to increase site or product interest, repeat visits and sales. A good analysis tool would report on who’s signing up, who’s unsubscribing, where, when, etc. And be able to include Click trails originating from Email newsletters, ad clicks and conversions.
  6. Affiliates and Partners - isolate these from the general referer pool to find out which of your key partners or affiliates are driving the most traffic to your site, where it goes, and what those visitors’ conversion rates are. Cut dead weight and forge tighter alliances with the highest performers.
  7. Robots and spiders - What percentage of traffic to your site is actually human generated? If still using Webalizer or other out-dated stats package you’re getting inflated and inaccurate data. Get accurate data on human vs machine traffic. Identify and block bad robots, site rippers, etc. You do have a robots.txt file don’t you?
  8. Compression and response time - Getting a bit technical now but how is your site’s performance? How fast do your pages load? If your pages aren’t loading quickly, people will leave and go elsewhere. How much bandwidth is your site using (important to know if your web host charges for bandwidth overages)? Would you benefit from using page compression (Yes) or an accelerator (very likely)?
  9. Visit duration and initial vs repeat - How many people are abandoning after their first visit? Why? What percentage of your traffic is repeat? Knowing these would enable you to make improvements to the message you are sending your visitors, improve keyword placement and prominence, navigational elements and so forth.
  10. HTTP status codes and error report - Okay forget HTTP status codes for now - leave the interpretation of those to your techie. But 404 errors relate to broken links and missing pages/images on your site, and they can have a negative impact. You could be losing massive amounts of traffic because a link or URL on your site has changed. Identify and correct these errors by repairing or replacing missing content. Learn to use web server URL rewriting modules to redirect traffic to updated URLs. On Belchfire Themes, I set up a custom Apache ErrorDocument directive to redirect broken links to my search page to trap this traffic I’d otherwise be missing.
  11. Split testing - Measure and compare the marketing performance of different campaigns and promotions.
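To give a feel for how little it takes to pull two of the items above (robots vs humans, and the 404 error report) out of raw logs, here is a minimal Python sketch. It assumes your server writes the Apache “combined” log format. The `BOT_HINTS` list, the function names and the sample lines are all made up for illustration, and real analyzers keep far longer and regularly updated bot lists, so treat the numbers as rough.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately tiny list of user-agent substrings.
# This is a crude heuristic, not a complete robot database.
BOT_HINTS = ("bot", "spider", "crawl", "slurp")


def is_robot(agent):
    """Guess whether a user-agent string belongs to a robot."""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def summarize(lines):
    """Return (hits by robot/human, 404 counts by requested path)."""
    hits = Counter()
    broken = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        hits["robot" if is_robot(match["agent"]) else "human"] += 1
        if match["status"] == "404":
            broken[match["path"]] += 1
    return hits, broken


# Invented sample lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [04/Apr/2007:10:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "http://www.google.com/search?q=themes" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    '10.0.0.2 - - [04/Apr/2007:10:00:05 +0000] "GET /old-page.html HTTP/1.1" '
    '404 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '66.249.66.1 - - [04/Apr/2007:10:00:09 +0000] "GET /old-page.html HTTP/1.1" '
    '404 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
```

Calling `summarize(SAMPLE)`, or passing it an open log file, gives you a human/robot hit count and a list of the missing URLs people (and spiders) keep asking for, which is exactly the list you would feed into your rewrite rules.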

So what are the alternatives then to Webalizer? Well, if you are a web host reading this, at minimum consider one of the two known forks of Webalizer that offer some of these additional features and are actively updated and developed:

Stone Steps Webalizer - recommended
Webalizer Xtended

See each site for its feature list. There is also a GUI client for easy Webalizer configuration available at http://www.tobias-schwarz.net/programmierung. The advantage of using these forks is that old data is retained and only minimal configuration changes are needed.

Alternatives to Webalizer

I’m not going to post a massive list, as there is no shortage of free and commercial weblog analysis tools. Just search Google. As for a big feature comparison table, well, have a look at the demos instead. However, here are some I have used and recommend:

AWStats - Fast, free, multi-site weblog analyzer. A step up from Webalizer, albeit a minor one. Knows about compression and list of bots, spiders, worms, browsers, search engines is constantly updated. Javascript code to record screen resolutions, colour depth and browser plugins. I use it but rely on others for additional info.

Tracewatch - Free database-driven but odd license and unknown development/updates. Other than basic reporting, key feature is path analysis (as the name suggests). Fast. Consider Tracewatch as a second system to provide click trails and granular visitor data if your current solution does not.

Visitors - Free and fast weblog analyzer. One of my favourites. Reports are reminiscent of Analog (as useless as Webalizer in my opinion). Pretty standard reports but has textual click trails reporting and error reports, plus an interesting bidimensional traffic map. The reason I include it here though is that when those click trails are combined with Graphviz, you get some of the most intuitive visual representations of visitor web trails around; even my Granny would understand them. Some screenies:

[Screenshot: traffic map (hwmap.jpg)]

[Screenshots: Graphviz click trails (graph2.jpg, graph.jpg)]
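If you are curious how click trails end up as a Graphviz picture, the idea can be sketched in a few lines of Python. This is not how Visitors does it internally (I haven’t looked); it is just an illustration, assuming you already have each visit as an ordered list of pages. The function names and sample paths are hypothetical.

```python
from collections import Counter


def count_transitions(visits):
    """Count page-to-page steps across all visits.

    `visits` is a list of visits, each an ordered list of page paths.
    """
    edges = Counter()
    for pages in visits:
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges


def to_dot(edges):
    """Render the transition counts as a Graphviz DOT digraph."""
    lines = ["digraph trails {"]
    for (src, dst), n in sorted(edges.items()):
        # Label each edge with its count and draw busier paths thicker.
        lines.append(f'  "{src}" -> "{dst}" [label="{n}", penwidth={n}];')
    lines.append("}")
    return "\n".join(lines)


# Invented visits, for illustration only.
SAMPLE_VISITS = [
    ["/", "/themes", "/download"],
    ["/", "/themes"],
    ["/search", "/themes"],
]
```

Write the result of `to_dot(count_transitions(SAMPLE_VISITS))` to a file such as `trails.dot` and run `dot -Tpng trails.dot -o trails.png` to get the picture. On a real site you would want to drop the rare transitions first, or the graph turns into spaghetti.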

Mint - Commercial, database-driven. Lots of web 2.0/Ajax features and nice presentation and layout. Not feature-rich out of the box, but highly extensible via official and third-party ‘Pepper’ (plugins) and widgets. I use Mint for my blogs. If you blog and are looking for a decent stats package, this seems to be the best available.

phpMyVisites - Free, database-driven multi-site and has really pretty graphs. Too slow for use on high traffic sites and lacks the advanced features above, but is actively developed, supports dozens of languages and includes export to PDF function. I use it for the low-traffic sites I host. My customers like the optional daily email reports it can send out.

CNStats - Commercial, database driven. Some terrific programming talent coming out of Russia lately and this is a prime example. Includes almost all of the features above and reasonably priced. Poor performance on sites with very high traffic (had to abandon it for my site after second month of use due to massive database it creates in order to be able to provide all the advanced features). So long as you’re under say 5000 visits / 20000 page views a day and set it to keep the minimum 30 days of data (or have a multi-cpu dedicated server with 2-4GB ram), there isn’t much out there that can match it.

Logaholic - Commercial, database driven. Another great alternative with many advanced reports including conversion tracking, click trails, split tests and performance. Also quite reasonably priced. I bought this about a year and a half ago but also had to abandon it as the frontend stopped responding after about 40 days of data. At the time I was getting >20 000 visitors with ~150 000 page views resulting in 500MB log files (containing 2 million lines), daily. A lower-traffic site would probably do just fine with either CNStats or Logaholic.

Then of course there are the Enterprise solutions. The granddaddy of them all being:

WebTrends - Commercial. If mainstream is your thing and you need the ability to scale within an existing solution, WebTrends is an excellent choice. Offers low-cost single-site web-based subscription service all the way up to the enterprise. Start small, grows with you. Too many features to list. Licensing and solutions based on industry, number of sites and traffic.

NedStat - Commercial. Tried this when they had a free version for personal use, just to test it out. Seems to only offer subscription-based services these days. Similar to WebTrends in many respects. When I used the installed version I noted that it was the fastest database-driven system I had ever used.

Omniture - Commercial. Products for mid-market and enterprise. I’ll be frank. This seems more like an enterprise analytics solution that wouldn’t be of much use for me. Goes beyond log analysis. If you are in this market segment, check out the SiteCatalyst product release video tours. I use NetGenesis at work, not for my personal site, and given the choice, I’d use this instead.

spss.com - Speaking of NetGenesis… ;) Commercial solutions again for mid-market and enterprise.

WebSideStory - Commercial. Never used their products but have heard good things from many who have over the years. They specialize in online marketing so while you’ll see analytics, the real ’story’ with these folks is marketing, search, (keyword) bid and content management. Solutions for small business to enterprise.

Urchin - Commercial.  Used to cost much less but has moved into the Enterprise space so forget it unless you have an extra $895.  What Google Analytics is based on.

And Finally:

Google Analytics - Hosted. Free for most.  I use Google Analytics, but if not using it as a reporting tool for AdWords tracking and not focused on marketing and conversions, doesn’t offer much the others don’t. However as a marketing and conversions tracking tool and tied into an AdWords account, it is a highly effective tool that few can touch.

There are many similar hosted analytics and web stats collection services (again, Google can help here), both free and commercial. This article doesn’t deal with those. Perhaps I’ll look more closely at these in the future.

So there’s a short list of some of the well-known and lesser-known weblog analysis and reporting tools I have used and recommend. Over the last six months, I have almost tripled my web traffic and quadrupled my ad revenues by monitoring my web reports and making small, simple changes to my site accordingly. If you are still using Webalizer, I hope this gives you an idea of the real value that current, relevant reporting can offer and how you can use that information to enhance and improve your site, traffic and sales using these alternative solutions.

NB: You’ll note references above to weblog analyser and database-driven. Weblog analyser means the program reads in your web server’s raw web logs to generate its reports. Reports are updated when the program is configured to read the logs, usually once daily, and are stored as static text or html files. This is fast but the type of data available in raw web logs is limited. Hence the reduced features of such systems, or their reliance on additional modules or Javascript to collect additional data (e.g. AWStats). Database driven means code is inserted into the source of your web pages and data is collected and recorded in a database on each and every page view. Reports and relationships can then be generated ‘on the fly’ from the database data and are much more configurable (e.g. CNStats). This allows these types of systems to provide more granular and additional info in their reports. Reports are typically real-time. The downside is that the database can grow very, VERY large (read: gigabytes) depending on how long the data is stored. Logaholic is a hybrid, meaning it’s database driven, but still gets its data from your raw weblogs. Hosted works similarly to database-driven, where you enter code into your pages, but the data is collected by a third party and they provide the reporting front-end on their site. It usually carries a monthly subscription fee, but some offer entry-level services free, with advanced features requiring a paid upgrade.
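To make the database-driven idea a bit more concrete, here is a rough Python sketch using SQLite. It is not taken from any of the products above; the table layout, function names and sample data are assumptions for illustration. The point is simply that one row is stored per page view, and reports are queries run on demand.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the page-view store. In-memory by default."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS page_views "
        "(visitor TEXT, path TEXT, viewed_at TEXT)"
    )
    return db


def record_view(db, visitor, path, viewed_at):
    """Called on every page view: this is why the table grows so fast."""
    db.execute(
        "INSERT INTO page_views VALUES (?, ?, ?)",
        (visitor, path, viewed_at),
    )


def top_pages(db, day, limit=10):
    """Report generated 'on the fly': most viewed pages for one day.

    `day` is an ISO date prefix such as "2007-04-04".
    """
    rows = db.execute(
        "SELECT path, COUNT(*) AS views FROM page_views "
        "WHERE viewed_at LIKE ? "
        "GROUP BY path ORDER BY views DESC, path LIMIT ?",
        (day + "%", limit),
    )
    return rows.fetchall()


# Invented page views, for illustration only.
SAMPLE_VIEWS = [
    ("a", "/themes", "2007-04-04T10:00:00"),
    ("b", "/themes", "2007-04-04T10:05:00"),
    ("a", "/", "2007-04-04T10:06:00"),
    ("c", "/", "2007-04-05T09:00:00"),
]
```

Because every page view becomes a row, a busy site adds a very large number of rows a day, which is why these systems let you cap how many days of data are kept.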