How NOT to Test Anti-Malware Products

Joe Wells
Chief Technology Officer
Lavasoft AB, Gothenburg Sweden

Evolution

Back at the dawn of the scan age, when the first antivirus products crawled out of the primordial ooze, a model arose for the efficient detection of computer viruses.

File-infecting viruses spread across a system by infecting innocent files in a fairly chaotic fashion, so a virus could exist almost anywhere on the drive system. Often they existed all over it.

As numbers increased, and antivirus companies battled it out by claiming they detected more viruses, a model arose for testing antivirus claims. The model correctly focused on testing how viruses really spread and how antivirus products actually scanned. And of course "big numbers" was construed as a good thing. However, the majority of viruses never actually infected users' systems, and thus did not constitute a real world threat.

Then, in 1993, a system of cooperative reporting arose that enabled developers and testers to focus more on the actual threat. It was called the WildList, and it allowed testers to fine-tune testing by moving away from mere numbers and focusing on the reality of the virus threat.

Since the original WildList appeared, the nature of the actual threat to users has changed dramatically. Viruses (that is, true viruses that infect files all over the system) are now nearly extinct. Recent WildLists have only a handful of true viruses. Most threats on the WildList today are actually worms. However, in today's reality, viruses and worms are comparatively rare. They are a minuscule part of the real and present threats to users.

For example, at the time of this writing, the WildList (August 2007) says that 580 threats have been reported during the past six months. By contrast, at the time of this writing, the malware research lab at Lavasoft received 1500 new threats over the past weekend.
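To give a rough sense of scale, the two figures above can be put on a per-day basis. This is only a back-of-the-envelope sketch: it assumes six months is about 182 days and a weekend is 2 days, neither of which is stated in the text, and the two counts measure different things (threats reported in the wild versus new samples received by one lab), so the ratio is indicative at best.

```python
# Figures quoted in the text.
WILDLIST_THREATS = 580   # reported over the past six months
LAB_THREATS = 1500       # received by one lab over one weekend

# Assumptions made for this sketch only (not stated in the text).
SIX_MONTHS_IN_DAYS = 182
WEEKEND_IN_DAYS = 2


def per_day(count: int, days: int) -> float:
    """Average number of threats per day over the given period."""
    return count / days


def compare_rates() -> dict:
    """Return both daily rates and how many times larger the lab rate is."""
    wildlist_rate = per_day(WILDLIST_THREATS, SIX_MONTHS_IN_DAYS)
    lab_rate = per_day(LAB_THREATS, WEEKEND_IN_DAYS)
    return {
        "wildlist_per_day": wildlist_rate,
        "lab_per_day": lab_rate,
        "ratio": lab_rate / wildlist_rate,
    }


if __name__ == "__main__":
    rates = compare_rates()
    print(f"WildList: about {rates['wildlist_per_day']:.1f} threats per day")
    print(f"Lab:      about {rates['lab_per_day']:.0f} threats per day")
    print(f"Ratio:    roughly {rates['ratio']:.0f} to 1")
```

Under these assumptions, the lab's intake rate is a couple of hundred times the WildList's reporting rate.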

A New Age is Upon Us

Things have changed radically. Newer threats, current threats, have evolved in ways diametrically divergent from viruses. Thus, our reality has changed dramatically.

Where viruses could exist anywhere (or everywhere) on a computer, the majority of today's threats load specific files, keys, etc. to specific locations on the system.

This is our new reality.

Unfortunately, however, many anti-malware product reviewers are still stuck back in the scan age. They seem to think the archaic antivirus model applies to today's reality. It does not apply, and forcing it to apply yields inaccurate and even deceptively misleading results.

To substantiate this claim, let us look at a recent example of a test based on outmoded scan age assumptions.

Case in Point

In a recent PC World test (20 August 2007), which was performed by AV-Test Org, a portion of the test involved "about 110,000 inactive" threats. In simple terms, the reviewer tested products that were designed specifically to detect "active" threats against "inactive" ones. The claimed justification was that "an inactive sample is like an application you've downloaded and haven't yet installed." This is not a valid justification.

Now, this is the point in our discussion where you should challenge me and ask why that isn't a valid claim.

In response, I will ask you the following questions:

  • If a product is designed to actively block a downloaded file, or to prevent it from being run, that is a good design, is it not?
  • If a product is designed to look for threats where they actually exist on an infected system, that is a valid reality check, is it not?
  • If a product is tested in a manner in which it is not intended to work, and does not need to work, does that testing reflect reality?

If, then, testing does not represent reality, it is a fabrication. It is unrealistic. It can even be considered deceptive.

Let's look at a specific example from this PC World test.

NB. Since I have been associated with several antivirus and anti-malware products (including two in this review), let us look for evidence of my claim that the testing methodology was flawed by looking at a product with which I am not associated; namely, the well known anti-malware product called Spybot.

Blind Testers, or Uninformed Editors, or Both?

If you have ever run Spybot, then you know that (unlike products that list all the files on the system as they are scanned) it lists the threats as it looks for them. Specifically, as the product scans, the threat names appear in rapid succession in the GUI.

What, then, about the tester who was watching this test?

Did he or she notice that the threats in the database were being listed? Did that person assume (wrongly) that the whole drive was being scanned for each threat? Hopefully, the tester recognized the fact that Spybot was looking for each threat precisely where said threat would actually be located in a real infection. It was performing a reality check.

I think you would agree that a product which performs a reality check by looking for real threats where they are really found is doing the right thing. Spybot looks for real infections.

Now imagine another product. A hypothetical product. A fabricated anti-malware product. Any mediocre programmer with a huge collection of threats could easily create a product that detected all 110,000 threats in a test, yet offered zero protection from real infections.

By contrasting Spybot and the hypothetical product, we can easily see the misleading and fabricated nature of this archaic test method.

On the one hand, the useless scanner would score high points for locating threats where they do not exist in real infections, and would thus be rewarded, while offering no real protection in the real world.

On the other hand, Spybot would be penalized for offering real protection from real threats as they really exist.

In this scenario, the testing methodology would reflect the exact opposite of reality. Bad product made to appear good. Good product made to look bad.
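The contrast drawn above can be sketched as a toy model. Everything in it is hypothetical: the threat names, paths, and byte strings are invented for illustration, and neither scanner represents Spybot's or any real product's implementation. The sketch assumes that an "inactive" sample is an uninstalled file sitting in a flat test folder, while an "active" infection consists of different, installed files at known locations.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical definitions: name -> (sample bytes, install path, installed bytes).
THREATS = {
    "FakeAdware.A": (b"installer-A", "Program Files/AdA/ada.dll", b"payload-A"),
    "FakeSpy.B": (b"installer-B", "Windows/System32/spyb.exe", b"payload-B"),
    "FakeToolbar.C": (b"installer-C", "Program Files/TbC/tbc.dll", b"payload-C"),
}


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_inactive_collection(root: Path) -> None:
    """Test-lab style: uninstalled samples dumped into one flat folder."""
    for index, (sample, _, _) in enumerate(THREATS.values()):
        (root / f"sample_{index}.bin").write_bytes(sample)


def build_active_infection(root: Path) -> None:
    """Real-world style: each threat installed at its usual location."""
    for _, rel_path, payload in THREATS.values():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)


def location_scanner(root: Path) -> set:
    """Looks for each threat only where an infection would place it."""
    return {
        name
        for name, (_, rel_path, _) in THREATS.items()
        if (root / rel_path).is_file()
    }


def collection_scanner(root: Path) -> set:
    """Hash-matches every file anywhere against the collected samples only."""
    known = {sha256(sample): name for name, (sample, _, _) in THREATS.items()}
    found = set()
    for path in root.rglob("*"):
        if path.is_file():
            name = known.get(sha256(path.read_bytes()))
            if name is not None:
                found.add(name)
    return found


def run_comparison() -> dict:
    """Count detections for both scanners on both kinds of test bed."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        inactive, active = Path(a), Path(b)
        build_inactive_collection(inactive)
        build_active_infection(active)
        return {
            "location_scanner": {
                "inactive": len(location_scanner(inactive)),
                "active": len(location_scanner(active)),
            },
            "collection_scanner": {
                "inactive": len(collection_scanner(inactive)),
                "active": len(collection_scanner(active)),
            },
        }


if __name__ == "__main__":
    for scanner, scores in run_comparison().items():
        print(scanner, scores)
```

In this toy setup, the collection-matching scanner finds every inactive sample and none of the installed infections, while the location-based scanner does the reverse. A test scored only on the inactive folder would therefore rank the two in the opposite order from their usefulness against the simulated infection.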

Reality

In fact, Spybot was penalized for its approach to scanning. Please note this fact.

The test summary makes the following claim:

"Spybot detected less than 2 percent of our inactive adware and spyware threats, indicating that its signature database of threats is insufficient. It also failed to detect all nine inactive root kits." [italics mine]

Since Spybot is obviously intended to look for real threats where they really exist in the real world, this claim is unsubstantiated, demonstrably irrelevant, and therefore misleading to readers of the review.

The subheading for the Spybot evaluation states that the product "can't recognize and clean up many of today's threats." But does the method of testing actually substantiate that claim?

What do you think?

Was Spybot penalized because of a flawed test methodology? Should the product review have been designed to reflect the real world? If so, then this product review raises the question of whether it misrepresented the true nature and comparative value of all the products tested, including, by the way, the Ad-Aware product from Lavasoft, with which I am now associated. In this case, the magazine misinformed readers.

Such misapplication of archaic scan age test methodology to today's modern reality, and to the real world products that deal with that reality, is at best flawed, valueless, unrealistic, and misleading to users. At worst, it is deceptive and damaging.

NB. This is the first of at least two papers on the subject of correct anti-malware product testing.

November 21, 2007
Joe Wells