How Much Testing Is Enough?

Timing is an interesting thing. This week, I was in some company-sponsored training covering Test Driven Development (TDD), refactoring, and acceptance testing with Fitnesse. If your company can afford it, I would highly recommend calling ObjectMentor and requesting this course taught by Robert Martin. The general idea is that programmers should write unit tests using TDD and refactor as they continue development. Some people may be more familiar with the “Red, Green, Refactor” mantra. Basically, your code is only as good as your tests say it is. Just as important is how much of your code the tests actually execute, which is the idea behind the tools that measure code coverage. Cobertura is my favorite standalone tool, and EclEmma (built on Emma) is a fantastic Eclipse plugin.

The reason I am talking about this is that I am writing this post in the code view of the WordPress 2.8 editor. Why am I doing this? Because the recent upgrade must have caused problems with the editor. If you Google “WordPress 2.8 editor problem”, you will see a support issue with several people wondering what is wrong. As far as I can remember, WordPress 2.8 was developed fairly quickly; I remember upgrading to 2.7.1 not too long ago. The issue with the editor seems a little too common to have been missed during testing, but when you are using Firefox and some plugins, anything is possible. But the real question is: how and why did this happen?

I have already written about some of the tools you need for unit testing, but Robert Martin took a much stricter view of testing. The following is my interpretation of his views and how they relate to development and to the WordPress problems.

Unit Tests

Unit tests are the tests written by programmers for programmers. These tests should provide over 90% code coverage and be completely automated. For these tests, you typically use tools like JUnit and the whole family of xUnit frameworks. If you are not doing unit testing, you have no real idea whether most of your code works.
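
To make this concrete, here is a minimal sketch of a TDD-style unit test in JUnit 4. The DiscountCalculator class and its pricing rule are hypothetical stand-ins; the point is the rhythm: write the failing test first (red), write just enough code to pass (green), then refactor.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Tests written first; DiscountCalculator is grown just enough to pass them.
    public class DiscountCalculatorTest {

        @Test
        public void noDiscountBelowFiveHundred() {
            DiscountCalculator calc = new DiscountCalculator();
            assertEquals(100.00, calc.priceAfterDiscount(100.00), 0.001);
        }

        @Test
        public void tenPercentDiscountAtFiveHundredOrMore() {
            DiscountCalculator calc = new DiscountCalculator();
            assertEquals(450.00, calc.priceAfterDiscount(500.00), 0.001);
        }
    }

    // The minimal implementation that makes both tests pass.
    class DiscountCalculator {
        public double priceAfterDiscount(double price) {
            return price >= 500.00 ? price * 0.90 : price;
        }
    }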

Acceptance Tests

Acceptance tests are tests normally written by the QA team or business analysts. Robert Martin feels these should be automated as well, using a tool like Fitnesse, and should provide about 50% code coverage. If you are using Fitnesse, there needs to be a translation layer, called the fixtures, that enables the QA/BA staff to write their tests. The tests almost form a Domain Specific Language (DSL) in which the business side defines the “verbs”, or actions. The fixtures, the code behind the “verbs”, implement the steps of the process. To give you an idea of how generic the fixtures need to be, you could have a hundred acceptance tests but only a dozen fixtures.
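
To sketch what a fixture looks like, here is a classic Fit ColumnFixture in Java. The fixture name, the table, and the DiscountCalculator it delegates to (borrowed from the unit test sketch above) are all hypothetical; Fit simply maps the table’s column headers onto the fixture’s public fields and methods.

    import fit.ColumnFixture;

    // Backs an acceptance test table such as:
    //   | DiscountFixture |
    //   | order total | discounted price() |
    //   | 100.00      | 100.00             |
    //   | 500.00      | 450.00             |
    public class DiscountFixture extends ColumnFixture {
        public double orderTotal;             // input column "order total"

        public double discountedPrice() {     // output column "discounted price()"
            return new DiscountCalculator().priceAfterDiscount(orderTotal);
        }
    }

The QA/BA team can then write as many data rows, and as many tables, as they like without touching the Java code.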

Integration Tests

Integration tests are the first tests that get a little “fuzzy”. You can use a combination of Fitnesse and UI tools, like Selenium or Watir, to test how components work together (or integrate). This level of testing is meant to provide only about 20% code coverage. Because you are really only testing the integration points in your system, it makes sense that the coverage is smaller. Most of these tests, if not all of them, should still be automated.
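
As an example of what such a test might look like, here is a sketch using Selenium WebDriver from JUnit. The URL, element ids, and confirmation text are invented for illustration:

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class CheckoutIntegrationTest {

        @Test
        public void cartAndCheckoutWorkTogether() {
            WebDriver driver = new FirefoxDriver();
            try {
                // Exercise the integration point between the cart and checkout components.
                driver.get("http://localhost:8080/shop");          // hypothetical app URL
                driver.findElement(By.id("add-to-cart")).click();  // hypothetical element ids
                driver.findElement(By.id("checkout")).click();
                assertTrue(driver.getPageSource().contains("Order confirmed"));
            } finally {
                driver.quit();
            }
        }
    }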

System Tests

System tests are the highest level of automated tests and should provide about 5% code coverage. The idea is that these tests are much more general, and should only prove that the system is installed and configured correctly. If you have met the unit test and acceptance test guidelines, there are likely few areas of code that are not tested in some way. The system tests are purely UI tests, again using tools like Selenium or Watir. I have even heard these tests called “smoke” tests.
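
A smoke test can therefore be as simple as proving the deployed system comes up at all. A sketch, again with a hypothetical deployment URL and page content:

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class SmokeTest {

        @Test
        public void homePageComesUp() {
            WebDriver driver = new FirefoxDriver();
            try {
                driver.get("http://staging.example.com/");             // hypothetical URL
                assertTrue(driver.getTitle().length() > 0);            // the page rendered
                assertTrue(driver.getPageSource().contains("Login"));  // a key element is present
            } finally {
                driver.quit();
            }
        }
    }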

Manual Tests

The last level of testing is purely manual. I have heard this described as “bench testing”, “poke testing” and “monkey testing”. The idea is that you are using the system like any random user and seeing if anything breaks. The poke and monkey terms come from the idea that you are poking at random parts of the system or acting like a monkey and just pounding on the keyboard to see what happens.

Final Thoughts

So, how does this relate to the WordPress problems? For something like the post editor not working properly, their integration tests failed to catch the defect. Once they diagnose the problem, they need to document the issue, and with a suite of automated tests in place, they only need to add one more test for the current defect. In the case of the editor, it could be that some plugin configuration for Firefox, or even WordPress itself, causes the WYSIWYG editor to misbehave.
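
Pinning a diagnosed defect usually means writing the regression test before the fix. A sketch, with an invented PostSanitizer class and an invented bug, just to show the shape of such a test:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Added after diagnosing a (hypothetical) defect where the sanitizer
    // stripped <em> tags from saved posts. The test fails until the fix
    // goes in, then guards against the bug ever coming back.
    public class PostSanitizerRegressionTest {

        @Test
        public void emphasisTagsSurviveSanitizing() {
            PostSanitizer sanitizer = new PostSanitizer();
            assertEquals("Hello <em>world</em>",
                         sanitizer.sanitize("Hello <em>world</em>"));
        }
    }

    // Minimal stand-in for the fixed component.
    class PostSanitizer {
        public String sanitize(String html) {
            return html;  // a real sanitizer would strip only unsafe markup
        }
    }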

Given the public nature of such failures, you should be able to understand the basic benefits of testing. With an automated suite of tests at various levels, you can click a button and quickly see whether anything has broken. If you are dealing with an existing system, you get the enjoyment of watching your code coverage numbers climb as you write tests. A good example of this is the system I am currently working on: over the past year, we have gone from 3% code coverage to 57% as of this morning. In doing so, we have increased both the stability and the performance of the system. Automated testing has too many benefits for you to ignore.

11 thoughts on “How Much Testing Is Enough?”

  1. The really big problem with testing is time. There is never enough of it.

    Since things frequently change during integration, spending too much effort in low-level testing tends to take resources away from the more critical high-level testing. It’s what the system finally does in a production environment that matters most.

    Of course, if there are lots of little but essentially self-contained data-permutation behaviors, it’s just more efficient to test those automatically at the lowest level.

  2. Paul,

    I agree with the not-enough-time problem, but I completely disagree that high-level testing is more important. With high-level system testing, it typically takes longer to build and execute the tests than it does to write unit tests. However, the time problem is sometimes out of the developer’s hands and imposed by management.

    There is also the idea that if your unit tests work for your low-level components, then it is just a matter of getting the components working together correctly. In the example from my own work, system testing was all that was really done, and the system was very buggy and very unstable. That has changed significantly.

  3. I know unit-testing is all the rage these days, but I’ve often found that the more serious bugs leaking out into production are “the left hand doesn’t know what the right hand is doing” bugs, where one part of the code is out-of-sync with the other parts.

    It’s the type of bug that doesn’t show up at the lower levels, but does if you are trying to do something useful with the system.

    I guess I tend to work more with older programmers, so we tend to have fewer sloppy unit-level issues. IMHO: a younger crowd tends to check in too quickly.

  4. Tests help only insofar as they fit the requirements. One concern I have with TDD is how little attention it seems to put toward identifying output requirements.

    Does TDD build on an existing discipline of determining output requirements? I’d be surprised if such a discipline could be quietly implied. Instead, I expect a lot of overt attention must go toward the processes of identifying what the software must do.

  5. Lower-level and higher-level tests are both must-haves; it’s not an either-or situation. To use a cooking analogy: unit testing is making sure you have the best and freshest ingredients before you start cooking. Functional and acceptance testing happens at a higher level, while your dish is taking shape.

    Well, I know that in cooking as in programming, I need both levels of testing.

  6. Tracy,

    The basic idea, which I did not really include, is that all of the tests are based on requirements. Admittedly, I completely assumed that part. Unit tests are really only helpful if they test something that someone has identified as a requirement, even if it is only a programmer-level requirement. If there is no requirement for what you are writing, why are you writing it?

  7. Julio,

    I definitely agree that you need both higher-level and lower-level tests. The question of test coverage at the higher levels is not whether you are testing anything at all, but that the coverage does not need to be as complete as it is at the lower levels.

    At the “system” level, you do not want to try to get 90% code coverage because that would be tremendously difficult to do. However, at the system level you do want 90% or even 100% requirement coverage.
