
Testing and Evaluating a Simple Set of Classes

I completely forgot to publish my January 2012 GroovyMag article! The topic “Testing and Evaluating a Simple Set of Classes” covers testing using Spock, CodeNarc and Cobertura.

Here goes nothing…

Almost forgot: the source files, etc. are here.


Bob Brown

It is an oft-repeated statement that code written in a dynamic language such as Groovy requires more testing than code written in a static language such as Java. It is fortunate, then, that such a wealth of excellent testing tools is available to the Groovy developer.


I am writing this on a Jetfoil (a Boeing-built jet-propelled hydrofoil) flying/sailing across the mouth of the Pearl River Delta to Macau (“the Las Vegas of the East”). I am visiting friends and colleagues at the University of Macau and will also be giving a presentation on “The Gr8 Technologies” to the University’s Computer Science and Engineering faculty.

My mind is ticking over with an academic viewpoint, and has latched onto a nicely scholastic problem: testing and evaluating a simple pair of classes designed to grade a student’s multiple-choice exam paper.

For this article I will take a look at testing using Spock, CodeNarc and Cobertura.

The classes under test

There are two classes to test.

Listing 1 shows the Grader class. This is where the bulk of the grading-related work is undertaken.

package macau.gr8

class Grader {
  def expectedAnswers
  def graderFileReader

  // Grade the answers held in the named file
  def grade(String s) {
    def candidateAnswers = graderFileReader.readGradesListFromFile(s)
    grade(candidateAnswers)
  }

  // Grade a list of answers: the fraction correct, or -1 if the lengths differ
  def grade(List candidateAnswers) {
    if (expectedAnswers?.size() != candidateAnswers?.size())
      -1
    else {
      def count = 0
      expectedAnswers.eachWithIndex {o, index ->
        if (o == candidateAnswers[index]) count ++
      }
      count / expectedAnswers.size()
    }
  }
}

Listing 1: The Grader class under test

As can be seen, the Grader class possesses two major pieces of functionality, represented as a pair of overloaded methods.

When presented with a list of answers, it compares that list to the expected list (which should have been set at construction time) and returns the fraction of matching answers as a grade in the range 0.0..1.0. If the two lists differ in length, it returns -1 to indicate an error.

For interest’s sake, there is a more idiomatic way of doing the list comparison:

def grade(List candidateAnswers) {
    expectedAnswers?.size() != candidateAnswers?.size() ?
        -1 : [expectedAnswers, candidateAnswers].transpose().
            findAll { it[0] == it[1] }.size() / expectedAnswers?.size()
}

I find my way clearer (even if less functionally ‘trendy’), so I am sticking with it.
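If you want to convince yourself that the two formulations really agree, you can run them side by side. The sketch below is plain Groovy with no Spock. It rewrites both versions as closures over a fixed answer key. The names `loopGrade` and `transposeGrade` are hypothetical, chosen only for this illustration.

```groovy
def expected = ['a', 'b', 'c']

// The explicit loop from Listing 1
def loopGrade = { List candidate ->
    if (expected?.size() != candidate?.size())
        return -1
    def count = 0
    expected.eachWithIndex { o, index ->
        if (o == candidate[index]) count++
    }
    count / expected.size()
}

// The transpose()/findAll() formulation
def transposeGrade = { List candidate ->
    expected?.size() != candidate?.size() ?
        -1 : [expected, candidate].transpose().
            findAll { it[0] == it[1] }.size() / expected?.size()
}

def papers = [['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c', 'b'],
              ['x', 'y', 'z'], ['a', 'b'], []]
papers.each { paper ->
    println "$paper: loop=${loopGrade(paper)}, transpose=${transposeGrade(paper)}"
}
```

In both versions, Groovy's `/` on two integers yields a BigDecimal, so the two versions return identical values for the same paper.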

When given the name of a file containing a single line with a simple CSV-formatted list of answers, the Grader class reads the file, binds its contents to a list and grades the answers as previously described.

Listing 2 illustrates what is probably the simplest use of Grader in a script.

import macau.gr8.*

def grader = new Grader(expectedAnswers: ['a','b','c'],
  graderFileReader: new GraderFileReader())
assert grader.grade(['a','b','c']) == 1.0D
assert grader.grade('rsrc/100pct.txt') == 1.0D

Listing 2: Simple Grader usage

Listing 3 shows the GraderFileReader class. This is a small utility class that Grader uses to handle file I/O and to bind the CSV data to a Groovy list.

package macau.gr8;

class GraderFileReader {
  def readGradesListFromFile(name) {
      def f = new File(name)
      if (!f.exists())
        throw new Exception("File $name does not exist.")
      def txt = f.text
      txt?.split(',') as List
  }
}

Listing 3: GraderFileReader utility class

Although having this frankly utilitarian class separated out from the Grader class may seem unnecessary, factoring this functionality into a separate class that is ‘injected’ as a dependency into the Grader class proper nicely simplifies testing, as will be seen.
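Here is a minimal sketch of why the injection helps. It is plain Groovy with no Spock. A map-coerced stand-in replaces the real reader, so no file is needed. The two classes are condensed copies of Listings 1 and 3. The stub's canned answers and the file name are arbitrary, because the stub ignores its argument.

```groovy
// Condensed copies of the classes from Listings 1 and 3
class GraderFileReader {
    def readGradesListFromFile(name) {
        def f = new File(name)
        if (!f.exists())
            throw new Exception("File $name does not exist.")
        f.text?.split(',') as List
    }
}

class Grader {
    def expectedAnswers
    def graderFileReader

    def grade(String s) {
        grade(graderFileReader.readGradesListFromFile(s))
    }

    def grade(List candidateAnswers) {
        if (expectedAnswers?.size() != candidateAnswers?.size())
            return -1
        def count = 0
        expectedAnswers.eachWithIndex { o, index ->
            if (o == candidateAnswers[index]) count++
        }
        count / expectedAnswers.size()
    }
}

// A stand-in reader built by map coercion; it never touches the file system
def stubReader = [readGradesListFromFile: { name -> ['a', 'c', 'b'] }] as GraderFileReader

def grader = new Grader(expectedAnswers: ['a', 'b', 'c'], graderFileReader: stubReader)
def result = grader.grade('no-such-file.txt')   // the name is ignored by the stub
println "Grade from the stubbed reader: $result"
```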

Simple testing with Spock

To quote the Spock website: “Spock is a testing and specification framework for Java and Groovy applications. What makes it stand out from the crowd is its beautiful and highly expressive specification language.”

Spock is based on JUnit 4 and can be used anywhere you might currently use JUnit. Spock uses Global AST Transforms in a novel way to create its “beautiful and highly expressive specification language”. The result is a testing tool that is much easier to use for the dynamic Groovy developer.

Although perhaps not taking us as far in the direction of Behaviour Driven Development as easyb, Spock still makes it easy to have development driven by story-like specifications. A specification for one feature of the Grader’s functionality might be:

The perfect paper:

    an instance of a paper grader
    a perfect answer is presented
    the calculated grade should be 100%

Listing 4 shows how easily this feature (and similar ones) translates into a Spock test.

package macau.gr8;

import spock.lang.Specification
import static spock.util.matcher.HamcrestMatchers.closeTo

public class GraderSpecification0 extends Specification {
  def grader

  def "The perfect paper"() {
      def grader = new Grader(expectedAnswers: ['a','b','c'])
    when: "A perfect answer is presented"
      def result = grader.grade(['a','b','c'])
    then: "The grade should be 100%"
      result == 1.0
  }

  def "The worst paper"() {
      def grader = new Grader(expectedAnswers: ['a','b','c'])
    when: "No answers are given"
      def result = grader.grade([])
    then: "An error should be indicated"
      result == -1.0
  }

  def "A poor paper"() {
      def grader = new Grader(expectedAnswers: ['a','b','c'])
    when: "The fairly poor paper is presented"
      def result = grader.grade(['a','c','b'])
    then: "The grade should be 33%"
      result closeTo(0.33D, 0.01D)
  }
}

Listing 4: Simple Spock Specification

One big advantage of Spock should now be apparent: the readability of the testing code. It is pretty clear what is going on and it is not too hard to imagine sitting down and discussing this with an end-user.

Since Spock has JUnit “under the covers”, it can easily be used with tools such as Ant, Maven or Gradle. Support in IntelliJ is also good. (I haven’t tried Eclipse/STS; apologies to all those who have worked hard on those products.)

Another powerful and useful feature of Spock is the “deep integration of Hamcrest matchers.”

Hamcrest is “…a library of matcher objects (also known as constraints or predicates) allowing ‘match’ rules to be defined declaratively.” Hamcrest use is optional but can occasionally prove quite effective in helping us create clear tests.

Consider the third feature in Listing 4. Here we have a result of 1 correct answer out of 3. A naïve check for ‘result == 0.33D’ would of course fail, but the use of Hamcrest’s closeTo method makes for a much more readable and concise test than would otherwise be possible.
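The arithmetic behind this is easy to check in plain Groovy. Groovy's `/` on two integers produces a BigDecimal approximation of one third. That value is not equal to 0.33, but it lies within 0.01 of it. `isCloseTo` is a hypothetical stand-in that illustrates the idea. It is not Spock's or Hamcrest's implementation.

```groovy
// One correct answer out of three, computed the way Grader does it
def result = 1 / 3

// Hypothetical stand-in for a closeTo-style check: |actual - target| <= error
def isCloseTo = { Number actual, Number target, Number error ->
    Math.abs((actual as double) - (target as double)) <= (error as double)
}

println "result == 0.33D: ${result == 0.33D}"
println "closeTo(0.33D, 0.01D): ${isCloseTo(result, 0.33D, 0.01D)}"
```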

As Figure 1 shows, Spock can produce a nice HTML-format report that lets one drill down into each individual class’s results.

Figure 1: Spock’s reports

At the moment, the text labels associated with the various blocks in a feature don’t make it into the generated reports. The API reportedly allows for this but the developers are looking for someone to incorporate this ability into the reports in an appropriate fashion.

Parameterised testing with Spock

Take a look back at Listing 4. The testing code is quite OK, but is none too DRY (DRY = Don’t Repeat Yourself; a directive that all of us should follow as rigorously as is practical). Writing a multitude of essentially similar tests quickly gets old! Spock’s parameterised features help to preserve both our sanity and the quality of the actual testing code.

Listing 5 shows how Spock very neatly deals with the essentially repetitive testing needed to ensure the Grader class’s correctness.

package macau.gr8;

import spock.lang.Specification
import spock.lang.Unroll
import spock.lang.AutoCleanup
import static spock.util.matcher.HamcrestMatchers.closeTo
import static spock.util.matcher.HamcrestSupport.that

public class GraderSpecificationListing5 extends Specification {
  @AutoCleanup(quiet = true)
  def grader = new Grader(expectedAnswers: ['a', 'b', 'c'])

  @Unroll("#iterationCount: grade for #paper is #res")
  def "Grader with papers given inline"() {
    expect: "Grade an individual paper"
      that grader.grade(paper), closeTo(res, 0.01D)

    where: "With the following papers"
      paper                | res
      ['a', 'b', 'c']      | 1.0D
      ['a', 'b', 'd']      | 0.66D
      ['a', 'c', 'b']      | 0.33D
      ['x', 'y', 'z']      | 0.0D
      ['c', 'a', 'b']      | 0.0D
      ['a', 'b']           | -1.0D
      ['a', 'b', 'c', 'd'] | -1.0D
      []                   | -1.0D
      null                 | -1.0D
  }
}

Listing 5: Parameterised Spock test

The expect:…where: pair of blocks implicitly specifies a loop over the provided data. On each iteration, the variables ‘paper’ and ‘res’ take their values from one row of the table, and those values are then used and tested inside the expect: block.
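Conceptually, the feature in Listing 5 behaves much like the plain-Groovy loop below. This is only a sketch of the idea. Spock generates and reports the iterations itself, and this is not how it does so internally. `grade` here is a condensed, hypothetical copy of `Grader.grade(List)`.

```groovy
def expected = ['a', 'b', 'c']

// Condensed copy of Grader.grade(List) from Listing 1
def grade = { List candidate ->
    if (expected?.size() != candidate?.size()) return -1
    def count = 0
    expected.eachWithIndex { o, i -> if (o == candidate[i]) count++ }
    count / expected.size()
}

// Each entry corresponds to one line of the where: table: [paper, res]
def table = [
    [['a', 'b', 'c'],      1.0D],
    [['a', 'b', 'd'],      0.66D],
    [['a', 'c', 'b'],      0.33D],
    [['x', 'y', 'z'],      0.0D],
    [['c', 'a', 'b'],      0.0D],
    [['a', 'b'],           -1.0D],
    [['a', 'b', 'c', 'd'], -1.0D],
    [[],                   -1.0D],
    [null,                 -1.0D]
]

def failures = []
table.eachWithIndex { row, iterationCount ->
    def (paper, res) = row
    def actual = grade(paper)
    def ok = Math.abs((actual as double) - res) <= 0.01D   // the closeTo check
    println "$iterationCount: grade for $paper is $res -> ${ok ? 'passed' : 'FAILED'}"
    if (!ok) failures << iterationCount
}
```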

Spock normally reports the success or failure of an entire feature. The @Unroll annotation lets one change this so that a result is reported for each iteration of the parameterised test. Figure 2 shows the IntelliJ runner’s display. Knowing the outcome of each parameterised test gives one a nice ‘fuzzy’ feeling and can be very helpful for long-running features.

Figure 2: Unrolled parameterised test output

Note the orange ‘headlines’ in Figure 2. This is a slight imperfection in IntelliJ’s presentation, tracked in the JetBrains issue tracker as IDEA-75860, among others.

Also of note in Listing 5 is the @AutoCleanup annotation, another useful Spock feature that helps keep a specification clean and DRY. Spock automatically attempts to call a ‘close’ method on the annotated field’s value at teardown. Grader has no such method, so an exception is guaranteed to occur. That exception would simply be unwanted ‘noise’ that might obscure the testing outcomes. The ‘quiet’ parameter used in Listing 5 therefore ‘squelches’ any exception raised during teardown of the Grader instance.
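You can mimic the effect of `quiet = true` in a few lines of plain Groovy. The helper below is hypothetical and is not how Spock implements @AutoCleanup. It invokes a named cleanup method dynamically and swallows any exception. That includes the MissingMethodException raised for an object, like Grader, that has no close() method.

```groovy
// Hypothetical helper mimicking @AutoCleanup(quiet = true): invoke the named
// cleanup method dynamically and swallow any exception it raises
boolean quietCleanup(Object resource, String methodName = 'close') {
    try {
        resource."$methodName"()
        return true
    } catch (Exception e) {
        println "Suppressed during cleanup: ${e.class.simpleName}"
        return false
    }
}

class NoCloseMethod {}   // stands in for Grader, which has no close() method

println quietCleanup(new NoCloseMethod())       // the missing close() is tolerated
println quietCleanup(new StringReader('abc'))   // a genuine Closeable is closed
```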

Spock and Excel

It is sometimes best to have test parameters held externally to a feature; this may be especially true if one is getting testing fixture data directly from a user.

Spock can drive a parameterised feature from any iterable instance. With a little help from the Apache POI project, it is therefore possible to supply a data-driven parameterised feature with data from an Excel spreadsheet.

Listing 6 shows the Spock test containing a parameterised feature driven by the data contained in the spreadsheet shown in Figure 3.

Figure 3: Excel spreadsheet specifying fixture data

package macau.gr8;

import spock.lang.Specification
import static spock.util.matcher.HamcrestMatchers.closeTo
import static spock.util.matcher.HamcrestSupport.that
import spock.lang.Unroll
import spock.lang.AutoCleanup
import org.apache.poi.ss.usermodel.WorkbookFactory

public class GraderExcelSpecification extends Specification {
  @AutoCleanup(quiet = true)
  def grader = new Grader(expectedAnswers: ['a', 'b', 'c'])

  @Unroll("#iterationCount: grade for #paper is #res")
  def "Grader with papers from Excel"() {
    expect: "Grade an individual paper"
      that grader.grade(paper), closeTo(res, 0.01D)

    where: "With the following papers"
      // forward slashes work on Windows and elsewhere
      [paper, res] <<
        new ExcelHelper(excelFilename: 'rsrc/test.xlsx').rows()
  }
}

class ExcelHelper {
  def excelFilename

  // One [paper, res] pair per row: the first cell is a label (the text
  // "null" marks a null paper), the middle cells hold the answers and the
  // last cell holds the expected grade
  def rows() {
    // withInputStream closes the stream once the workbook has been read
    def workBook = new File(excelFilename).withInputStream { is ->
      WorkbookFactory.create(is)
    }
    def sheet = workBook.getSheetAt(0)
    sheet.rowIterator().collect([]) { row ->
      def firstCell = row.getCell(row.firstCellNum)
      def paper
      def res
      if ("null" == firstCell?.stringCellValue)
        (paper, res) = [null, -1.0D]
      else {
        paper = []
        (row.firstCellNum + 1..<row.lastCellNum - 1).each {
          def cell = row.getCell(it)
          if (cell)
            paper << cell.stringCellValue
        }
        // assumes the expected grade is stored as a numeric cell
        res = row.getCell(row.lastCellNum - 1).numericCellValue
      }
      [paper, res]
    }
  }
}

Listing 6: Spock test parameterised with spreadsheet data
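You can exercise the row-to-[paper, res] mapping in ExcelHelper.rows() without POI or a spreadsheet. In this sketch, each row is a hypothetical plain list of cell values standing in for POI's Row. The slicing mirrors Listing 6: the first cell is a label, the last cell is the expected grade, and everything in between is an answer. The sample rows are made up for illustration and are not the contents of Figure 3.

```groovy
// Mirrors the slicing logic of ExcelHelper.rows(), with each spreadsheet
// row modelled as a plain list of cell values instead of a POI Row
def toPaperAndResult(List cells) {
    def firstCellNum = 0
    def lastCellNum = cells.size()   // like POI's getLastCellNum: one past the last cell
    def paper
    def res
    if ('null' == cells[firstCellNum]) {
        paper = null
        res = -1.0D
    } else {
        paper = []
        ((firstCellNum + 1)..<(lastCellNum - 1)).each {
            def cell = cells[it]
            if (cell != null)
                paper << cell
        }
        res = cells[lastCellNum - 1] as double
    }
    [paper, res]
}

// Made-up sample rows: label, answers..., expected grade
def rows = [
    ['perfect', 'a', 'b', 'c', 1.0D],
    ['poor',    'a', 'c', 'b', 0.33D],
    ['empty',   -1.0D],
    ['null']
]

rows.collect { toPaperAndResult(it) }.each { println it }
```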

It is worth remembering that POI is a pure Java library. It does not need a Windows installation to do its Good Stuff, so it poses no problem for those running development or continuous integration systems without paying the “Microsoft Tax.”

Spock mocks and intentions

We may wish to test the Grader’s file-related behaviour but not have a test data file available. Perhaps the system that should produce the file has not yet been created, is not yet reliable, or presents some other impediment to the development task at hand. Coupling between projects is often vexatious and can steal a surprising amount of time from a schedule. To compensate, Spock provides a simple yet powerful facility for defining mock objects.

Spock lets us do more than define simple mocked objects: it also lets us define intentions for them (Spock’s own documentation calls these ‘interactions’). A mock lets us ascribe known (often simpler than usual) behaviour to an instance. An intention, by contrast, makes it possible to evaluate the protocol by which other classes use a class.

Listing 7 shows how Spock makes it easy to create a mock and define intentions for that mock instance.

package macau.gr8;

import spock.lang.AutoCleanup
import spock.lang.Specification

class GraderMockedSpecification extends Specification {
  @AutoCleanup(quiet = true)
  def grader = new Grader(expectedAnswers: ['a','b','c'])

  def "Given a mock file"() {
    setup: "Establish the grader with a mocked GFR"
      def graderFileReader = Mock(GraderFileReader)
      grader.graderFileReader = graderFileReader
      1 * graderFileReader.readGradesListFromFile(_) >> ['a','b','c']
      0 * _._

    when: "Read a paper's answers from a given file"
      def res  = grader.grade('100pct.txt')

    then: "Ensure expected behaviour"
      res == 1.0D
  }
}

Listing 7: Mocks and intentions

The key to the magic resides in the following lines:

def graderFileReader = Mock(GraderFileReader)
1 * graderFileReader.readGradesListFromFile(_) >> ['a','b','c']
0 * _._

These do three things:

1) create a mock instance for the GraderFileReader class. This mock instance will implement the same interface as the real class,

2) establish that the mocked readGradesListFromFile method should always return the defined three-element list,

3) state the intention that the readGradesListFromFile method should be called exactly once, and that no other method on any mock object should ever be called (‘_._’ matches any method on any mock).
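Stripped of Spock's syntax, an expected interaction amounts to a call count that is checked after the fact. The hand-rolled `RecordingReader` below is a hypothetical plain-Groovy illustration of that idea. It does not show how Spock's Mock() works internally. It returns a canned list, records each call, and lets the script verify that readGradesListFromFile was called exactly once.

```groovy
// Condensed copies of the classes from Listings 1 and 3
class GraderFileReader {
    def readGradesListFromFile(name) {
        def f = new File(name)
        if (!f.exists())
            throw new Exception("File $name does not exist.")
        f.text?.split(',') as List
    }
}

class Grader {
    def expectedAnswers
    def graderFileReader

    def grade(String s) {
        grade(graderFileReader.readGradesListFromFile(s))
    }

    def grade(List candidateAnswers) {
        if (expectedAnswers?.size() != candidateAnswers?.size())
            return -1
        def count = 0
        expectedAnswers.eachWithIndex { o, index ->
            if (o == candidateAnswers[index]) count++
        }
        count / expectedAnswers.size()
    }
}

// Hypothetical hand-rolled test double: returns a canned answer list
// (like '>> ['a','b','c']') and records every call it receives
class RecordingReader extends GraderFileReader {
    List<String> calls = []

    @Override
    def readGradesListFromFile(name) {
        calls << "readGradesListFromFile($name)".toString()
        ['a', 'b', 'c']
    }
}

def reader = new RecordingReader()
def grader = new Grader(expectedAnswers: ['a', 'b', 'c'], graderFileReader: reader)
def res = grader.grade('100pct.txt')   // no such file is needed

// Roughly what '1 * graderFileReader.readGradesListFromFile(_)' verifies;
// Spock's '0 * _._' would additionally forbid calls on any other mock
assert res == 1
assert reader.calls == ['readGradesListFromFile(100pct.txt)']
```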

That’s a fair bit of sophistication wrapped into a quite succinct syntax!

Guiding test creation with Cobertura

While Spock provides a nice specification-oriented testing capability, it does not supply every piece of the testing puzzle. A perennial question for those writing tests is: “how do I know when I have written enough tests?”

Cobertura helps answer this question. It generates instrumented versions of the code under test, runs the Spock tests against this code, and then reports which lines (and branches) have been executed.

Figure 4 shows what Cobertura makes of the testing undertaken on the Grader class.

Figure 4: Cobertura report showing drilldown

Clearly, not every execution pathway has been tested for the GraderFileReader class. Note that Cobertura has also highlighted the fact that only one branch of the ‘f.exists()’ if statement has been exercised.
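One more test covers the branch Cobertura flags: it feeds GraderFileReader a file name that does not exist. The sketch below is plain Groovy. In a Spock feature one would more likely use `thrown(Exception)` in a then: block. The class is a copy of Listing 3. The temporary file and the randomly named missing file exist only for this illustration.

```groovy
// Copy of GraderFileReader from Listing 3
class GraderFileReader {
    def readGradesListFromFile(name) {
        def f = new File(name)
        if (!f.exists())
            throw new Exception("File $name does not exist.")
        def txt = f.text
        txt?.split(',') as List
    }
}

def reader = new GraderFileReader()

// Branch 1: the file exists
def tmp = File.createTempFile('answers', '.txt')
tmp.deleteOnExit()
tmp.text = 'a,b,c'
def answers = reader.readGradesListFromFile(tmp.path)

// Branch 2: the file does not exist (the branch Cobertura reported as unexercised)
def missing = new File(tmp.parentFile, "no-such-${UUID.randomUUID()}.txt")
def message = null
try {
    reader.readGradesListFromFile(missing.path)
} catch (Exception e) {
    message = e.message
}

println "answers: $answers"
println "error:   $message"
```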

Cobertura requires very little configuration: simply point it at the source files and let it run.

The good thing about Cobertura is that it lets us attach a metric to our testing efforts. While 100% code coverage is the ideal, it is rarely achieved, usually because of time, budget or other constraints. Nevertheless, as the late, great science-fiction writer Robert Heinlein once wrote: “If it can’t be expressed in figures, it is not science; it is opinion. It has long been known that one horse can run faster than another — but which one? Differences are crucial.” This is probably one of those issues (like performance) where a pragmatic “good enough” approach is acceptable.

Testing code quality with CodeNarc

So we can now assume that we have a Grader class that tests correctly with 100% coverage. Is it good enough yet? Well…probably not! There are sure to be imperfections, inconsistencies, violations of best practices, anti-patterns, inadvertent oversights and miscellaneous style issues remaining in the code; issues that may or may not affect the correctness of the code but which certainly accumulate and reduce overall quality.

CodeNarc is a rule-driven analysis tool that (according to its website): “…analyzes Groovy code for defects, bad practices, inconsistencies, style issues and more.”

There are currently “…over 175 rules with more added each week.” Rules range from the trivial “UnnecessarySemicolon; Semi-colons as line endings can be removed safely”, through the useful “GStringAsMapKey; A GString should not be used as a map key since its hashcode is not guaranteed to be stable”, to subtle concurrency-related rules such as “SynchronizedOnBoxedPrimitive; The code synchronizes on a boxed primitive constant, such as an Integer. Since Integer objects can be cached and shared, this code could be synchronizing on the same object as other, unrelated code, leading to unresponsiveness and possible deadlock.”

CodeNarc uses a ruleset configuration DSL (much like that understood by ConfigSlurper) to determine what rules should be enabled/disabled and what parameters should be applied to the rules.

Listing 8 gives the ruleset used for the Grader project.

ruleset {

  description 'Grader CodeNarc RuleSet'

  ruleset("config/codenarc/StarterRuleSet-AllRulesByCategory.groovy.txt") {
        DuplicateNumberLiteral   ( enabled : false )
        DuplicateStringLiteral   ( enabled : false )
        BracesForClass           ( enabled : false )
        BracesForMethod          ( enabled : false )
        IfStatementBraces        ( enabled : false )
        BracesForIfElse          ( enabled : false )
        BracesForForLoop         ( enabled : false )
        BracesForTryCatchFinally ( enabled : false )
        JavaIoPackageAccess      ( enabled : false )
        ThrowRuntimeException    ( enabled : false )
  }
}

Listing 8: CodeNarc ruleset for Grader

CodeNarc is another tool that produces a nice HTML-formatted report, shown in Figure 5.

Figure 5: CodeNarc report

Tools like CodeNarc are common for static languages such as Java, and CodeNarc does the Groovy developer a great service by bringing similar capability to the Groovy world. It is worth repeating the statement on CodeNarc’s home page: “We’re inspired by the wonderful PMD and Checkstyle Java static analysis tools, as well as the extensive Groovy inspections performed by IntelliJ IDEA.” CodeNarc has three strengths. First, it is open source. Second, it is a command-line tool which, unlike the IntelliJ Groovy inspections mentioned in the quote, is easy to incorporate into a continuous integration scheme. Third, unlike either PMD or Checkstyle, it is tailor-made for the dynamic nature of Groovy rather than for Java.


For all the testing done here, there are still (at least) two more ‘imperfections’ in this code! Consider what would happen if the expectedAnswers parameter were given to the grader instance as an empty list, or as null (or not given at all)… Fixing the remaining issues is “left as an exercise for the reader”, as my old Maths books always put it.
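As a hint, one of these imperfections is easy to reproduce. The plain-Groovy sketch below uses a condensed copy of Listing 1's grade(List). Grading an empty paper against an empty expectedAnswers list passes the size check and then divides zero by zero. The sketch only demonstrates the problem and leaves the fix to the reader.

```groovy
// Condensed copy of Grader.grade(List) from Listing 1
class Grader {
    def expectedAnswers

    def grade(List candidateAnswers) {
        if (expectedAnswers?.size() != candidateAnswers?.size())
            return -1
        def count = 0
        expectedAnswers.eachWithIndex { o, index ->
            if (o == candidateAnswers[index]) count++
        }
        count / expectedAnswers.size()
    }
}

def grader = new Grader(expectedAnswers: [])
def problem = null
try {
    grader.grade([])          // sizes match (0 == 0), so 0 / 0 is attempted
} catch (ArithmeticException e) {
    problem = e
}
println "Empty paper vs empty key: ${problem?.class?.simpleName}: ${problem?.message}"
```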

What does this prove? Well…to paraphrase an old, slightly sexist adage: “a tester’s life is never done.” At least with Spock, life is a bit more productive and maybe even a bit more fun.

Learn more

Bob Brown is the director and owner of Transentia Pty. Ltd. Based in beautiful Brisbane, Australia, Bob is a specialist in Enterprise Java and has found his niche identifying and applying leading-edge technologies and techniques to customers’ problems.
