Using Betamax to Mock HTTP-based Interactions

Another GroovyMag contribution, this time from the March 2012 issue.

I thought the title was ‘clever’, even if no-one else liked it!

As usual, the project files are available


To Kill a Mocking Problem

Using Betamax to Mock HTTP-based Interactions

Bob Brown
bob@transentia.com.au
http://www.transentia.com.au

Extensively testing system components is a cornerstone idea of modern software development but, even with Groovy’s powerful toolset, is sometimes easier said than done. This is especially true when testing a feature in an HTTP-based application which relies on fixture data that is supplied via an interaction with another system. This article shows how Betamax can help.

Introduction

Some time ago, in a project not so far, far away from me I had a problem: I needed to run a test feature that required live fixture data from a “downstream” WebService. I couldn’t use the standard TEST environment because a colleague was reconfiguring it for his own nefarious purposes. I couldn’t work with the SIT environment because another team was using it. I couldn’t “borrow” the UAT environment since it was already booked out for testing an earlier change. Even the TRAIN environment was unavailable due to an influx of new staff. Stymied! Nothing left to do but hack on the documentation, darn it…

Those of you who work in a large multi-project, multi-department environment will surely recognize (and I hope, sympathise with) my all-too-common situation: there are simply never enough environments to go around.

Had I known about Rob Fletcher’s Betamax, I am sure that I could have saved myself a fair bit of aggravation and broken the dependency chain that linked my testing to the availability of another system.

To quote from the Betamax home page: “Betamax is a record/playback proxy for testing JVM applications that access external HTTP resources such as web services and REST APIs. … The first time a test annotated with @Betamax is run any HTTP traffic is recorded to a tape and subsequent runs will play back the recorded HTTP response from the tape without actually connecting to the external server.”
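The record/playback idea can be made concrete with a toy sketch in plain Groovy. This is only an illustration of the principle, not how Betamax itself is implemented: the names ToyTape, fetch, liveCalls and slowService are invented for the example, and the "service" is just a closure standing in for a real HTTP call.

```groovy
// Toy illustration of record/playback; NOT Betamax's implementation.
// ToyTape, fetch and liveCalls are hypothetical names.
class ToyTape {
    private final Map<String, String> interactions = [:]
    int liveCalls = 0

    String fetch(String uri, Closure<String> liveCall) {
        if (!interactions.containsKey(uri)) {
            liveCalls++
            interactions[uri] = liveCall(uri)   // first request: record the response
        }
        interactions[uri]                       // every request: answer from the tape
    }
}

def tape = new ToyTape()
// Stand-in for a slow or unavailable downstream system
def slowService = { String uri -> "response for $uri".toString() }

def first = tape.fetch('http://example.com/quotes', slowService)
def second = tape.fetch('http://example.com/quotes', slowService)

assert first == second
assert tape.liveCalls == 1   // the second call never touched the "real" service
```

Once the tape holds a response, the downstream system can disappear altogether and the second call still succeeds, which is exactly the property the rest of this article relies on.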

As I was thinking about Betamax, my mind mixed together two somewhat disparate ingredients: mocking and archiving old movies. Once I had swirled them together, added a pinch of Geb and a dash of Spock to the cocktail, out poured the task at hand: to show how Spock and Geb can work together to automate scraping quotes from a web page dedicated to the old Gregory Peck classic movie “To Kill a Mockingbird”, and then marshaling the results into a JSON structure suitable for further processing. A second task is to illustrate how Betamax can accelerate things while decoupling the test from the actual web page’s availability.

Martin Fowler (the self-proclaimed “loud-mouth on the design of enterprise software”) would probably prefer that I call Betamax a stubbing system rather than a mocking one, but I couldn’t for the life of me come up with a nice punny stub-related ‘hook’ for the article, so for now I shall persist in calling it a mocking system. Follow the link given in the “Learn More” section for Martin’s excellent article on the matter.

This is not a real testing situation, in that I am not interested in actually testing anything. Nevertheless, I chose to use Spock for this task because it provides an excellent testing framework that is worth highlighting and because it integrates trivially easily with Betamax. I chose to use Geb to perform the actual web page scraping for the best of reasons: visibility, ease, compatibility and power.

The scraping task

The Spock/Geb code for this task is pretty simple. I have chosen to scrape the quotes from the “finestquotes.com” site. Figure 1 shows the contents and structure of the page I am scraping, as displayed by IE9’s Developer Tools.

Figure 1: Contents and structure of ‘finestquotes.com’ quotes page

The meta-task I am undertaking (writing an article covering how to bind different technologies together to scrape quotes from a web page) brings to mind the following, rather apposite, quote:

I have gathered a posie of other men’s flowers and nothing but the thread which binds them is my own.
—Michel Eyquem de Montaigne

Something to think about!

You should keep the structure of the page in mind as you read the Spock/Geb code given in Listing 1.

import geb.*
import geb.spock.*

import static org.apache.commons.lang.StringUtils.strip
import spock.lang.*
import groovy.json.*

class MockingBirdSpecification extends GebSpec {
  def "scrape 'To Kill a MockingBird' quotes"() {
    when:
      to FinestQuotesPage
    then:
      at FinestQuotesPage
    and:
      produceJSON()
  }

  void produceJSON() {
    new StringWriter().with { sw ->
      new StreamingJsonBuilder(sw).quotes() {
        generated new Date()
        title heading
        size quotes.size()
        quotes quotes.collect([]) { q ->
          [ quote: strip(q.quote, "   ~"), attribution: q.attribution]
        }
      }
      println JsonOutput.prettyPrint(sw.toString())
    }
  }
}

class FinestQuotesPage extends Page {
  static url =
    "http://www.finestquotes.com/movie_quotes" +
      "/movie/To%20Kill%20a%20Mockingbird/page/0.htm"
  static at = { title.startsWith('Quotes') }
  static content = {
    quotes {
      $("div#container1 > div", id: "containerin").collect {
        module Quote, it
      }
    }
    heading { $("div#middletitles strong").text() }
  }
}

class Quote extends Module {
  static content = {
    quote {
      $("span.indquote_link").text()
    }
    attribution {
      $("div", id: "authortab").text()
    }
  }
}

Listing 1: Screen-scraping a web page with Geb and Spock

Class MockingBirdSpecification shows Geb and Spock working together to access and scrape the quotes page. In this class, Spock supplies a JUnit-based testing framework while the feature “scrape ‘To Kill a MockingBird’ quotes” drives Geb through its activities.

Geb defines a page using a content domain-specific language (DSL). The FinestQuotesPage class shows this in action. The class gives definitions for the page’s URL, how to check if the page has been accessed correctly, how to access (using Geb’s jQuery-like selector language) the distinguished heading field and how to isolate from the HTML document each div element that acts as a container for the actual quote data. There are many such containers repeated throughout the document, one for each quote.

Each container div itself encloses a defined structure that contains the actual data for each individual quote. Geb supplies a module DSL that makes it easy to cater for the sort of structural repetition seen here. In Listing 1, the Quote class defines how to access the real quote and attribution data relative to its container. As the FinestQuotesPage class isolates each container div, it passes it to a Quote module and then gathers all the resultant Quote module instances into a list that is then exposed via its quotes property. This technique represents a slightly unusual way of working with Geb and is described by Betamax’s creator Rob Fletcher on his “Adhockery” blog (see the “Learn More” section for a reference). Clearly, Rob is a multi-talented guy!

The end result of all this is that the page and associated list modules have extracted all the tasty “meat” from the document and exposed it so that it can be easily manipulated. From this point it is simple to produce a JSON representation using Groovy’s StreamingJsonBuilder and JsonOutput classes.
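To see the JSON-building step in isolation, here is a minimal sketch that swaps the Geb page for hard-coded stand-in values. The variables heading and scraped, and the placeholder quote text, are assumptions made purely for illustration; the strip call and the generated date from Listing 1 are left out to keep the example dependency-free.

```groovy
import groovy.json.JsonOutput
import groovy.json.StreamingJsonBuilder

// Hypothetical stand-ins for the values the Geb page object would expose
def heading = 'To Kill a Mockingbird'
def scraped = [
    [quote: 'First placeholder quote',  attribution: 'Character A'],
    [quote: 'Second placeholder quote', attribution: 'Character B']
]

def sw = new StringWriter()
// Each method call inside the closure becomes a key under the "quotes" root
new StreamingJsonBuilder(sw).quotes {
    title heading
    size scraped.size()
    quotes scraped.collect { q ->
        [quote: q.quote, attribution: q.attribution]
    }
}

def json = sw.toString()
println JsonOutput.prettyPrint(json)
```

StreamingJsonBuilder writes straight to the supplied Writer rather than building an intermediate structure, and JsonOutput.prettyPrint is applied afterwards only to make the result easier to read.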

It is important to understand that Geb has done all the hard work of isolating the distinguished parts of the HTML document “up front” and does it one time only. Subsequent access to the various page and module properties is quite efficient.

It is good to know that Geb can cope with invalid HTML. The HTML specification for the ID attribute says: “This attribute assigns a name to an element. This name must be unique in a document.” The HTML document we are working with here gives each container the same id attribute value (“containerin”) and within each container there is a div with an id attribute having the value “authortab.”

It is worth noting that when the test is executed via Gradle you will, by default, get nice “manager friendly” reports, as Figure 2 shows.

Figure 2: Spock report

Take a quick note of the time taken for this test to execute; we will refer to it later.

Pressing the ‘Betamax’ button

In the introduction I mentioned one of Betamax’s virtues: the ability to decouple the system under test from its downstream fixture data. Betamax has at least one other virtue: speed.

Consider Figure 2: 6.19 seconds to execute. Not too bad but if you are an agilist with a large number of tests forever chasing James Shore’s “ten minute build” ideal, you will want to do better. By avoiding all the messy network interactions implicit in getting hold of external data, Betamax can speed up test execution, often quite substantially.

Enough discussion; let’s see the code. Listing 2 shows the Betamax-augmented MockingBirdSpecification class.

import geb.*
import geb.spock.*

import static org.apache.commons.lang.StringUtils.strip
import spock.lang.*
import groovy.json.*
import org.junit.Rule
import betamax.Betamax
import betamax.Recorder

class BetamaxMockingBirdSpecification extends GebSpec {
  @Rule Recorder recorder = new Recorder()

  @Betamax(tape="ToKillAMockingBird.tape")
  def "scrape 'To Kill a MockingBird' quotes"() {
    setup:
      browser.driver.setProxy("localhost", recorder.proxyPort)
    when:
      to FinestQuotesPage
    then:
      at FinestQuotesPage
    and:
      produceJSON()
  }

  void produceJSON() {
    […elided, unchanged…]
  }
}

Listing 2: The Betamax-augmented MockingBirdSpecification class

When you look at Listing 2, one thing should be immediately clear: it is trivial to incorporate Betamax into a testing regime.

There are a number of points of interest in Listing 2.

Betamax uses a JUnit @Rule annotation attached to an instance of its Recorder class to allow the Recorder to hook into the lifecycle of the test and to start/shutdown Betamax’s “tape recorder” proxy (which is actually an embedded instance of Jetty). If you haven’t seen it before, an @Rule is effectively a JUnit 4.7+ extension mechanism that provides access to the JUnit lifecycle (@Rules were initially called ‘@Interceptors’—a name which still seems clearer to me).
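For readers who have not met @Rule before, the following sketch shows the lifecycle hook on its own, using JUnit’s ExternalResource base class. It assumes JUnit 4.7 or later is on the classpath; the class name RuleLifecycleTest, the fakeProxy field and the event strings are invented for illustration and have nothing to do with Betamax’s own Recorder.

```groovy
import org.junit.Rule
import org.junit.Test
import org.junit.rules.ExternalResource

class RuleLifecycleTest {
    // Shared log so the order of lifecycle events can be inspected
    static List<String> events = []

    // JUnit requires a @Rule field to be public, hence the explicit modifier
    @Rule
    public ExternalResource fakeProxy = new ExternalResource() {
        @Override protected void before() { RuleLifecycleTest.events << 'proxy started' }
        @Override protected void after()  { RuleLifecycleTest.events << 'proxy stopped' }
    }

    @Test
    void featureRunsWhileProxyIsUp() {
        // before() has already run by the time the test body executes
        assert RuleLifecycleTest.events.last() == 'proxy started'
        RuleLifecycleTest.events << 'feature ran'
    }
}
```

JUnit calls before() ahead of each test method and after() once it finishes, which is the same kind of bracket that lets a rule start and stop a proxy around a feature.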

The @Betamax annotation defines a ‘tape’ to use; Betamax will store the data arising from each request/response interaction into a tape (for the demo project the file src\test\resources\betamax\tapes\ToKillAMockingBird_tape.yaml). Betamax provides a number of other options (unneeded for this article) that can be set via the @Betamax annotation or on the Recorder instance itself. These include the ability to control how requests are matched against recorded interactions (for example by part of the URI or by header field values), setting a tape mode to READ_ONLY and setting a pass-through list of hosts that won’t be proxied.

It is worth noting that Betamax can also configure itself from external files. As the documentation says: “If you have a file called BetamaxConfig.groovy or betamax.properties somewhere in your classpath it will be picked up by the Recorder class.”

As an aside, it is interesting to see that Rob Fletcher has chosen to use YAML (via the plain Java SnakeYAML library) for the tape format. YAML is a superset of JSON which provides a “human-readable data serialization format…data-oriented, rather than document markup.” YAML is certainly a better choice for this purpose than XML would have been, due to its superior (at least, easier) support for binary data types. What makes YAML a particularly good choice is that it makes it very easy to simply jump into a tape and edit it according to the needs of a test.

The last point of interest concerns configuring Geb to use Betamax’s embedded proxy. I have added an appropriate setup block to the “scrape ‘To Kill a MockingBird’ quotes” feature.

It is easy to see the effect of Betamax on the test. The first time this Betamax-enhanced test is executed it is slightly slower due to the overhead of creating the tape. Subsequent test executions match and replay the recorded interactions from the tape and so are much faster, as Figure 3 shows.

Figure 3: Betamax replay can be faster

Clearly, Betamax has had a big effect on the test’s execution time. Things are much faster (pretty much twice as fast) without the network overhead.

It is worth reemphasizing that with Betamax in the loop, our test has not needed to go back to the source system. The test is now decoupled from issues arising from the system becoming unavailable, mutating under us, throwing up its own set of bugs, etc., etc.

This is A Good Thing.

It gets better. If there were multiple tests within the same test suite, there would be a much bigger improvement: currently a large proportion of time is spent firing up Betamax’s Jetty-based proxy server to service just the one feature. This is very easy to see by making a copy of the Spock test and executing it alongside the original. Figure 4 shows the result.

Figure 4: Avoiding repeated proxy startup

A nearly eightfold speedup. That ten minute build ideal seems back in reach now, doesn’t it?

Wrapping up

You need to look at Betamax and see what it can do for you.

Setting up fixture data for a unit test can be quite hard work and often drives people to create integration tests or simply leave things for manual acceptance testing. Betamax makes it as easy as possible to perform repeatable unit testing with high-level or complex fixture data and at the same time can break the interdependencies between systems that often become impediments to a nice steady workflow.

The pairing of Geb and Spock is pretty neat as well.

Although it is early days (Betamax is currently at version 1.0, with version 1.1 showing on the horizon), Betamax shows great promise and is definitely technology I’ll be keeping an eye on…I’d advise you to do the same.

Learn more

Bob Brown is the director and owner of Transentia Pty. Ltd. Based in beautiful Brisbane, Australia, Bob is a specialist in Enterprise Java and has found his niche identifying and applying leading-edge technologies and techniques to customers’ problems.
