Double Negative

Software, code and things.

PHPUnit, DBUnit and their quirks

I utilize PHPUnit for my backend testing and have noticed a number of quirks whilst using it. I have outlined these below - hopefully they will help someone.

DBUnit, composite datasets and foreign keys

There is a bug in the PHPUnit DBUnit source code which means that the tables in a composite dataset are truncated in the same order in which they were added. If there are foreign key constraints between these tables, truncation will fail with constraint violation errors.

I have fixed this issue, and created a pull request here.

Properly tearing down database tests

Make sure you correctly tear down your database connections, otherwise you may encounter various errors. I override tearDown() to do this. Make sure that you call the parent method so that any operation you define in getTearDownOperation() is run appropriately.

    public function tearDown() {
        // Let PHPUnit run the operation returned by getTearDownOperation()
        parent::tearDown();

        // Release the database connection handle
        $this->dbh = null;
    }

Open Files

If you have a lot of tests you may encounter the "Too many open files" error. You can fix this by raising the number of files a process may open, for example:

    ulimit -n 5000

Test annotations

An annotation block for a PHPUnit test is as follows:

    /**
     * @test
     */

It is not simply a comment block - the first line must have an extra asterisk (/** rather than /*). This is the same as for docblocks in phpDocumentor.

Test Size Annotations

PHPUnit provides the annotations @medium and @large, which indicate the 'size' of a test; unannotated tests are considered small.

You can configure the test runner to time out tests that take longer than expected for their size:

<phpunit
    beStrictAboutTestSize="true"
    timeoutForSmallTests="1"
    timeoutForMediumTests="50"
    timeoutForLargeTests="50">
    <!-- ... -->
</phpunit>

The curveball in the mix is that, as of version 4.2, all database tests are hard-coded as 'large' within the source code. I personally think that this should be changed such that database tests default to 'large' but can still have their size overridden.

For now you'll either have to edit the source code yourself or run your database tests independently of any other test setup in which 'large' tests are treated differently.


If anyone else is aware of any quirks that would be worth bringing to the attention of others, please let me know.

PhantomJS, Mocha, and Chai for functional testing

I have been playing around with a number of open source projects pertaining to testing different aspects of a web based application. Over the past few days I have been experimenting with PhantomJS, Mocha, and Chai.

What is PhantomJS?

PhantomJS is a full-stack headless web browser based on WebKit. That means it uses the same browser engine as many of the top browsers, including Chrome and Safari.

ZombieJS was the other option that I considered. The difference is that Zombie works with JSDOM, a JavaScript implementation of the DOM.

I opted to use PhantomJS because Zombie is not a particularly stable product (in my opinion); having tested both the 1.4.1 version and the 2.0.0 alpha, I encountered a number of issues. The biggest problem for me was that it did not work very well with my complex shimming of externally loaded JavaScript files, nor did it play nicely with ReactJS.

The other obvious consideration is that I believe tests should be run in as real an environment as possible.

PhantomJS is very good - it is easy to install and set up, but the documentation is a little sparse.

What is Mocha?

Mocha is a JavaScript testing framework that can be used with Node or in the browser. I want to use it in the browser - in this case, a PhantomJS browser instance.

Mocha allows you to hook in at various points - for example, to make sure that the necessary setup is complete before your tests run. It also has a really nice way of dealing with code that executes asynchronously. It is actively developed and has a good community around it.

before(function(done) {  
    //run asynchronous setup

    //tell mocha when you are done
    done();
});

//test code

What is Chai?

Chai is an assertion library. It essentially provides methods that can be used to assert that what you get is what you expect. Mocha and Chai work extremely well together.

I want to use Chai because it is extremely readable (thanks to its BDD constructs) and very well documented.

expect(resultCount).to.be.above(0);  
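
Put together with Mocha's BDD interface, a test reads almost like prose. As a small sketch (the selector and page markup here are hypothetical, and jQuery is assumed to be available on the page):

var expect = chai.expect;

describe("search results", function() {

    it("renders at least one result", function() {
        // hypothetical markup: a #results list populated by the page under test
        var resultCount = $("#results li").length;

        expect(resultCount).to.be.above(0);
    });

});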

Combining the three

This is where my adventure got a little tougher.

I want to essentially load a webpage (the page under test), execute a number of commands, and then assert that they did what I expected.

It is simple enough to create a PhantomJS browser instance and load a page, but how does one then load both Mocha and Chai and manipulate the page in a testable way?

When using Node you can simply require() dependencies; because we are using PhantomJS from the command line, we cannot.

There is a PhantomJS runner available called mocha-phantomjs; however, I found it to be somewhat constraining. You point it at a file containing the code you want to test and the libraries you want to test it with, and it runs them. I can see this being useful for unit testing, but I want to test an already built page without needing to adapt it for testing. It essentially takes control of the browser (PhantomJS) piece of the puzzle, which in my case is unsuitable.

My approach

PhantomJS has a webpage module with an injectJs() method. I chose to utilize this to inject my test code (and all of its requirements) into the page under test. This means I can utilize jQuery (which is already loaded on my page) to manipulate the DOM and access the elements, properties, and values that I want to test.

PhantomJS also provides a method on the client side, callPhantom(). This allows the page to call back to the controlling PhantomJS instance, where it triggers the callback that you set up on page.onCallback().
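
As a minimal sketch of the two halves of that channel (the payload object is arbitrary):

// Client side (in the page) - guard the call so the page still works
// when it is not being driven by PhantomJS
if (window.callPhantom) {
    window.callPhantom({ message: "hello from the page" });
}

// Controlling PhantomJS script
page.onCallback = function(data) {
    console.log("The page said: " + data.message);
};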

As such my approach is to:

  • Run a PhantomJS browser instance and load the page I want to test
  • Inject my tests
  • Run my tests using Mocha and Chai
  • Pass the formatted response back to PhantomJS
  • Output the results on the command line.

Execution

Given the above, my execution is as follows:

var page = require("webpage").create();  
var args = require('system').args;

//pass in the name of the file that contains your tests
var testFile = args[1];  
//pass in the url you are testing
var pageAddress = args[2];

if (typeof testFile === 'undefined') {  
    console.error("Did not specify a test file");
    phantom.exit();
}

page.open(pageAddress, function(status) {  
    if (status !== 'success') {
        console.error("Failed to open", page.frameUrl);
        phantom.exit();
    }

    //Inject mocha and chai
    page.injectJs("../node_modules/mocha/mocha.js");
    page.injectJs("../node_modules/chai/chai.js");

    //inject your test reporter
    page.injectJs("mocha/reporter.js");

    //inject your tests
    page.injectJs("mocha/" + testFile);

    page.evaluate(function() {
        window.mocha.run();
    });
});

page.onCallback = function(data) {  
    data.message && console.log(data.message);
    data.exit && phantom.exit();
};

page.onConsoleMessage = function(msg, lineNum, sourceId) {  
  console.log('CONSOLE: ' + msg + ' (from line #' + lineNum + ' in "' + sourceId + '")');
};
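
With the above saved as, say, run-tests.js (a hypothetical file name), you run the suite by passing the test file and the URL of the page under test as arguments - note that the test file is resolved relative to the mocha/ directory, as per the injectJs() call above:

    phantomjs run-tests.js my-tests.js http://localhost:8000/page-under-test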

The only bit of the above code that I have yet to explain is the reporter. Mocha provides a number of reporters for formatting your test results, but because of the nature of this setup you cannot simply use Mocha's built-in reporters - you have to build your own. This is one benefit of mocha-phantomjs (see above): its author has ported the built-in reporters for you to use.

My basic implementation of a reporter is as follows:

(function() {

    var color = Mocha.reporters.Base.color;

    function log() {

        var args = Array.apply(null, arguments);

        if (window.callPhantom) {
            window.callPhantom({ message: args.join(" ") });
        } else {
            console.log( args.join(" ") );
        }

    }

    var Reporter = function(runner){

        Mocha.reporters.Base.call(this, runner);

        var out = [];
        var stats = { suites: 0, tests: 0, passes: 0, pending: 0, failures: 0 };

        runner.on('start', function() {
            stats.start = new Date;
            out.push([ "Testing",  window.location.href, "\n"]);
        });

        runner.on('suite', function(suite) {
            stats.suites++;
            out.push([suite.title, "\n"]);
        });

        runner.on('test', function(test) {
            stats.tests++;
        });

        runner.on("pass", function(test) {
            stats.passes++;
            if ('fast' == test.speed) {
                out.push([ color('checkmark', '  ✓ '), test.title, "\n" ]);
            } else {
                out.push([
                    color('checkmark', '  ✓ '),
                    test.title,
                    color(test.speed, test.duration + "ms"),
                    '\n'
                ]);
            }

        });

        runner.on('fail', function(test, err) {
            stats.failures++;
            out.push([ color('fail', '  × '), color('fail', test.title), ":\n    ", err ,"\n"]);
        });

        runner.on("end", function() {

            out.push(['ending']);

            stats.end = new Date;
            stats.duration = new Date - stats.start;

            out.push([stats.tests, "tests ran in", stats.duration, "ms"]);
            out.push([ color('checkmark', stats.passes), "passed and", color('fail', stats.failures), "failed"]);

            while (out.length) {
                log.apply(null, out.shift());
            }

            if (window.callPhantom) {
                window.callPhantom({ exit: true });
            }

        });

    };

    mocha.setup({
        ui: 'bdd',
        ignoreLeaks: true,
        reporter: Reporter
    });

}());

Issues

When I was playing with ZombieJS, my usage of React caused a number of issues. In my mind this was understandable - given how React works with its virtual DOM, I figured that a JavaScript implementation of the DOM might have problems with it.

There was, however, an issue using React with PhantomJS. This is outlined in detail here - you just need to polyfill Function.prototype.bind, which is missing because PhantomJS uses an old version of WebKit. PhantomJS 2.0 will be coming at some point, and that should resolve the issue. The update (when it comes) may also change callPhantom() (discussed above), as the documentation notes that it is an experimental API.
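
For reference, a simplified polyfill looks something like the following - it does not support calling the bound function with new, but it was enough for my purposes:

if (!Function.prototype.bind) {
    Function.prototype.bind = function(context) {
        var fn = this;
        var boundArgs = Array.prototype.slice.call(arguments, 1);

        return function() {
            // combine the pre-bound arguments with those passed at call time
            var args = boundArgs.concat(Array.prototype.slice.call(arguments));
            return fn.apply(context, args);
        };
    };
}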

Soo..

Hopefully you find the above helpful. I'd be interested to hear people's thoughts on this approach, as well as any suggestions for improvements.

Facebook's Jest javascript unit testing framework

Over the past few days I have been investigating Facebook's Jest - a JavaScript unit testing framework. I have found a number of issues with it for my use case.

My main issue is working out what it is actually useful for. The documentation outlines how to test microscopically small sections of code. Admittedly, that is exactly what a unit test is, but I don't believe many people build JavaScript apps with such a minute separation of concerns.

One of my projects uses React for a number of its client facing interfaces. Jest is used by Facebook to test their React components, so I thought I would try and do the same.

I can absolutely see how one could and would test a component. React is great in that you can make simple components and reuse them in multiple places. The example in the Jest documentation outlines how you could test a checkbox component - that really is the extent to which Jest shines, in my opinion. I can test that when I change my <SelectButton /> component's value, my valueChanged callback is called and an ajax request is sent off.
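
Setting the React rendering to one side, a rough sketch of that style of test might look like the following. The module name, the file paths, and the assumption that the callback posts the value via $.ajax are hypothetical illustrations rather than code from my project:

// __tests__/valueChanged-test.js
jest.dontMock('../js/valueChanged');

describe('valueChanged', function() {

    it('sends an ajax request containing the new value', function() {
        // jQuery is automocked by Jest, so $.ajax records its calls
        var $ = require('jquery');
        var valueChanged = require('../js/valueChanged');

        valueChanged('new-value');

        expect($.ajax).toBeCalled();
        expect($.ajax.mock.calls[0][0].data).toEqual({ value: 'new-value' });
    });

});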

Integration

A React component can be made up of a number of other React components. In fact, a React interface is a number of React components laid out together and interacting with one another. As such, having tested each component individually, I would like to test the interface as a whole and the interactions between the components. This is more of an integration test than a unit test, and Jest is certainly not designed for this.

Issues

Jest is extremely new and is not particularly well documented at the moment. With React it only works for the most basic of setups. For example, I build my code using gulp and browserify, and I use browserify-shim to allow me to require() modules that I load from a CDN. Jest does not play well with this. You can work around it, but writing tests should not be hard or complex, and I expect a testing framework to remove the need for complex boilerplate.

Another problem I have encountered is the automocking functionality - in principle it is great, but it does not work across the board. In certain situations you have to implement your own mocks, and who wants to do that :P It is also not immediately clear what is mocked and when - although the documentation makes it seem clear, I created a manual mock and within it required some other modules, which were not mocked. I then created a mock manually with jest.mock(), but it broke the unit under test.

Conclusion

If you are testing extremely small, simple units, Jest is great. It is too immature to stand out to me, but I will certainly keep an eye on it. If I get the time I would like to read into the internals some more - I feel that if you know exactly what it is doing, and how, you will get a lot more from it.

At the end of the day I use testing as a way of making me feel confident that my code works as I expect it to and does not regress. I think Jest is a nice complement to a test suite, but it certainly wouldn't be my first port of call.

Consider Sphinx for your search needs

Intro

For most entities building a website, search is not really a consideration - in the sense that for your search functionality you intend to simply query your database backend for the results you need... like everyone does, right?

Certainly for most use cases, the power of modern database backends means that dedicated search software is not, and never will be, a requirement.

Everyone builds software intending for it to become popular, but unless your application is going to need to run complex queries over millions of rows, considerations like this are not necessary. That said, if you are unsure of the potential of your product, it may be worth considering this now because, as with most things, it will be significantly more difficult to integrate into a legacy project down the line.

What is Sphinx?

Sphinx is an open source full-text search server. It is extremely powerful, easy to set up, and has a well documented, well architected PHP API for you to use.

It is used by Craigslist as well as many many other entities, small and large.

Use case

There are many use cases for software such as Sphinx. The most apparent, in my opinion, is the one I have utilized Sphinx to resolve - querying large, complex data sets.

If, for example, you have a denormalized database architecture (for good reasons) and you need to produce search functionality that queries many tables containing millions of rows, Sphinx may well be a suitable answer. You have denormalized your database with good reason, and the only respect in which your architecture is lacking is in its ability to be searched. What can you do?

An extremely complex MySQL query, for example, might take seconds or even tens of seconds. If you want to provide a good user experience, you cannot keep your user waiting that long. Instead, you can index all of this data on a Sphinx server (running independently of your web/database server(s)) and query it quickly using the provided API.

I implemented this such that 4 million rows could be queried in a negligible amount of time, where negligible means milliseconds.

Issues

The most apparent issue with indexing your data and searching the index is that the indexed data quickly becomes outdated. Fortunately, Sphinx can run delta indexes, which only index data that has changed. You can, for example, run a full initial index and then run delta indexes every 15 minutes, adjusting the schedule to your requirements - if your data changes regularly, you may want to run delta indexes more often.

Conclusion

The above is an abstract look at Sphinx search in relation to a personal implementation of it. To get down to the nitty gritty, I suggest you take a look at the latest documentation. I highly recommend the product: it is well documented, well supported, and actively developed.