Double Negative

Software, code and things.

Value. The data, or the packaging?

For one of my web-based projects I needed to acquire some data. In this day and age data is often a company's most valuable asset... it got me thinking.

The Royal Mail here in the UK offer a Postcode Address File (PAF) to any entity that wishes to have it. The cost of this PAF file runs to many, many thousands of pounds depending on exactly what you want and how you intend to use it.

Recently I was building a location-based product. Using the PAF database was in no way financially viable. Instead I did a lot of research into alternative location groupings for the United Kingdom and considered how I would acquire the data I needed to work with said groupings. I opted to utilize city, town, and village names grouped by their local government area, and I managed to acquire this data for a relatively insignificant sum.

My next requirement was to acquire data on the establishments that my website lists. My first approach was simply to manually input data - this gave me my first appreciation for the value of data. Had I continued to manually input this data, my cost would have amounted to the value of my time. Using the British minimum wage and extremely conservative time estimates, a complete dataset would have cost me five figures.

Being an engineer, I quickly began investigating more intelligent ways of acquiring the required data. I ended up utilizing Facebook's API and data scraping to begin the process of acquiring the data that I would need.

Facebook

Facebook seem to be pretty open with their data, and their API. Whilst to some extent you can put a cost on data acquisition, Facebook are in a position whereby sharing data is not going to have too much of a negative effect on them. It is actually more likely that Facebook will gain from my usage of a tiny subset of their data given the number of Facebook integrations across the product.

I pulled data from Facebook, but in the interest of producing a quality product I built functionality to allow me to manually process this data. Facebook allowed me to spend less time collecting data and thus reduced my cost.

Scraping

Having over a long period gradually integrated the data I had obtained from Facebook, I had the shocking realisation that the data was surprisingly incomplete. In this particular niche, many companies run a large number of establishments. As such I decided to scrape further data from the websites of these companies to try to fill in the blanks.

What I noticed from this process is that in this particular niche, companies do not really value their data.. and that is somewhat understandable. I want their data so that I can essentially market them. On reflection I am surprised they did not make their data more easily accessible.

I found:

  • a few companies whose data was consistently formatted and easy to extract

  • a few companies who had spread the required data across multiple pages - annoying, but manageable

  • a few companies using clean JSON backends

  • one company that really needs to hire a new web developer

My approach

Given that this was a one-time thing, I was not interested in writing perfect, clean, well-tested code. I built a basic scraper using PHP's cURL functions, opened up the various pages, and pulled out the data I needed from the source code.

Extracting the data essentially amounted to the two steps below (a rough sketch follows the list):

  • analysing the source code and working with PHP's DOMDocument

  • calling json_decode
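
Roughly speaking, the code boiled down to something like the sketch below. The URLs and class names here are made up for illustration; the real scraper was just a longer, messier version of the same idea.

<?php
// Rough sketch only - URLs, selectors, and field names are hypothetical.

// Fetch a page with cURL
$ch = curl_init('http://example-company.com/our-locations'); // hypothetical URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// Pull the interesting bits out of the markup with DOMDocument
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings about sloppy markup
$establishments = array();
foreach ($doc->getElementsByTagName('li') as $node) {
    if ($node->getAttribute('class') === 'establishment') { // hypothetical class name
        $establishments[] = trim($node->textContent);
    }
}

// For the sites with clean JSON backends the job was even simpler
$json = file_get_contents('http://example-company.com/api/locations.json'); // hypothetical URL
$decoded = json_decode($json, true);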

Now what

At this point I have collected enough data to make my product viable: enough to make it useful, and the public seem to agree. My hope is that users of the product will now contribute to the continued growth and general improvement of the dataset.

Given that I have spent a large amount of time and effort collecting and moderating the dataset, I have to some extent come full circle. This dataset now adds significant value to my product. I have done a lot of the heavy lifting, so now surely I want to stop others from scraping my data? Nope..

You cannot stop scraping

The long and the short of it is that if you want to provide your data to a good person (a site visitor or a search engine), then you have to provide it to the bad people too. The web is inherently open and free - you can either share your data or take your website down.

I was able to scrape all the data that I wanted using cURL. It was somewhat tedious, but it was not hard. This is probably the simplest tool in a scraper's toolkit.

You can try to obfuscate your source code, hide your data behind complex authorization procedures, and so on, but this will only hurt you.

Google is a scraper.. as is Bing. If you want your website to be search engine friendly then it is also going to be bad person friendly.

Things like captchas and required logins ruin the user experience.

Headers can be faked, IPs bought and so on.

Even setting all of the above aside, there are headless browsers like PhantomJS - real browsers, just without the screen. Phantom Mochachino, a tool which I built for testing, demonstrates how powerful headless browsers are #shamelessplug. A scraper can use a headless browser in exactly the same manner that a normal user uses their browser.

You can make things harder, but anyone who is committed enough can, and will, get your data.

It is for that reason that I opted not to attempt to prevent scraping. Rather, I made my API openly accessible to all.

Think about it

Wikipedia is a really large database, yet you don't see many Wikipedia clones ranking highly in Google. Even if they did you would more than likely click on the Wikipedia link as opposed to the http://rubbish-wikipedia-clone.com link.

Likewise with Stack Overflow - in this case there are loads of clones, but again I don't think I have ever used one and I highly doubt that they have any effect on Stack Overflow's visitor numbers.

Packaging

All things considered, whilst data is extremely valuable, the packaging of it is in many respects more important.

In my case my product is premised on helping companies within the niche improve visitor numbers whilst providing the general public with a useful and informative resource. I want to incentivize mutual cooperation and encourage users to contribute to the site. I believe that more people will contribute if they know what we are doing with their data - we are making it freely accessible to anyone who wants to use it and help the industry grow.

So to answer the question.. for the particular dataset that I am working with, both have value. At least I hope they do.

Social implementations of three-legged oAuth

I was implementing some functionality utilizing the Twitter API and found the documentation to be extremely lacking/unclear.

Given that the oAuth protocol seems more complicated than it actually is, I thought I'd document some extra explanation to accompany Twitter's sign in implementation docs.

3 Legged oAuth

Twitter's explanation of 3 Legged oAuth is rubbish. Full stop.

Let me briefly try to explain what it is, because there seem to be very few resources that explain it well.

It is called three-legged oAuth because there are three participants: the User, the Website, and the Service.

That said, the process can also be considered as being made up of three stages, as follows:

Leg 1

  • User wants to provide Website with data from Service
  • Website tells Service that it wants said data.

Leg 2

  • Website sends User to Service
  • User authorizes data request.
  • Service sends User back to Website

Leg 3

  • Website requests access token from Service

Why?

Website can access User's data from Service without knowing User's username and password for Service

Overview

Twitter's implementation of those three legs is as follows:

request_token

The request_token step of the oAuth process is essentially telling Twitter who you are and what you want.

You as the consumer pass in your consumer_key and consumer_key_secret - this indicates who you are.

Twitter then knows what permissions you want: read only; read and write; or read, write, and direct message access - you have set these in your app's settings.

You can also pass in an oauth_callback header - this tells Twitter where to send your user once authorization is complete. If you don't pass this header Twitter will redirect users to the callback URL set in your settings. I find it worthwhile explicitly setting this on each request such that you can swap out different callback URLs for development.

I encountered an issue with the oauth_callback header which was resulting in me receiving the error message 'Failed to validate oauth signature and token'. If I did not pass in an oauth_callback header I would receive a token without issue.

My problem was with my oauth signature. It required that my oauth_callback be encoded twice.

As is typical, it is significantly easier to find out about an issue when you know what that issue is. A post-fix search presented me with this Stack Overflow answer, which extremely succinctly explains why it needs to be encoded twice.

Another important consideration when creating your oAuth signature is to order your parameters alphabetically when combining them into the signature base string.
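
As a rough illustration of where each encoding happens, the request_token signature base string ends up being built something like this (a sketch with placeholder values, not production code):

<?php
// Sketch of building an OAuth 1.0a signature base string - values are placeholders.
$params = array(
    'oauth_callback'         => 'http://example.com/twitter/callback', // hypothetical callback
    'oauth_consumer_key'     => $consumerKey,
    'oauth_nonce'            => md5(uniqid(rand(), true)),
    'oauth_signature_method' => 'HMAC-SHA1',
    'oauth_timestamp'        => (string) time(),
    'oauth_version'          => '1.0',
);

// Parameters must be ordered alphabetically before they are combined
ksort($params);

$pairs = array();
foreach ($params as $key => $value) {
    // First encoding: each key and value is percent encoded individually
    $pairs[] = rawurlencode($key) . '=' . rawurlencode($value);
}
$paramString = implode('&', $pairs);

// Second encoding: the whole parameter string is encoded again when the base
// string is assembled - this is why oauth_callback ends up encoded twice
$baseString = 'POST'
    . '&' . rawurlencode('https://api.twitter.com/oauth/request_token')
    . '&' . rawurlencode($paramString);

// The signing key for this leg is just the consumer secret (no token secret yet)
$signingKey = rawurlencode($consumerSecret) . '&';
$oauthSignature = base64_encode(hash_hmac('sha1', $baseString, $signingKey, true));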

oauth/authenticate - oauth/authorize

Your user should be sent to one of these endpoints passing the request_token returned by the previous step. Here they will authorize your application to access/not access their data.

The former endpoint will automatically redirect an already authorized user to the oauth_callback url specified in the previous step. oauth/authorize requires authorization each time.

On completion the user is redirected to your oauth_callback. An oauth_token and oauth_verifier parameter are passed back if the user has authorized your app.

oauth/access_token

Finally, you POST the oauth_verifier to this endpoint and an oauth_token/oauth_token_secret will be returned. If you write these down ;) or store them in a database, you can access the user's protected data without repeating the above process until the tokens expire.
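
As a rough sketch, the exchange itself can be as simple as the following - bearing in mind that this request also has to carry a signed Authorization header, built in the same way as above ($signedAuthorizationHeader is a placeholder for that):

<?php
// Sketch of the final leg - header building and error handling omitted.
$ch = curl_init('https://api.twitter.com/oauth/access_token');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'oauth_verifier' => $_GET['oauth_verifier'], // passed back to your oauth_callback
)));
curl_setopt($ch, CURLOPT_HTTPHEADER, array($signedAuthorizationHeader));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// The response is a query string: oauth_token=...&oauth_token_secret=...
parse_str(curl_exec($ch), $credentials);
curl_close($ch);

// $credentials['oauth_token'] and $credentials['oauth_token_secret'] are the
// values to write down ;) or store in your database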

Reliance on Libraries

On a previous project I utilized abraham's PHP twitter library. I remember having no issues with it, but it was very much the case that I was unaware of its internals.

This time I figured I would read up on oAuth and make sure I knew exactly what my code was doing. Whilst browsing I came across j7mbo's Twitter wrapper, which is a very barebones PHP wrapper for the Twitter API. The beauty of it is that:

  • It works
  • It is simple, easy to understand, and only 300 lines long

There is no point unnecessarily reinventing the wheel, so I took this wrapper and extended it to my needs.

Because of the simplicity of the code it was relatively painless for me to debug the double encoding issue with oauth_callback that I outlined above.

Comparison with the Facebook API

If you are implementing functionality that utilizes the Twitter API, the chances are that you are implementing something more generically social. I was - I also needed to work with the Facebook API.

There are a few similarities between the APIs:

  • They both utilize oAuth
  • They both have rubbish documentation

That said I found myself enjoying working with the Facebook API to a greater extent.. (is that weird?).

Firstly, Facebook provides an official PHP SDK - in my mind I can be relatively confident that it is well tested and works. It is a significantly more complex wrapper than the Twitter wrapper mentioned above, but one can assume that a company like Facebook would religiously test their SDKs.

In the same way that when testing your own products you assume that external APIs work (or you don't use them), I am happy to assume that the SDK works.

What I like about this wrapper is the helper functions. They provide a FacebookRedirectLoginHelper class which means that to get the login URL to which I redirect the user (to get their authorization), I simply need to call the getLoginUrl method.

On top of that, responses are wrapped up in an object-oriented interface, so I can get my access token by calling $session->getToken(); and can get properties of the graph objects contained within my response using $graphObject->getProperty('email');
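
From memory, the happy path with the v4 PHP SDK looks roughly like the sketch below - the app credentials, callback URL, and requested scope are placeholders:

<?php
// Rough sketch of the v4 PHP SDK flow - credentials and URLs are placeholders.
use Facebook\FacebookSession;
use Facebook\FacebookRedirectLoginHelper;
use Facebook\FacebookRequest;

FacebookSession::setDefaultApplication('app-id', 'app-secret');
$helper = new FacebookRedirectLoginHelper('http://example.com/facebook/callback');

// Send the user off to Facebook to authorize the app
$loginUrl = $helper->getLoginUrl(array('email'));

// Back on the callback URL: exchange the redirect for a session
$session = $helper->getSessionFromRedirect();
if ($session) {
    $accessToken = $session->getToken();

    // Graph responses come back wrapped in objects rather than raw JSON
    $graphObject = (new FacebookRequest($session, 'GET', '/me'))
        ->execute()
        ->getGraphObject();
    $email = $graphObject->getProperty('email');
}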

That wasn't so oAuthful..?

So there you have it.. a basic overview of oAuth and its utilization by two social powerhouses.

It has now gotten to a point where I have implemented these APIs so many times that it is somewhat second nature.

The beauty of the oAuth protocol is that once you have a grasp on what it is and how it works, it is clear to see why it is so powerful.

If you want to learn more check out the oAuth website.

Now I'm going to ponder implementing oAuth security into one of my publicly available APIs that really has zero use for it ;)

The UIViewController: Actual Lifecycle and Acceptable Hierarchy

I am working on an iOS app for a product that I have been building. Throughout the process I have come up against some hurdles and have sought to resolve them using the (fantastic) knowledge base that is Stack Overflow.

More so than when writing code for any other platform, I have found that Stack Overflow answers pertaining to Objective-C/Swift are full of inaccuracies, are misleading, or are downright wrong. As such I have spent a lot more time investigating issues myself and working out exactly why things happen and how things work.

Apple has extremely good documentation of its APIs and application guidelines. What confuses me somewhat is the fact that they have not taken the time to write in-depth explanations of areas that might not be so obvious, and areas that are often discussed and debated.

Given that a lot of the internals of Apple's APIs are private and one cannot simply look for an answer, I think this is something they should invest some time in.

Recently I have encountered a number of considerations relating to the hierarchy and lifecycle of UIViewControllers.

Issue - UITabBarController in the hierarchy

Why exactly does your UITabBarController have to be the root controller? If you read the UITabBarController API documentation it clearly states "When deploying a tab bar interface, you must install this view as the root of your window." Why is this?

Using Xcode 6 and iOS 8 I embedded a UITabBarController as a child at numerous levels of the hierarchy without issue. I am aware that in previous versions you could not.. but as things stand, you can. It would thus seem that at the moment the only reason not to do this is because Apple says not to.

Hands on

In the app that I am building I wanted to have tab bar navigation at the base of the application. In each tab, various controls would allow you to open other views, each of which I also wanted to contain independent tabbed navigation. This is not allowed (as outlined above).

After digging a bit I found out that actually it is.. The documentation states that "It is possible (although uncommon) to present a tab bar controller modally in your app."

As the tab bar controller always acts as the wrapper for the navigation controllers, each tab has to have its view controller embedded in a UINavigationController. Given that I want all the tabs to have the same navigation controls.. this is just annoying. Especially given that the docs state that you should embed only instances of the UINavigationController class, and not system view controllers that are subclasses of the UINavigationController class.

It is extremely unclear whether you are 'allowed' to use your own custom UINavigationController subclasses. My interpretation is that it is OK. If you are only making small manipulations and are calling the respective super methods, I cannot see any reason why this would be an issue.
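
To make the shape of this concrete, a stripped-down sketch of the arrangement looks something like the following (the view controller classes are placeholders, and the syntax is Swift of the Xcode 6 era):

// Sketch only - FirstViewController and SecondViewController are placeholders.
let firstTab = UINavigationController(rootViewController: FirstViewController())
firstTab.tabBarItem = UITabBarItem(title: "First", image: nil, tag: 0)

let secondTab = UINavigationController(rootViewController: SecondViewController())
secondTab.tabBarItem = UITabBarItem(title: "Second", image: nil, tag: 1)

// Each tab wraps its own UINavigationController, and the tab bar controller
// itself is presented modally rather than installed as the window's root.
let tabController = UITabBarController()
tabController.viewControllers = [firstTab, secondTab]

// Called from within an existing view controller, e.g. a button handler
presentViewController(tabController, animated: true, completion: nil)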

Issue - Why is viewWillAppear not consistently called?

What exactly is the UIViewController Lifecycle, and why does it vary under certain nesting circumstances?

For example, viewWillAppear is not consistently called in a UIViewController nested in a UINavigationController presented modally..

There is an example of a similar issue here. I don't personally recommend you use this answer. What I do recommend is that if you have complex or 'uncommon' view hierarchies, you verify that the lifecycle methods you expect to be called are in fact called.

Issue - manipulating views based on resolved constraints

Another intriguing issue is manipulating views once their subviews have been laid out. Apple does have the viewDidLayoutSubviews method, but again it is unclear exactly when this method is called. The documentation states that "this method being called does not indicate that the individual layouts of the view's subviews have been adjusted. Each subview is responsible for adjusting its own layout." This can lead to some interesting considerations which I have outlined below.

Hands on

In my modally presented UITabBarController I have a UIViewController (nested in a UINavigationController) in which I want to lay out some buttons based on the space available to me once my constraints have been resolved. To make things a little more complex, this is within a UIScrollView.

When my viewDidAppear method is called, my constraints have been resolved. Unfortunately, however, positioning and adding subviews here will at a minimum cause some flickering as they are displayed. This is not acceptable.

viewDidLayoutSubviews is called at undocumented times. I found from testing that viewDidLayoutSubviews was in fact called twice, and that after the first call the subviews of my UIScrollView were not yet laid out. Only after the second execution were all my constraints resolved.

I have no interest in writing complex, error-prone conditionals such as 'calculate and add subviews the second time viewDidLayoutSubviews is called'. As such I decided the most definitive way of knowing when my scroll view's subviews had been laid out was to create a custom subclass of UIScrollView and override its layoutSubviews method.

The actual view controller lifecycle for my setup is listed below. The frame size is also noted.

  • viewWillAppear (0.0,0.0,320.0,568.0)
  • the layoutSubviews method of my base view (0.0,0.0,320.0,568.0)
  • viewDidLayoutSubviews (0.0,0.0,320.0,568.0)
  • the layoutSubviews method of my scroll view (20.0,426.0,280.0,200.0) - correct resolved frame
  • the layoutSubviews method of my base view (20.0,426.0,280.0,200.0) again
  • viewDidLayoutSubviews (20.0,426.0,280.0,200.0)
  • viewDidAppear (20.0,426.0,280.0,200.0)

The important thing to note here is that you cannot just assume that because viewDidLayoutSubviews has been called, all your constraints have been resolved. The name is totally misleading, but it's a private API and there is nothing we can do about it, sadly.

Because the layoutSubviews method can also be called numerous times, it is important to make sure you don't run expensive processing more often than necessary. In my case, within layoutSubviews I have a simple check which verifies whether my frame has changed since it was last processed. If it hasn't, there is no need to re-process anything.
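
Put together, the subclass amounts to something like this - the names are mine rather than anything from a real API, and it is a sketch of the approach rather than my exact code:

import UIKit

// Sketch of the approach - class and method names here are hypothetical.
class ButtonLayoutScrollView: UIScrollView {

    private var lastProcessedFrame = CGRectZero

    override func layoutSubviews() {
        super.layoutSubviews()

        // layoutSubviews can be called several times; only do the expensive
        // work when the resolved frame has actually changed.
        if CGRectEqualToRect(frame, lastProcessedFrame) {
            return
        }
        lastProcessedFrame = frame

        layoutButtons()
    }

    private func layoutButtons() {
        // Position and add the buttons based on the now-resolved bounds.
    }
}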

All things considered

After going to the effort of working out the above, it hit me that my codebase was now significantly cleaner. I had separated my concerns to a greater extent and it felt more MVC-esque.

My manipulation of my views is now in a subclass of UIScrollView rather than in my UIViewController - my controller is now more targeted towards control and my view focused on.. well.. the view.

I read somewhere in the Apple documentation that view manipulation from a UIViewController is perfectly acceptable. It is in the name really :) That said, I find it incredibly intriguing that the way Apple has built its product and presented it to developers inherently results in what I believe to be better designed codebases.

I have learned a lot because Apple's codebase is private. Some more documentation would still be appreciated :)

Testing with Swift and Xcode 6

Over the past month I have been developing an iOS app for one of the websites that I own and operate. I wanted to share a few considerations that I encountered when investigating testing and iOS.

Objective-C has been around and developing as a language for an extremely long time. Given that, there are a lot of tools out there that have been written to aid developers in testing their apps. For example: Specta (a testing framework), Expecta (an assertion library), and OCMock (for mocking).

With the release of Swift, things have changed somewhat. Even though Swift and Objective-C integrate seamlessly, I would much prefer to write all of my code in Swift. In my opinion Swift is a much cleaner language - it is easier to read and write, and feels similar to the languages I regularly use to write web apps, PHP for example.

When it comes to legacy testing tools, Swift has some problems. For example, this page outlines exactly why OCMock won't work well with Swift. That said you can still achieve everything you could possibly want/need with Swift. You might just have to approach it in a different way.

Xcode 6 also throws some problems of its own into the mix. To test your codebase it needs to be accessible to your testing target. I found that the easiest way to achieve this is to toggle "Allow testing host application APIs" under the "General" section of your testing target's settings. This alone however will not work, because of Swift's new access control considerations.

For a class to be accessible within the testing target, it needs to be defined as public. This makes me cry a little inside - I really hope Apple fixes this very soon. Nonetheless.. for now.. once this is done you can import your app into your tests by simply using import AppName
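
As a minimal illustration (the class and module names here are made up), anything the tests touch has to be public in the app target, and the test file then imports the app module:

// In the app target - the class and its members must be public to be visible to tests.
public class BasketCalculator {   // hypothetical example class
    public init() {}
    public func total(prices: [Double]) -> Double {
        var sum = 0.0
        for price in prices {
            sum += price
        }
        return sum
    }
}

// In the testing target
import XCTest
import AppName   // substitute your app module's name

class BasketCalculatorTests: XCTestCase {
    func testTotal() {
        XCTAssertEqual(BasketCalculator().total([1.0, 2.5]), 3.5, "prices should sum")
    }
}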

Even then I encountered some intriguing issues - I could not, for example, instantiate a class without getting errors (Use of unresolved identifier). It seems that your testing target won't import your app target correctly if there are any issues with your app that prevent it from building. This is somewhat annoying, but it is essentially another type of test :)

My Setup

As you may well know, I am a big fan of functional testing. That is why I wrote Phantom Mochachino.

The guys over at Square developed a great functional testing tool in KIF - I highly recommend it.

When I am testing, I like to know that things work. KIF allows me to test that my app flows how I expect it to. Some people advocate mocking out HTTP requests when using KIF, but I see no reason to. I use KIF to test that my app works end to end. If I mock out my HTTP requests I simply verify that my app works if my HTTP requests work. That, in my opinion, is pointless.

KIF integrates seamlessly with Swift, as outlined by Brad Heintz. All you need to do to utilize KIF with Swift syntax is create a class containing the following code:

class SwiftKIFTestCase: KIFTestCase {
    // Returns a KIF test actor bound to the calling file and line, so that
    // failures are reported against the test code that triggered them.
    func tester(_ file : String = __FILE__, _ line : Int = __LINE__) -> KIFUITestActor {
        return KIFUITestActor(inFile: file, atLine: line, delegate: self)
    }
}

You then simply extend this class when writing your test classes and call the KIF test methods on the return value of a call to tester().
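
A test class then ends up looking something like this - the accessibility labels belong to a hypothetical login screen, not a real one:

// Sketch of a KIF functional test - accessibility labels are hypothetical.
class LoginFlowTests: SwiftKIFTestCase {

    func testUserCanLogIn() {
        tester().enterText("user@example.com", intoViewWithAccessibilityLabel: "Email")
        tester().enterText("secret", intoViewWithAccessibilityLabel: "Password")
        tester().tapViewWithAccessibilityLabel("Log In")

        // The end-to-end check: the real request went out and the next screen appeared
        tester().waitForViewWithAccessibilityLabel("Welcome")
    }
}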

One thing that I felt was lacking from KIF was the ability to reset your app. I implemented a code snippet to allow this, in a similar way to how I implemented testing alternate paths in Phantom Mochachino.

To achieve this, I added the following code to my App Delegate:

    public func resApp() -> () {

        // Keep a reference to the storyboard the app was launched from
        _initialStoryboard = window!.rootViewController!.storyboard;

        // Tear down the current view hierarchy
        for view in self.window!.subviews {
            view.removeFromSuperview()
        }

        // Re-instantiate the initial scene and make it the new root
        let initialScene:UIViewController = _initialStoryboard!.instantiateInitialViewController() as UINavigationController
        self.window!.rootViewController = initialScene;
    }

Calling this method will reset your app to the initial view controller.

I then extended KIFTestCase and added a class method to make calling this method from the testing target easy.

extension KIFTestCase {
    // Convenience wrapper so test classes can reset the app in one call
    class func reset() -> () {
        let delegate = UIApplication.sharedApplication().delegate as AppDelegate
        delegate.resApp()
    }
}

To reset your app within a test class, you just need to call KIFTestCase.reset(). Simple.

Unit Testing

In addition to using KIF I have written some very simple unit tests with XCTest. My app is relatively simple, and as such XCTest provides all the power I need to unit test it.

There is an extremely insightful article over at objc.io that discusses using XCTest for unit testing.

The only other consideration that I have encountered in my iOS testing journey (thus far) is mocking. Mattt has written an interesting piece on testing which outlines why mocking is a non-problem with Swift.

Conclusion

I get the impression that testing iOS apps has not always been particularly easy. Then, as iOS developed, things got better. Now that we have Swift, one may have been concerned that things could take a step backwards.

In my opinion they have not. Yes, it may take a while for there to be as much 'documentation' on testing with Swift. Likewise, it may take a while until we see reliable, well-used, and well-maintained Swift-based testing libraries. Still, I came to testing with Swift with little to no experience and I found it to be relatively painless.

Finally.. if you were worried, I do unit test my HTTP Requests.