Double Negative

Software, code and things.

FastCGI in 5 minutes

FastCGI "is a binary protocol for interfacing interactive programs with a web server" (from Wikipedia).

In the same vein as my nginx in 15 minutes post, I thought I'd outline FastCGI, and its implementation on nginx with PHPFPM.

A FastCGI server is independent of your web server. You delegate your request to it, it processes it, and returns a response.

FastCGI is a protocol. An implementation of said protocol can be written in any language. PHPFPM is a process manager that implements the FastCGI protocol with a number of optimizations. It is now part of the PHP core and is well used on the web.

Whereas plain CGI created a new process per request, a FastCGI process stays alive and serves many requests over its lifetime, and the protocol allows multiple requests to be multiplexed over a single connection. This avoids the per-request startup cost and allows for the handling of higher loads.

The FastCGI protocol seeks to resolve many of the same issues in CGI that nginx seeks to resolve in earlier versions of Apache.

nginx

nginx integrates with the FastCGI protocol through its fastcgi module. That is to say it knows how to interface with a FastCGI server that implements the FastCGI protocol. This makes connecting to a FastCGI server extremely simple.

Communication occurs via interprocess communication (IPC). For a simple setup you can connect to a unix socket. For a more complex setup you might want to communicate with multiple servers using TCP sockets.
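As a rough sketch of the multi-server case, assuming two hypothetical PHP backends at 10.0.0.11 and 10.0.0.12 listening on port 9000 (the addresses and the php_backend name are made up for illustration), the TCP version might look like this:

```nginx
# Goes inside the http { } context.
# Hypothetical pool of FastCGI servers reached over TCP.
upstream php_backend {
    server 10.0.0.11:9000;
    server 10.0.0.12:9000;
}

server {
    listen 80;
    server_name example.org;
    root /path/to/files;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        # Pass to the pool instead of a single unix socket.
        fastcgi_pass php_backend;
    }
}
```

Note that with remote backends the SCRIPT_FILENAME path has to exist on the machine running the FastCGI server, not on the nginx machine.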

The most important fastcgi_param is SCRIPT_FILENAME, which tells the FastCGI server which file on the filesystem should be loaded for a particular request.

I choose to define my server root as follows:

set $root_path '/path/to/files';  
root $root_path;  

I then utilize the $root_path variable within my location blocks for static image files as well as my location block for passing php files to my FastCGI server.

Within the latter block I use:

fastcgi_param SCRIPT_FILENAME $root_path$fastcgi_script_name;

This maps a request for myfile.php to /path/to/files/myfile.php
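Put together, the pieces above could look something like the following sketch. The server_name, the image extensions and the expires value are assumptions for illustration; the socket path is the one used in the PHPFPM section of this post.

```nginx
server {
    listen 80;
    server_name example.org;            # placeholder host

    set $root_path '/path/to/files';
    root $root_path;

    # Static image files are served straight from the root.
    location ~* \.(gif|jpg|png)$ {
        expires 30d;                    # illustrative caching policy
    }

    # PHP files are handed to the FastCGI server.
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $root_path$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
    }
}
```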

You can utilize the fastcgi_split_path_info directive to allow for customized URL structures.

As long as you specify a regular expression with two capture groups you could direct a request for http://domain.com/request/my_file/hash to /path/to/files/myfile.php with relative ease.

You are only limited by your knowledge of regular expressions :)
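A minimal sketch of such a mapping, assuming the script is named after the second path segment (so /request/my_file/hash runs my_file.php, which is my assumption rather than a rule) and that the trailing part should reach PHP as PATH_INFO:

```nginx
# ^~ stops nginx from checking regex locations for URIs under /request/.
location ^~ /request/ {
    # First capture becomes $fastcgi_script_name,
    # second capture becomes $fastcgi_path_info.
    fastcgi_split_path_info ^/request/([^/]+)(/.*)$;

    include fastcgi_params;
    # /request/my_file/hash -> /path/to/files/my_file.php
    fastcgi_param SCRIPT_FILENAME /path/to/files/$fastcgi_script_name.php;
    # "/hash" is made available to the script as PATH_INFO.
    fastcgi_param PATH_INFO $fastcgi_path_info;

    fastcgi_pass unix:/var/run/php5-fpm.sock;
}
```

Because the script name comes straight from the URL here, a real setup would want to restrict the first capture (for example to a fixed list of names) before trusting it.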

Another interesting tidbit regarding the nginx integration is fastcgi_intercept_errors on.

With this directive enabled, a FastCGI response with a status code of 300 or greater is handed back to nginx and directed to the appropriate location block based on your defined error_page directive.

Through this directive it is easy to display a custom 404 page for example should FastCGI return an error response.
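For example, a custom 404 page might be wired up roughly as follows. This is a sketch: the /404.html path is an assumption, and the block belongs inside a server block that already defines a root.

```nginx
# Serve /404.html whenever a 404 is produced, including by the FastCGI server.
error_page 404 /404.html;

location = /404.html {
    internal;   # not reachable by typing the URL directly
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/var/run/php5-fpm.sock;

    # Without this, the error body generated by PHP is passed to the client as-is.
    fastcgi_intercept_errors on;
}
```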

PHPFPM

PHPFPM is a "process manager to manage the FastCGI SAPI (Server API) in PHP" (source).

In essence, what that means is that PHPFPM manages the creation of PHP processes (as required) to process the requests sent to it by the webserver.

It implements the FastCGI protocol such that for example, data sent to it as a fastcgi_param (when using nginx) can be processed and utilized appropriately.

When running PHPFPM on a local machine (server) it 'runs' the server to which one directs their PHP requests.

For usage with nginx you would connect to the server through a unix socket using the fastcgi_pass directive (as outlined above).

fastcgi_pass unix:/var/run/php5-fpm.sock;  

Summary

nginx, PHP, and FastCGI are a powerful combination in web application development because they integrate so seamlessly with one another.

Configuration is simple, and they allow for deployment of dynamic websites that can scale with ease.

The curiosities of Facebook's developer offering

In the process of building multiplatform applications which integrate with social platforms I have been required to investigate Facebook's developer tools and processes.

Unfortunately the process has been somewhat painful. I thought I would document the curiosities I encountered such that anyone else encountering them can resolve them with more ease.

The review process

Whilst Facebook do provide documentation for their (relatively) new application review process, it is (at the time of writing) somewhat lacking and unclear. That said, I strongly advise anyone submitting an application for review to read all of the documentation thoroughly before attempting a submission.

For some reason Facebook have released an extremely well reasoned review process yet implemented it extremely poorly. There is/was a nice video outlining what exactly the review process is, and what it seeks to do. I cannot however find it (now), and am suspicious that it may have been removed because the gentleman's smile (in said video) did not match a realistic developer experience.

The main problem is that should you have any issues with your submission, you will more than likely receive a cryptic, generated response. The response that I received was along the lines of 'Your open graph action does not post on all platforms' which whilst strictly true could have said 'You are sharing a link rather than the open graph action under review'.

Whilst I did ask various Facebook staff members for comment, none was received. I can only assume that the tools provided to reviewers only allow for preselected responses. As such if you do have any issues it may well be a guessing game attempting to get it resolved.

Fortunately you can contact support.. right?

Nope. Facebook do not provide a support service to developers. At least not a generally available one. They do provide support to developers working under business umbrellas, but even then there are undisclosed requirements for being allowed help using their system. After playing the 'review process guessing game' for a number of rounds, I discovered this option and went through the motions of associating my business on Facebook solely for this purpose.

Depressingly, once I had access to this support option, I was helped by an extremely helpful individual and I was able to resolve the issue that had caused numerous review failures in 3 minutes. That is to say, as soon as Facebook told me what the issue was, I fixed it.

I was also able to use this platform to resolve an issue whereby my application dashboard was 'out of sync'. That is to say, I could not submit a review because the platform thought my application was already under review when it was in fact not.

Again.. I questioned various Facebook staff as to why they don't just implement a proper review process and received no response. I have told them that it would save both them, and their developers a lot of time and stress.. but nothing.

Developer community

If you cannot get any support, your only bet is the Facebook developer community. This is a closed group for developers to ask questions and discuss development related issues. Whilst there are Facebook staff members in the group, again getting appropriate support is very difficult.

The developer community does however provide a little insight. With the greatest of respect, a lot of the questions posted within the group are from inexperienced developers asking very broad questions like "How do I do Facebook with PHP". I imagine that receiving many thousands of such requests through an open support channel would be hellish to manage.

That said if you are scratching at walls, unable to get anywhere.. you may find a helpful developer here who can point you in the right direction. I also feel like I cannot post something like this, and not offer my own support. If you have an issue that you cannot get to the bottom of, I am happy to try and help.

The API

Obviously building a solid API on such a massive scale is an extremely difficult task. The Facebook team have done a marvellous job with their various SDKs, developer tools, and debug tools. That said, I cannot write about all the positives, so I'll stick to writing about the few negatives.

I encountered two issues which bemused me. Both pertained to errors, and error handling.

In my web application I was utilizing a version of the PHP SDK that was maybe a month old. On submitting an open graph request for a custom story, I received an error response suggesting that I had not authorized the 'User Messages' capability. I had. After some futile debugging I decided to upgrade to the very latest SDK and the problem was gone. I am happy to excuse a small bug in a massive and complex product but I wouldn't call returning/processing a completely incorrect error response a 'small bug'. This occurred at a similar time to the synchronization issues with the review dashboard (outlined above), and as such the upgrade could have been a false positive. Either way it was certainly a serious issue.

The second issue was one pertaining to responses. Whilst attempting to share a link through the API some unchanged code suddenly started failing. I received the following error response:

[message] => An error occurred while processing this request. Please try again later.
[type] => OAuthException
[code] => 368

Not only do you have to guess your way through the review process, but you have to guess your way through error handling too ;)

Fortunately Googling the error code suggested that the error code may well pertain to the link being flagged in some capacity. I was able to resolve the issue but it presented another curiosity to me.

The issue was that my link (a url shortener link) had previously pointed somewhere else (stupid I know). Facebook had cached the previous content (something slightly suspicious) and was flagging it. I'll give Facebook a break here on the basis that caching the Internet is pretty hard. I just thought I'd mention it in case someone else encounters a similar issue.

Conclusion

Whilst this post is primarily negative, it merely seeks to outline some curiosities with Facebook's developer offerings, and perhaps outline some resolutions for people encountering problems.

As mentioned, what Facebook are doing.. and at the scale they are doing it.. is mightily impressive. That said, I just cannot fathom why Facebook have not 'polished' such important product offerings.

You do not see many (if any) issues with the main public production Facebook website. Why has the same attention to detail not been applied to the developer offerings? In many respects the open graph, and developer integrations allow an open medium for Facebook to expand its own offering by proxy of third party offerings. Surely that is incredibly important?

Further to that, whilst Facebook operates on a massive scale, they also hire on a massive scale. They (as I understand) have a massive team of incredibly talented developers.. I cannot understand how you can build infrastructure and tooling to handle billions of status messages yet cannot provide reviewers a text box to tell people why their reviews have failed..

The only other possibility is that the reviewers have a text box, but are on some sort of devilish commission structure and have to get through 1.6 million reviews every hour ;) Either way.. not cool.

I suspect (and hope) that Facebook will at some point get around to polishing their offering. If not I cannot help but feel that they should at least put a BETA sticker on it.

nginx in 15 minutes

I am currently tying up various loose ends on a full stack project that we have invested a lot of our development time into over the past six months.

As a general knowledge exercise, and to make sure our server setup is optimized I have spent the past few hours fine-toothcombing the nginx docs.

In the process I learnt a number of new things, and discovered some interesting optimizations. I thought I'd post a brief 'nginx - what you need to know' kind of post. This is intended to be an overview of nginx for someone who has a generally solid knowledge of software/server architecture yet only has fifteen minutes to spare.

Architecture

nginx is event-based, which allows it to handle load better than, for example, Apache's prefork model (which dedicates a process to each connection).

It was built with the intention of handling high concurrency (lots of simultaneous connections) whilst performing quickly and efficiently.

It consists of a master process which:

  • reads and validates your configuration files
  • manages worker processes

The worker processes accept connections on a shared 'listen' socket and are capable of handling thousands of concurrent connections.

As a general rule you should configure one worker process per CPU core. Double that if you are serving mainly static content.
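In configuration terms this lives at the top level of nginx.conf. A minimal sketch, where the worker_connections value is only an illustrative starting point:

```nginx
# 'auto' asks nginx to detect the number of CPU cores;
# replace it with an explicit number to override.
worker_processes auto;

events {
    # Maximum simultaneous connections per worker process (illustrative value).
    worker_connections 1024;
}
```

The theoretical ceiling on simultaneous connections is roughly worker_processes multiplied by worker_connections, subject to the operating system's file descriptor limits.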

You can see these respective processes by executing ps -ax | grep nginx from the command line.

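If you reload your nginx configuration, the master process starts new worker processes with the new configuration and gracefully shuts down the old ones once they have finished serving their current requests.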

Configuration

nginx follows a 'c-style' configuration format.

nginx configuration allows for powerful regular expression matching and variable utilization.

server blocks define the configuration for a particular host, ip, port combination.

default_server can be specified on the listen directive to indicate that the configuration block should be used for any connection on that port (should no other block match). That is to say the below block will match a request on port 80 even if the host is not example.org.

server {  
    listen       80  default_server;
    listen       8080;
    server_name  example.org;
    ...
}

server blocks allow for wildcard matching or regular expression matching. For example you could match both the www and non-www versions of a domain name.

Exact match server names are however more efficient than wildcards or regular expressions on the basis of how nginx stores host data in hash tables.
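The three styles of server_name could be sketched as follows (example.org and the numbered-host pattern are placeholders). Listing the www and non-www names explicitly keeps both as exact matches:

```nginx
# Exact names: looked up first, via a hash table.
server {
    listen 80;
    server_name example.org www.example.org;
}

# Wildcard name: matches any other subdomain of example.org.
server {
    listen 80;
    server_name *.example.org;
}

# Regular expression name (note the leading ~): checked last.
server {
    listen 80;
    server_name ~^www\d+\.example\.org$;
}
```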

location blocks define the configuration for a specific location.

Location blocks only consider the URL. They do not consider the query string.

Longer matches take precedence - that is to say location /images will be matched over location / were you to request http://server.com/images/123.jpg.

Regular expression matches are prioritized over the longest prefix. If you want to match a regular expression in a location block, prepend it with ~, e.g. location ~ \.(gif|jpg|png)$

Regular expressions follow the PCRE format.

Any regular expression containing brace ({}) characters should be quoted, as it would otherwise be unparseable (given nginx's usage of braces to delimit blocks).
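The matching rules above can be seen side by side in a sketch like this one. The paths and the directives inside each block are placeholders chosen purely to make the blocks non-empty:

```nginx
server {
    listen 80;
    server_name example.org;
    root /path/to/files;

    # Exact match: wins immediately for the URI "/".
    location = / {
        index index.html;
    }

    # Prefix matches: the longest matching prefix is remembered...
    location / {
        try_files $uri $uri/ =404;
    }

    location /images/ {
        expires 7d;
    }

    # ...but a matching regular expression (checked in file order) overrides it,
    # so /images/123.jpg ends up here rather than in /images/.
    location ~* \.(gif|jpg|png)$ {
        expires 30d;
    }

    # ^~ skips the regular expression check when this is the longest prefix,
    # so /static/logo.png is handled here.
    location ^~ /static/ {
        expires max;
    }

    # Quoted because the pattern contains braces.
    location ~ "^/archive/\d{4}/" {
        autoindex off;
    }
}
```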

Regular expressions can use named captures. For example:

server {  
    server_name   ~^(www\.)?(?<domain>.+)$;

    location / {
        root   /sites/$domain;
    }
}

Interesting features

Load balancing with nginx is really easy. There are three load balancing methods available in nginx:

  • round-robin
  • least-connected
  • ip-hash

Load balancing is highly configurable and allows for the intelligent direction of requests to different servers.

Health checks are built in such that if a particular server fails to respond nginx refrains from sending requests to that server based on configurable parameters.
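A minimal sketch of a load balanced setup, assuming three hypothetical application servers (the hostnames, weights and failure thresholds are illustrative):

```nginx
# Goes inside the http { } context.
upstream app_servers {
    # Round-robin is the default; least_conn or ip_hash select the other methods.
    least_conn;

    # After 3 failed attempts within 30s, skip this server for 30s.
    server app1.example.com max_fails=3 fail_timeout=30s;
    # Receives roughly twice the share of requests.
    server app2.example.com weight=2;
    # Only used when the other servers are unavailable.
    server app3.example.com backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
    }
}
```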

HTTPS is also easy to implement with nginx. It is a case of adding listen 443 ssl to your server block and adding directives for the locations of your certificate and private key.

Given that the SSL handshake is the most expensive part of a secure offering, you can cache your ssl sessions.

ssl_session_cache   shared:SSL:10m;  
ssl_session_timeout 10m;  
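In context, a bare-bones HTTPS server block might look like the sketch below. The certificate and key paths are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name example.org;

    # Placeholder locations for the certificate and private key.
    ssl_certificate     /etc/nginx/ssl/example.org.crt;
    ssl_certificate_key /etc/nginx/ssl/example.org.key;

    # Session cache shared between all worker processes,
    # so returning clients can skip the full handshake.
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 10m;
}
```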

Modules

nginx is very modular in its nature. Although you interact with it as one unit through your configuration it is in fact made up of individual units doing different things.

nginx offers a detailed module reference - this is an overview of some interesting or less commonly discussed modules available to you.

The internal directive of the core module allows for a location to only be accessible to an internal redirect.

For example, if you want 404.html only to be accessible to a 404 response you can redirect requests from the error_page whilst not making it accessible to a user typing it directly into their browser.

error_page 404 /404.html;

location /404.html {  
    internal;
}

Autoindex

If you don't want to show directory listings to nosy users you can turn autoindex to off. This can be used to protect your image directory for example.

Browser detection

You can use modern_browser and ancient_browser to show different pages dependent on the browser which your client is using. See here.

IP Conditionals

You can use the geo module to set variables based on IP ranges. You could for example set a variable based on a locale IP range and use that to send users from a specific country to a specific location.
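As a sketch of the idea, assuming made-up IP ranges (the documentation-reserved 192.0.2.0/24 and 198.51.100.0/24 stand in for real country ranges) and hypothetical /uk/ and /us/ landing pages:

```nginx
# Goes inside the http { } context.
# $region is set from the client address; the ranges are placeholders.
geo $region {
    default          other;
    192.0.2.0/24     uk;
    198.51.100.0/24  us;
}

server {
    listen 80;
    server_name example.org;

    # Only redirect the home page, so /uk/ and /us/ do not loop back here.
    location = / {
        if ($region = uk) {
            return 302 /uk/;
        }
        if ($region = us) {
            return 302 /us/;
        }
    }
}
```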

Image manipulation

nginx even offers an image filter module. This module allows you to crop, resize, and rotate images with nginx.
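A small sketch, assuming a hypothetical /thumbs/ location and arbitrary dimensions. Note that this module is not compiled in by default (it needs the --with-http_image_filter_module build option):

```nginx
location /thumbs/ {
    # Proportionally shrink images to fit within 150x100 pixels...
    image_filter resize 150 100;
    # ...and rotate them 90 degrees counter-clockwise.
    image_filter rotate 90;

    # Largest source image the filter will read (illustrative value).
    image_filter_buffer 2M;
}
```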

Connection limits

You can limit a particular IP to a particular number of concurrent connections. For example you could only allow one connection to files within your 'downloads' folder at a given time.

In addition to that, you can limit the request rate and configure the handling of request bursts greater than a configurable value.

My initial thoughts were that this would be a fantastic method of preventing people from hammering a public API for example.

More information can be found here and here.
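Both kinds of limit could be sketched like this. The zone names, sizes, rate and the /api/ path are assumptions for illustration:

```nginx
# Goes inside the http { } context.
# Shared memory zones keyed on the client IP address.
limit_conn_zone $binary_remote_addr zone=per_ip:10m;
limit_req_zone  $binary_remote_addr zone=api:10m rate=5r/s;

server {
    listen 80;

    # One concurrent connection per IP for downloads.
    location /downloads/ {
        limit_conn per_ip 1;
    }

    # Five requests per second per IP; up to 10 excess requests are queued,
    # anything beyond that is rejected (with a 503 by default).
    location /api/ {
        limit_req zone=api burst=10;
    }
}
```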

Request controls

Further to the above, nginx allows you to limit the request types that a particular block will handle.

This would again be extremely useful for an API offering.

limit_except GET {  
    allow 192.168.1.0/32;
    deny  all;
}

Conditional logging

The log module allows for conditional logging.

I thought I'd give this a mention because I can see a lot of merit in only wanting to log access requests that result in bad response codes.

The example listed shows this:

map $status $loggable {  
    ~^[23]  0;
    default 1;
}

access_log /path/to/access.log combined if=$loggable;  

Secure links

This module is pretty awesome - it allows you to secure your links and associate validity time periods with them utilizing the nginx server software.

Personally, I have no use for it because although I have production use cases of similar functionality, I can't help but feel that you could do this in many easier ways.
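For reference, a sketch closely following the pattern in the module documentation, where the link carries md5 and expires query arguments. The /s/ path and the word "secret" are placeholders, and the module has to be compiled in (--with-http_secure_link_module):

```nginx
location /s/ {
    # The hash and the expiry timestamp are taken from the query string.
    secure_link $arg_md5,$arg_expires;
    # The expression the hash must have been generated from; "secret" is a placeholder.
    secure_link_md5 "$secure_link_expires$uri$remote_addr secret";

    # Empty string: the hash did not match.
    if ($secure_link = "") {
        return 403;
    }

    # "0": the hash matched but the link has expired.
    if ($secure_link = "0") {
        return 410;
    }
}
```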

Novelty

There was one module that I found somewhat novel. I just can not see a use case for it - perhaps someone can enlighten me?

nginx offers a module to show a random file from a given directory as an index file.

I need this :P

Summary

As mentioned, the above is based on a thorough read through of the nginx documentation.

The following chapter from 'The Architecture Of Open Source Applications' was also incredibly interesting. It offers a significantly more complex look at the internals of nginx. Perhaps not suitable if you really did only have 15 minutes ;)

I highly recommend reading the documentation yourself if you are interested in, or are running nginx on a server.

If you have any questions I would be happy to answer them.

JAXL - Connecting to Google's Cloud Connection Server (CCS)

I recently implemented the server side of a system to send notifications to iOS apps through APNS. This was extremely easy to implement and opened my eyes to the benefits of having a persistent streaming connection to Apple's servers. That is to say my backend is a constantly running service, and as/when new 'notifications' are stored in our database they are immediately sent through to APNS.

For Android notifications I was utilizing GCM and a simple HTTP connection to Google's servers (using CURL). I ran the script as a cron every x minutes and it would send out the notifications as appropriate.

Whilst this worked perfectly fine.. the idea had now crossed my mind and it was inevitable that I'd have to implement a similar persistent connection to Google's servers for GCM.

Cloud Connection Server (CCS) and XMPP

Google have a service called the Cloud Connection Server (CCS) to achieve this. It utilizes XMPP (an XML-based protocol originally designed for instant messaging) to communicate in both directions with a client.

As you may have noticed (if you are a regular reader of our blog) a lot of our products are PHP based. For this particular project it made sense to execute this functionality using PHP.

Google do not provide any examples of utilizing PHP to connect to CCS and in fact there are very few well maintained, well used, generally solid implementations of XMPP communication in PHP.

A brief 'Google' of XMPP presented a lot of information about the protocol. I would assume that Google opted to utilize XMPP because it is a 'standard', and has been used and developed over 15 years. It is seemingly well known within the engineering community and it allows for two way communication such that GCM can supply 'Receipts' for messages. This is something that iOS and APNS can not provide.

On a personal level I have absolutely no experience with XMPP. In its entirety it is quite a complicated protocol. For a GCM implementation the only things that are relevant are essentially authenticating, and sending messages. Even though CCS allows for bi-directional communication, this was of little interest to me.

Getting to work

I wrote a script to connect to Google's CCS servers and send messages using JAXL. This was really easy to do:

// Initialize the client
$this->client = new JAXL(array(
    'jid'       => $this->senderId . '@gcm.googleapis.com',
    'pass'      => $this->googleApiKey,
    'auth_type' => 'PLAIN',
    'host'      => $this->host,
    'port'      => $this->port,
    'strict'    => false,
    'force_tls' => true,
    'log_level' => JAXL_DEBUG,
    'protocol'  => 'tls'
));

// Add a callback for authorisation success
$this->client->add_cb('on_auth_success', function() {
    $this->client->set_status("available!", "dnd", 10);

    // Send your messages
    $this->sendYourMessages();
});

// Start the client (this call blocks inside the JAXL loop)
$this->client->start();

Within my sendYourMessages method I was loading some notifications from my database, looping through them, and sending them utilizing $this->client->send() / $this->client->send_raw().

This worked perfectly.

Given the number of Stack Overflow questions and Google Code discussions about the difficulty of connecting to CCS with PHP I was somewhat bemused to say the least.

Unfortunately however I had been a little too optimistic. What I wanted to do was maintain a persistent connection to the CCS server and constantly poll for new notifications in my database.

Given that PHP and asynchronicity are rarely found in the same sentence together, this was going to be a little tougher to achieve.

My intention was to utilize a continuous while loop to continually poll my database:

while (true) {  
    //poll database
    //send messages over XMPP connection

    //take a nap
    sleep(5);
}

The problem is that this is blocking - nothing below this block will ever execute until the while loop completes (which is never).

If you look into the internals of JAXL it works in a slightly more complex yet similar way. That is to say there is a continuous blocking loop checking the connected streams to see if it can/should read/write to them, and then acting accordingly.

When you execute start() on the client it configures things and then executes JAXLLoop::run() which starts this continuous blocking loop.

The long and the short of it is that you cannot continuously poll the XMPP connection and continuously poll your own data source.

Read the source

I made the foolish mistake of going in blind and trying to hack together an appropriate resolution.
A fear of the complexities of XMPP and a smidgen of laziness ironically meant that a resolution took significantly longer than it should have.

After a number of hours of futility I decided to step back and read through the JAXL source in its entirety. At this point everything slotted into place and a suitable resolution (see below) was relatively easy to come by.

The JAXL source is a little 'hmm ok'.. but it is pretty simple to get your head around.

As for XMPP.. whilst a lot of the information on the web is very much all or nothing, I did find this: How XMPP Works Step By Step which I found to be the most concise explanation of the relevant workings.

If you utilize the JAXL_DEBUG log_level in your configuration, the output matches up almost perfectly to that outlined in the above link.

The resolution

I wanted a resolution that could work on top of JAXL without requiring a significant time investment or refactoring.

Conceptually the resolution was as follows: Implement batch data polling within the JAXL Loop.

We can remove the issue of one loop blocking the second by.. only having one loop :)

During my research into the problem I stumbled upon this StackOverflow answer which suggests using UDP sockets. This seems like a complete 'overengineering' in the sense that it would work but the complexities and problems associated with it beg the question 'Why not do something easier?' (like the below).

I have forked JAXL and committed my changes to github here.

What I have essentially done is manipulate the 'periodic jobs' concept already contained within the JAXL codebase. If you look here there is a very brief explanation.

In that same file is the following message:

"Since cron jobs are called inside main select loop, do not execute long running cron jobs using JAXLClock else the main select loop will not be able to detect any new activity on watched file descriptors. In short, these cron job callbacks are blocking."

This is important. Essentially the clock (loop) is executed every second and we ask 'have 15 seconds passed since we last got data?'. If they have, we execute the callback which passes through the message to the JAXL event handler.

Within your implementation of the handler callback you load your data. Whilst you are doing this, the clock is not ticking. That is to say the streams are not being monitored. Make sure your data loads quickly!

This is by no means ideal but it is about as good as it gets with PHP.

Usage

The usage of this setup is as follows:

  • Pass the configuration parameter batched_data when you instantiate JAXL

  • Add the get_next_batch callback to your client.

  • Within the get_next_batch callback load your data and 'send' it using send / send_raw.

Alternatives

During my research I found a number of discussions questioning how to approach doing similar things with JAXL.

For example this one, and this one.

Abhinav Singh (the creator of JAXL) historically was quite active in responding to threads about JAXL.

In a number of threads I have found he mentions jaxlctl, and utilizing pipes. I investigated both of these options and found them to be not worth pursuing. That said.. have a play, and if you produce anything interesting I would be intrigued to see it.

As for JAXL in general: a post on Google Code and the lack of commits in the past few years suggest that Abhinav has stopped working on JAXL :(

Conclusion

It is a pretty simple conclusion really.. JAXL can be used with Google CCS, and a solution for a persistent connection whilst polling for new data is possible. It has a few drawbacks, but this code is being utilized without issue in a production environment.

Hopefully some of the people trying to implement such functionality will stumble upon this post.

If you have any questions/comments, I would be happy to answer them :)