CloudFront Edge to Origin Auth

CloudFront, the CDN from Amazon Web Services, has long supported authenticating between the CDN’s edge and S3 using Origin Access Identity, allowing you to lock down your origin and ensure users can only access your content through CloudFront.

A more difficult problem is restricting access on a custom origin – ensuring that the only people who can talk to your back-end webservers are actually coming from CloudFront. This has traditionally been worked around by adding the CloudFront IP ranges to a security group or firewall in front of your application. The issue with this approach is twofold:

  • Anyone can create a CloudFront distribution pointing at your origin to bypass this
  • You have to handle synchronisation of the IP JSON against your security groups / FW
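
To give a feel for the synchronisation chore in the second point, here is a minimal Python sketch. It assumes the published IP JSON has a top-level "prefixes" list whose entries carry "ip_prefix" and "service" keys. The sample data uses documentation ranges rather than real CloudFront ranges, and the helper names are hypothetical.

```python
import json
from typing import List, Set, Tuple

# Hypothetical excerpt in the shape of the published IP JSON.
# These are documentation ranges, NOT real CloudFront prefixes.
SAMPLE_IP_RANGES = """
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "203.0.113.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"}
  ]
}
"""


def cloudfront_prefixes(ip_ranges_json: str) -> List[str]:
    """Return the sorted IPv4 prefixes tagged with the CLOUDFRONT service."""
    data = json.loads(ip_ranges_json)
    return sorted(
        entry["ip_prefix"]
        for entry in data["prefixes"]
        if entry["service"] == "CLOUDFRONT"
    )


def diff_prefixes(current: Set[str], wanted: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) needed to bring `current` in line with `wanted`."""
    return wanted - current, current - wanted
```

Something would still have to run this on a schedule and apply the difference to the security group or firewall, which is exactly the overhead the header-based approach avoids.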

As of December 2015, CloudFront supports setting custom headers from edge to origin. This allows us to use a common pattern for authenticating the CDN edges: a pre-shared key (PSK), inserted into a header and validated by the origin webserver. These same steps can apply to many CDNs, but in this post we’ll cover the configuration of CloudFront and the origins. This also nicely coincides with Amazon’s release of AWS Certificate Manager, which provides free SSL certificates for use with Amazon CloudFront or the Elastic Load Balancer.


There are two parts to this. First we’ll configure CloudFront and verify that the header is being set as expected. Afterwards, we’ll configure the origin to validate that header and block unauthorised users.

CloudFront Configuration

Within a given CloudFront distribution, we have one or more origins. “Origin Custom Headers” are configured on a per-origin basis as Header:Value pairs. In our case, we only need to add “X-PSK-Auth” and a value. To have CloudFront send this to our origin, all we need to do is edit the origin settings and add this:


Once your CloudFront distribution has moved from InProgress to Deployed, we can test this.

Testing the Configuration

In order to make sure we’re getting the correct header set, we can look on the origin at the headers that are being set. For this, I’ve used a simple PHP script which prints all of our headers:

and now I can see that my header is being set properly:


Controlling Access at the Origin

Actually controlling the access depends greatly on your origin webserver. I’ve provided a few sample references on this cdn-auth GitHub repo (pull requests welcome). Most of these have support for both an ‘old’ and ‘new’ key to allow rotation. This is pretty important when working with a third party CDN – as once you’ve hit ‘save’ on a configuration, you don’t have a guarantee of when that’ll be applied.
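
As a rough illustration of the old/new key idea, here is a minimal sketch in Python rather than a webserver config. It is not taken from the cdn-auth repo: the function name and the old key value are hypothetical, and the new key is simply the example value used in the curl tests in this post.

```python
import hmac

HEADER_NAME = "X-PSK-Auth"

# Hypothetical key pair: during a rotation both are accepted, and the old
# key is dropped once the CDN is known to be sending the new one.
OLD_KEY = "0123456789abcdef0123456789abcdef"  # made-up value for illustration
NEW_KEY = "e6e59c8c1dcca46fde36bf43b84487d8"  # example value from this post
ACCEPTED_KEYS = [OLD_KEY, NEW_KEY]


def is_authorised(headers: dict, accepted_keys: list) -> bool:
    """Return True if the PSK header matches any currently accepted key."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    presented = normalised.get(HEADER_NAME.lower())
    if presented is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    # Every key is checked (no early exit) so timing does not hint at
    # which of the accepted keys was the one that matched.
    matched = False
    for key in accepted_keys:
        if hmac.compare_digest(presented.encode(), key.encode()):
            matched = True
    return matched
```

Accepting both keys for a while means the origin keeps working regardless of when the CDN applies the updated header.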


If we’re using Amazon Web Services for our origin – we can use the AWS WAF attached to an Application Load Balancer to support the filtering of traffic before it ever hits our own instances.

Put simply, we create a WebACL with a String Match Condition filter on the X-PSK-Auth header. We match for the string we chose as our PSK, and we then attach that to our ALB as a WAF policy.

Without the auth header, the traffic is filtered before ever hitting our origin — at the AWS ELB.

$ curl -vo/dev/null http://alb/

> GET / HTTP/1.1
> Host: alb
> User-Agent: curl/7.51.0

HTTP/1.1 403 Forbidden
Server: awselb/2.0


With the auth header, the traffic reaches our origin and our request is fulfilled by nginx

$ curl -vo/dev/null -H "X-PSK-Auth: e6e59c8c1dcca46fde36bf43b84487d8" http://alb/

> GET / HTTP/1.1
> Host: alb
> User-Agent: curl/7.51.0
> X-PSK-Auth: e6e59c8c1dcca46fde36bf43b84487d8

HTTP/1.1 200 OK
Date: Wed, 12 Apr 2017 07:49:39 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Server: nginx/1.10.2


In this case we’re going to configure an Apache origin:

We simply insert this in our VirtualHost – but as per the linked GitHub repo, this could also sit within a specific Directory stanza.

Now we have this, we can validate that going direct to the origin fails:


Whereas a request through the CDN is allowed through.

Considerations and Drawbacks

There’s one notable drawback to this approach, which is that the key is fixed between the edge and the origin. This means if you have untrusted users who can inspect or dump the headers between the CDN and your origin (for instance, if they can upload PHP) – then they’ll also be able to see the PSK between your edge and origin. Although if you have users who can upload arbitrary PHP – you probably have other issues.

This is in some way mitigated by stripping the X-PSK-Auth header at the terminating HTTP(S) server before the request is passed to an application server. Using end-to-end SSL (client to CDN, CDN to origin) also reduces the risk of sniffing.

Photo Old London Underground ticket barrier by Matt Brown (CC)

Recovering a Debian System after running rm /*


The Oxford English Dictionary (I cannot believe that this is actually defined in the OED; I assumed it was a fake site or something. I also cannot believe that I actually used “The OED defines […]”. I didn’t even use that in the best man speech at my brother’s wedding.) defines an ohnosecond as:

a moment in which one realizes that one has made an error, typically by pressing the wrong button.

It’s more commonly referred to in Operations Management parlance as:


It is unfortunately something that will happen to everyone during their systems administration career, and the variations are almost endless. Some notable occurrences include:

  • Copying SSL libraries over from a Debian host to a RHEL host
  • Setting a new root password and immediately losing it
  • Copying over an out of date backup CMS to a production system
  • Running one of the many variations of ‘rm’ at the wrong level

Unfortunately in a recent scenario, a poor hypothetical sysadmin managed to issue:

rm /*

instead of:

 rm ./*

This removed every non-directory at the / level. The impact of this varies between operating systems, and even between Linux distributions. We’re lucky that in this scenario there was no ‘-rf’ specified, or it’d be ‘recover from backup’ time. However, this situation did (hypothetically) pose an interesting conundrum.
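
To make “every non-directory” concrete, here is a small Python sketch that builds a throwaway directory and lists which entries a plain rm with a * glob would unlink. The file names are hypothetical stand-ins, and the model is simplified (it ignores permissions and shell glob options). The point is that a symlink to a directory counts as a non-directory.

```python
import os
import tempfile


def removed_by_plain_rm(directory: str) -> list:
    """Names in `directory` that `rm <directory>/*` (no -r) would unlink.

    Plain rm refuses real directories but unlinks regular files and
    symlinks, including symlinks that point at directories.
    """
    removed = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # the shell's * glob skips dotfiles by default
        path = os.path.join(directory, name)
        # isdir() follows symlinks, so check islink() first.
        if os.path.islink(path) or not os.path.isdir(path):
            removed.append(name)
    return removed


def demo() -> list:
    """Build a tiny stand-in for / and report what would be removed."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "lib"))             # real directory: survives
        os.symlink("lib", os.path.join(root, "lib64"))  # symlink: removed
        with open(os.path.join(root, "vmlinuz"), "w"):  # regular file: removed
            pass
        return removed_by_plain_rm(root)
```

Calling demo() reports lib64 and vmlinuz as removed while lib survives.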

The Problem

In Debian x86_64 systems, /lib64 is a symlink to /lib, and you’ll find most applications (for instance, ‘ldd’) are linked to libraries in /lib64:

 => (0x00007fff251ff000)
 => /lib/ (0x00007f278eb38000)
/lib64/ (0x00007f278eea2000)

In the event of /lib64 not being accessible, most applications will fail to run because they’ll be missing a myriad of dependencies and won’t be able to find them. A bit of brief investigation followed, along with some furious attempts to revive the system, all with frankly disappointing results, including:

  • Using a statically compiled symlinker such as sln (Available by default on CentOS and RHEL, not on the affected Debian box)
  • Copying over sln via netcat and writing it out (proved surprisingly difficult)
  • Trying to copy over a symlink via rsync (couldn’t rsync/scp/sftp as they need to exec another process – which they can’t because of missing libraries)
  • Using BusyBox (Needs dynamic linking)
  • Writing a linker in C, compiling it, getting it over there via a mixture of cat, echo \x{..}\x{..}, and other incantations (I lost the will to live around this point)

The Epiphany

I eventually remembered a slideshow – chmod -x chmod – which was surprisingly relevant. You see, the more eagle-eyed among you may have noticed we would end up missing one important dependency:

ld-linux and ld-linux-x86-64 find and load the shared libraries used by other applications, preparing programs to run and then actually executing them too. Most Linux binaries require dynamic linking, meaning that at runtime the libraries the application depends on are loaded in from a shared source rather than compiled into the executable, unless the -static option was used during compilation. As static linking is quite unlikely (with most modern distributions), this means that if you cannot access the dynamic loader, you’re in trouble. Luckily, you can still invoke the loader directly to execute arbitrary commands, and it’ll resolve the dependencies relative to your LD_LIBRARY_PATH at that point. A simple:

/lib/ /bin/ln -s /lib/ /lib64/

Restored the symlink and allowed normal execution of binaries again, leaving our hypothetical sysadmin off the hook, except for having to write a mildly humiliating email to the rest of the operations team who, understandably, responded a bit like this.

Photo Flying by felixtsao (CC)

Tumblr to WordPress Import – Maintaining Links


I recently migrated away from Tumblr, as I found that Tumblr was heading more towards micro-blogging – reducing the size of the posting editor (seriously? LOOK AT THE PROPORTIONS) which made embedding code snippets or writing more lengthy posts pretty arduous. As an unapologetic geek – WordPress seemed like the natural choice.

The Tumblr to WordPress import process did a reasonably good job of importing everything, but I wanted to make sure I didn’t lose the (already indexed/linked) URLs. Unfortunately this wasn’t quite as easy, as defining custom permalinks on a per-post basis in WordPress appears to still be manual (via an .htaccess). To resolve this generically (without having to make a new alias for the mammoth amount of posts I had, ahem), I simply set the permalink format to the name of the post (which follows the same format as Tumblr) and defined the following RedirectMatch regular expression in my .htaccess:

RedirectMatch permanent ^\/post\/(\d+)\/(.*)$ /$2

From a URL such as:

This will isolate:


and (permanently) redirect it to:

and, as such, play nicely with search engines.
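
For anyone wanting to check the pattern before touching a live .htaccess, here is a quick Python sketch of the same regular expression. The function name and the example paths are made up for illustration. Python’s re module is standing in for Apache’s regex engine here, which is fine for a pattern this simple.

```python
import re

# Same pattern as the RedirectMatch above, minus the escaped slashes,
# which Python does not need.
TUMBLR_POST = re.compile(r"^/post/(\d+)/(.*)$")


def redirect_target(path: str):
    """Return the new path for a Tumblr-style path, or None if no redirect applies."""
    match = TUMBLR_POST.match(path)
    if match is None:
        return None
    # $2 in the RedirectMatch rule is the second capture group: the post name.
    return "/" + match.group(2)
```

A hypothetical path such as /post/12345/some-post-title comes out as /some-post-title, while anything that does not start with /post/ followed by a numeric ID is left alone.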

Photo Starting Life by jimdeane (CC)

CORS Headers in Nginx

Update: Before going much further, there now is a much more comprehensive CORS walkthrough for nginx at – so check that out before following the below.

If you’ve deployed even a mildly complex web application in the last few years, you’ve probably had to care about CORS headers. They allow webpages to make requests to another domain, or the same domain on another scheme. Without them, you’ll find that trying to request other assets will be forbidden by your browser, and things won’t load.

They’re relatively simple to implement. You just add a header:


to the HTTP responses of assets you’d like to call in your webapp. Thanks to Michiel Kalkman’s gist you can easily achieve this in Nginx – with something relatively standards compliant, too.

The problem, it seems, is that despite the W3C spec and RFC 6454 prescribing the use of a list of origins, not all browsers (e.g. Firefox) support multiple domains in an Access-Control-Allow-Origin header:


The easiest solution is to use a wildcard:

Access-Control-Allow-Origin: *

However, that has security implications. The best compromise I’ve found to get around this is to implement a simple whitelist in the Nginx config and match against that. I’ve put this in a public gist – and I’m testing it for deployment now.
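
The whitelist logic itself is tiny. Here is a sketch of the idea in Python rather than Nginx config, using a plain set lookup instead of a regex match. The origins and the function name are hypothetical.

```python
# Hypothetical whitelist; substitute the origins your webapp is served from.
ALLOWED_ORIGINS = {
    "https://app.example.com",
    "https://admin.example.com",
}


def cors_headers(request_origin):
    """Return the CORS response headers for a request's Origin header value."""
    if request_origin in ALLOWED_ORIGINS:
        return {
            # Echo back the single matching origin rather than a list or "*".
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches the response differs depending on the Origin header.
            "Vary": "Origin",
        }
    # Unknown or missing origin: send no CORS headers, so the browser blocks it.
    return {}
```

Echoing a single origin keeps every browser happy, since each response only ever carries one value in the header.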

I’ve not yet done any performance testing, so I’m not sure how efficient the Nginx regex engine is and what the overall effect on throughput/capacity is. I’ll probably forget to update this post with a bit of information once that’s complete.


This has been in production for a couple of months now, and we haven’t had any performance issues. For the throughput we require (<10 req/s) we’re able to handle the load on a single m1.small comfortably, so I think the nginx regex engine’s pretty efficient.

Celery and a failing MySQL Server

Celery is a distributed task queue for Python. It’s pretty useful, and a lot of apps I’m involved in deploying seem to be using it lately.

Something it seems to struggle with is stability; in the event of a database disappearing, being unable to resolve a database’s hostname, or a single connection to a database failing, it just shuts down.

I needed this to not happen. When running things in “the cloud” (sorry) you’re very much at the mercy of other people controlling your networking/tin/everything, so you need to write applications that are capable of tolerating a little bit of failure (even if the application was originally written this way to avoid split brain or similar). To get around this, we implemented monit. I am definitely not a fan of apps automatically restarting, but it was the only trivial resolution in this situation. Just append this to your monit config and you should be sorted. My understanding is that there isn’t a better solution yet, but I would be interested to know if anyone has seen one.

check process celeryd with pidfile /var/run/
start program = "/etc/init.d/celeryd start" with timeout 10 seconds
stop program = "/etc/init.d/celeryd stop"
if changed pid then restart
if 5 restarts within 5 cycles then timeout
alert youremailaddresshere

(I appreciate this is especially tedious, but this is for my reference)

Making nginx ignore query string parameters

When using nginx as a caching proxy, I found myself needing to ignore particular parameters for both the cache key and the values being passed to the backend. In this particular situation the value I wanted to ignore was ‘uid’. An example URI being:




To ignore this, in the top of my site configuration I put:

proxy_cache_key         "$scheme$host$uri$is_args$args";

in the server stanza:

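# Remove the uid parameter, keeping whatever sits either side of it
if ($args ~ (.*?)(?:^|(&))uid=[^&]*(?:(\2.*)|&(.*))?) {
    set $args $1$3$4;
}
# If any arguments remain, restore the leading ? for proxy_pass
if ($args ~ (^\w)) {
    set $args ?$args;
}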

and the location stanza:

proxy_pass              http://appservers$uri$args;

So now my backend servers see:

GET /foo.ext?env=bar&node=qux


GET /bar.ext

and very few hits get through to them anyway, as the cache key flattens it appropriately.
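
If you want to sanity-check what the rewrite should produce, here is a minimal Python sketch of the intended behaviour. It parses the query string instead of using a regex, so it is a different technique from the nginx config, and the uid values in the examples are invented.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(uri: str, ignored: str = "uid") -> str:
    """Return `uri` with every occurrence of the `ignored` query parameter removed."""
    parts = urlsplit(uri)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != ignored
    ]
    # urlencode re-encodes the remaining pairs; an empty list yields an empty
    # query, and urlunsplit then leaves the "?" off entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Feeding it /foo.ext?env=bar&uid=1234&node=qux gives /foo.ext?env=bar&node=qux, and /bar.ext?uid=1234 collapses to /bar.ext, matching what the backend sees. One caveat is that urlencode may normalise the escaping of unusual characters, which the nginx approach would pass through untouched.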


EDIT: The ‘easy’ bit is a lie, it seems. Thanks to @davidgl for pulling me out of regex hell. Several revisions here helped by him.

fail2ban time offset issues

While trying to set up fail2ban, I found that even though my regex and logs matched up, nothing was being banned or caught by fail2ban.

After a bit of investigation it seems that the auth.log time was being written in GMT whereas fail2ban was expecting it in BST:

==> /var/log/auth.log <==
Oct 11 20:52:21 ns2 sshd[18119]: Invalid user test from
Oct 11 20:52:21 ns2 sshd[18119]: Failed none for invalid user test from port 47862 ssh2
Oct 11 20:52:28 ns2 sshd[18119]: Failed password for invalid user test from port 47862 ssh2
==> /var/log/fail2ban.log <==
2010-10-11 21:52:04,017 fail2ban.filter: DEBUG  /var/log/auth.log has been modified
2010-10-11 21:52:04,029 fail2ban.filter.datedetector: DEBUG  Sorting the template list

Fairly simple fix of:

rm /etc/localtime
ln -s /usr/share/zoneinfo/Europe/London /etc/localtime

and I am now successfully banning myself from accessing my server.
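
To see why an hour’s offset matters, here is a small Python sketch of the arithmetic. It assumes that log lines which appear older than the find window are ignored. The ten-minute window, the “now” value and the helper names are all assumptions for illustration, so check findtime in your own jail config.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
BST = timezone(timedelta(hours=1), "BST")

# Assumed find window for illustration only.
FIND_WINDOW = timedelta(minutes=10)

# Hypothetical "now": a few seconds after the auth.log line, in local (BST) time.
NOW = datetime(2010, 10, 11, 21, 52, 30, tzinfo=BST)


def apparent_age(log_stamp: str, now: datetime, assumed_tz) -> timedelta:
    """Age of a syslog line if its zone-less timestamp is read in `assumed_tz`."""
    # Syslog timestamps carry no year or zone, so borrow the year from `now`.
    naive = datetime.strptime(f"{now.year} {log_stamp}", "%Y %b %d %H:%M:%S")
    return now - naive.replace(tzinfo=assumed_tz)


def would_be_considered(log_stamp: str, now: datetime, assumed_tz) -> bool:
    """True if the line falls inside the find window under the assumed zone."""
    age = apparent_age(log_stamp, now, assumed_tz)
    return timedelta(0) <= age <= FIND_WINDOW
```

Read as GMT (how it was written), the 20:52:21 line is nine seconds old. Read as BST (how it was interpreted), it looks just over an hour old and falls outside the window.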

MessageLabs Mail Filtering and Vague Errors

450 Requested action aborted [7.2] 20412, please visit for more details about this error message.

It took a remarkably large amount of searching to find out what ‘[7.2]’ meant in this error message, and why we kept getting a mailserver’s IP blacklisted, but if this happens to you, hopefully this will help resolve it.

When MessageLabs returns a [7.2], this seems to mean that they’ve checked the IP address of the host which is connecting to their MX against the CBL. Connections will be dropped immediately, rather than mail being rejected, as such:

# telnet 25
Connected to (
Escape character is '^]'.
450 Requested action aborted [7.2] 20412, please visit for more details about this error message.
Connection closed by foreign host.

The easiest way to get around this is to fix your mail server, then request delisting from the CBL.

On a completely unrelated note (ahem), it seems that you may be added to the CBL if you send an email from a domain where the sending mail server is explicitly disallowed by SPF records (such as -all with no matching include), to a Gmail address; Google will automatically (?) submit the IP address to the CBL and your problems will begin (again).

I highly recommend robtex as a lazy way to check your hosts against blacklists.

VMware ESX and a full SQL Server Database

Hypothetical situation. You installed VMware ESX, possibly upgraded from 3.5 to 4, went with the embedded SQL Server, and Many Years Later the VirtualCenter server no longer starts. You look through the event logs and the best you can find is:

Faulting application vpxd.exe, version 4.0.10021.0, faulting module kernel32.dll, version 5.2.3790.4480, fault address 0x0000bef7.

So you decide to look at general application eventlog events rather than just for VMware:

Could not allocate space for object ‘dbo.VPX_EVENT’.’PK_VPX_EVENT’ in database ‘VIM_VCDB’ because the ‘PRIMARY’ filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

“Great”, you think. I can just pass this over to a DBA to get them to increase the filegroup size. Then you dig a bit deeper and look at the event log for SQLServer:

CREATE DATABASE or ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 4096 MB per database.

“Oh no!” you sob. You really don’t want to try migrating to an enterprise database right now. Worry not, there’s a VMware solution. The process is easy:

  • Install Microsoft SQL Server Management Studio Express
  • Download and extract
  • Make sure all VMware VirtualCenter processes are stopped
  • Open Microsoft SQL Server Management Studio Express
  • File -> Open -> Choose the extracted sql script
  • Change the database from ‘master’ to ‘VIM_VCDB’ in the dropdown on the top bar
  • Press ‘Execute’
  • Evaluate the deleted rows, make sure it’s not more than you’d expect (ok, I didn’t do this)
  • Change
  • Press ‘Execute’ again.
  • Wait. Get a coffee. Get eight. It will eventually finish:
****************** SUMMARY *******************
Deleted 8400 rows from VPX_TASK table.
Deleted 2585209 rows from VPX_EVENT_ARG table.
Deleted 1662120 rows from VPX_EVENT table.
Deleted 0 rows from VPX_HIST_STAT1 table.
Deleted 0 rows from VPX_SAMPLE_TIME1 table.
Deleted 0 rows from VPX_HIST_STAT2 table.
Deleted 0 rows from VPX_SAMPLE_TIME2 table.
Deleted 0 rows from VPX_HIST_STAT3 table.
Deleted 0 rows from VPX_SAMPLE_TIME3 table.
Deleted 105331 rows from VPX_HIST_STAT4 table.
Deleted 373 rows from VPX_SAMPLE_TIME4 table.
  • Start vCenter Server. Wait. Try and connect. Hope. Pray.
  • Connect to vCenter Server
  • From the client, press Ctrl-Shift-I
  • Go to ‘Database Retention Policy’, and enable it.

Hopefully this will save someone a bit of googling.