Too few people understand the benefit of using a caching reverse proxy server to improve web page delivery speeds, and instead go straight to a CDN solution, which can be costly and complex to administer.
Conversations about web page speeds can often go like this:
“My website is slow – how can I make it faster?”
“Just use a CDN!”
Although using a CDN can help with page load speeds, a CDN is not automatically the right solution.
Content Delivery Networks
A Content Delivery Network (CDN) can be an important tool in achieving decent page load and web application speeds. Providers such as Akamai, EdgeCast, and more recently Amazon (with CloudFront), Rackspace (CloudFiles), Google (PageSpeed) and Microsoft (Azure CDN) distribute your content to locations geographically closer to your customers or users, which shortens network round trips and so improves the responsiveness of the application or website.
End of story, no?
Not quite. Using a CDN can be a bit like using a sledgehammer to crack a nut: overkill. This is especially true if your customers or users are not geographically spread (i.e. most or all users are in a single country or region).
HTTP Caching Reverse Proxy
If most of your users are in the same region, you should consider using an HTTP caching reverse proxy, such as Squid, Varnish, nginx (using the proxy_cache feature of its HTTP proxy module) or Apache (with mod_proxy and mod_cache). My personal favourite is Squid, which I have been using for over 10 years, both as a forward proxy and a reverse proxy, but many people rave about Varnish. Which proxy cache you choose depends on your exact caching requirements.
In a typical, simple configuration, the caching reverse proxy sits within your own infrastructure, in front of your web application server. That is, the first server which sees inbound requests is the caching proxy. When a request arrives for content that is not in its cache, the proxy fetches that content from the servers behind it and serves it back to the user. The types of content cached can be finely controlled, so that (for example) only images, Flash movies and CSS files are cached, while other content is always requested afresh from the application servers behind the proxy.
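To make the request flow above concrete, here is a minimal in-memory sketch in Python. It is only a toy model, not how Squid or Varnish are implemented: the class name, the `fake_origin` function and the list of cacheable file extensions are all assumptions made for illustration, and real proxies decide what to cache from HTTP headers and their own configuration rather than a hard-coded set.

```python
import posixpath
from typing import Callable, Dict

# Hypothetical rule: only these file types are cached; everything else
# is always requested afresh from the application server.
CACHEABLE_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".swf"}


class CachingReverseProxy:
    """Toy in-memory model of a caching reverse proxy (not a real server)."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self.origin = origin          # stands in for the application server
        self.cache: Dict[str, bytes] = {}
        self.origin_requests = 0      # how often the backend was contacted

    def is_cacheable(self, path: str) -> bool:
        extension = posixpath.splitext(path)[1].lower()
        return extension in CACHEABLE_EXTENSIONS

    def handle(self, path: str) -> bytes:
        if path in self.cache:
            return self.cache[path]   # cache hit: the backend is not contacted
        self.origin_requests += 1
        body = self.origin(path)      # miss or uncacheable: ask the backend
        if self.is_cacheable(path):
            self.cache[path] = body
        return body


def fake_origin(path: str) -> bytes:
    """Hypothetical application server that just echoes the path."""
    return ("content of " + path).encode()


proxy = CachingReverseProxy(fake_origin)
for path in ["/logo.png", "/logo.png", "/account", "/account"]:
    proxy.handle(path)
print(proxy.origin_requests)  # 3: one for the image, two for /account
```

The second request for the image never reaches the backend, whereas both requests for the uncached page do.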
Why Use a Reverse Proxy?
The result of using a well-configured caching reverse proxy is usually a huge speed-up in page load times for end-users. This is due to several factors:
- The cache can typically serve assets more rapidly than an application server, because the cache server does one job (returning stored responses), whereas the workload on the application or web server is more mixed.
- Off-loading the serving of common, seldom-changing assets to the cache server frees up the web/application server to handle more “useful” requests, which in turn makes for speedier page response times. The application/web server can render the page more quickly, as it is not spending time serving static content.
- The web/application servers receive fewer inbound connections, so they are more responsive in general and spend less time on thread context switching.
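The off-loading effect described above is easy to estimate with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the function name is made up, and the traffic figures are invented numbers rather than measurements.

```python
def origin_requests_per_second(total_rps: float,
                               cacheable_share: float,
                               hit_ratio: float) -> float:
    """Estimate the request rate that still reaches the application servers.

    total_rps       -- requests per second arriving at the proxy
    cacheable_share -- fraction of requests that are for cacheable content
    hit_ratio       -- fraction of cacheable requests answered from the cache
    """
    if not (0.0 <= cacheable_share <= 1.0 and 0.0 <= hit_ratio <= 1.0):
        raise ValueError("cacheable_share and hit_ratio must be between 0 and 1")
    served_by_cache = total_rps * cacheable_share * hit_ratio
    return total_rps - served_by_cache


# Invented example: 1000 requests/s, 80% cacheable, 90% of those are hits.
remaining = origin_requests_per_second(1000, 0.8, 0.9)
print(round(remaining))  # 280
```

With those assumed figures, the application servers would see roughly 280 requests per second instead of 1000.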
Properties of a CDN
A CDN typically has the following properties:
- A set of “edge” servers which are located in various distinct geographic locations
- Suitable for slowly-changing content, because propagating new content to every edge server can take a relatively long time (hours)
- Owned by a third party
- Usually combined with custom DNS solutions (with low DNS TTL values) to direct each user to a nearby edge server
- Disconnected (by design) from the web application
- Typically serves “static” content such as images, Flash, video, etc.
- Cannot effectively cache dynamically-generated content
- URLs or applications often need to be modified to work with the CDN
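The geo-direction idea in the list above can be illustrated by picking the edge server with the shortest great-circle distance to the user. This is a simplified sketch: the edge names and coordinates are invented, and a real CDN typically makes this decision in its DNS or routing layer using information such as resolver location or measured latency, not a calculation like this one.

```python
import math
from typing import Dict, Tuple

# Hypothetical edge locations as (latitude, longitude) in degrees.
EDGES: Dict[str, Tuple[float, float]] = {
    "london": (51.5, -0.1),
    "new-york": (40.7, -74.0),
    "singapore": (1.35, 103.8),
}


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points on a spherical Earth."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_edge(user_lat: float, user_lon: float) -> str:
    """Return the name of the edge closest to the user's coordinates."""
    return min(EDGES, key=lambda name: great_circle_km(user_lat, user_lon, *EDGES[name]))


print(nearest_edge(48.9, 2.35))  # a user near Paris is sent to "london"
```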
Properties of a Caching Reverse Proxy
A caching reverse proxy typically has the following properties:
- Local (close) to the web application, usually in the same data centre
- Reduces the load on web/application servers for cacheable content
- Can cache many kinds of content, including dynamically-generated content
- You retain full control over when the cache is flushed
- The web application is “unaware” of the caching taking place and does not need to be modified to benefit from reverse proxy caching
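Having control of cache flushes means you can let entries expire on a schedule and also evict them on demand, for example straight after a deployment. The sketch below models both with a small in-memory cache. The `TtlCache` class and its method names are hypothetical; the purge mechanisms in real proxies such as Squid and Varnish work differently.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy cache with time-based expiry and explicit purging."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, body: bytes) -> None:
        self._entries[key] = (self.clock() + self.ttl, body)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, body = entry
        if self.clock() >= expires_at:
            del self._entries[key]    # expired: treat as a miss
            return None
        return body

    def purge(self, key: str) -> bool:
        """Evict one entry immediately; return True if it was cached."""
        return self._entries.pop(key, None) is not None


now = [0.0]                           # fake clock for the demonstration
cache = TtlCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("/style.css", b"body { color: black }")
cache.purge("/style.css")             # flush on demand, e.g. after a release
print(cache.get("/style.css"))        # None: the next request goes to the backend
```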
Which Is Right For Me: CDN or Caching Reverse Proxy?
- A CDN locates static content geographically close to end-users to avoid transmission delay
- A caching reverse proxy reduces load on web/application servers and avoids unnecessary trips to a database or other content store for frequently-accessed content
So, if your users are geographically spread, use a CDN. If you need to reduce load on web or application servers for common content, use a caching reverse proxy. If you need to address both issues, use both a CDN *and* a caching reverse proxy.
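The rule of thumb above can be written down as a tiny decision helper. The function and parameter names are made up for illustration, and the empty result for the case where neither problem applies is my own assumption.

```python
from typing import List


def recommend(users_geographically_spread: bool,
              need_to_reduce_server_load: bool) -> List[str]:
    """Map the two questions from the text to a suggested set of tools."""
    tools: List[str] = []
    if users_geographically_spread:
        tools.append("CDN")
    if need_to_reduce_server_load:
        tools.append("caching reverse proxy")
    return tools                      # empty list: neither is clearly needed


print(recommend(True, False))   # ['CDN']
print(recommend(False, True))   # ['caching reverse proxy']
print(recommend(True, True))    # ['CDN', 'caching reverse proxy']
```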