Web sites and web applications are increasingly using secure connections (HTTPS) for all traffic, not just obviously sensitive data, as a way to guard against security threats. However, HTTPS requires encryption/decryption of data, which is computationally intensive. Web applications can therefore benefit from “offloading” the encryption/decryption processing required for HTTPS to specialised hardware devices.
Secure Connections with SSL/TLS
The HTTPS scheme uses SSL or TLS to “wrap” a secure, encrypted channel around the HTTP connection between browser and server. The abbreviations SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are often used interchangeably; TLS is in effect the more recent and secure version of SSL. In the words of RFC 5246 (the specification of TLS 1.2),
The protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery
The key thing here is that HTTPS is a point-to-point communication; the security exists only between the two directly-communicating endpoints. This is unlike other secure messaging schemes such as WS-Security, which can provide end-to-end security (with non-trusted intermediaries).
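As a rough illustration of how TLS “wraps” an ordinary connection, here is a minimal sketch using Python's standard `ssl` module. The helper name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions made for this example, not something the protocol or the article prescribes.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks on."""
    # create_default_context() loads the system CA certificates and
    # enables certificate verification and hostname checking.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = make_client_context()
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", context.check_hostname)
```

A plain TCP socket would then be wrapped with `context.wrap_socket(sock, server_hostname=...)`; the encrypted channel exists only between that client and whichever endpoint terminates the TLS session, which is the point-to-point property described above.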
Encryption using Public Key Algorithms
HTTPS makes use of the SSL/TLS standards for establishing a secure session, which ensures that the data sent between two computers is not readable or modifiable by intervening parties. SSL/TLS in turn rely on Public Key algorithms (not to be confused with PKI), in which each end of the communication channel provides the other with a public key to use when encrypting information to be sent to the other end:
a public key algorithm does not require a secure initial exchange of one, or more, secret keys between the sender and receiver. [Wikipedia]
Only the holder of the private key can decrypt a message encrypted with the corresponding public key, so the two-way communication is secure.
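The public/private key relationship can be illustrated with a toy RSA calculation. This is only a sketch: the primes are textbook-sized (real keys use primes hundreds of digits long), there is no padding, and the function names are made up for this example.

```python
# Toy RSA with tiny primes, for illustration only (never use in practice).
p, q = 61, 53
n = p * q                    # public modulus, shared with everyone
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse (Python 3.8+)

def encrypt(message: int) -> int:
    """Anyone holding the public key (e, n) can encrypt."""
    return pow(message, e, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(ciphertext, d, n)

message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
print(message, ciphertext, recovered)
```

In practice, SSL/TLS typically uses public-key operations like this only during the handshake, to agree on a shared session key; the bulk of the traffic is then protected with a faster symmetric cipher.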
The Compute Cost of SSL Crypto
Although there is some debate about exactly how expensive HTTPS crypto is (F5 and Adam Langley of Google showing two opposing views), crypto is clearly not a cost-free operation, and takes a significant number of CPU cycles.
Exploits (such as ‘BEAST’) against older and weaker versions of SSL/TLS, together with faster commodity hardware that allows a 56-bit DES key to be discovered within a week, are mandating ever longer encryption keys (NIST recommends 2048-bit keys).
Whilst there are efforts such as Google's Overclocking SSL aimed at optimising SSL performance, the use of increasing key lengths to counter threats commits a greater number of CPU cycles to the HTTPS round trip.
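To get a feel for how key length affects CPU cost, the following sketch times the modular exponentiation at the heart of public-key operations for two operand sizes. It is not a real RSA benchmark: the modulus is a random odd number rather than a product of primes, no CRT optimisation is used, and the function name and round count are assumptions chosen for this example. The absolute numbers will vary by machine; only the relative growth is of interest.

```python
import secrets
import time

def time_modexp(bits: int, rounds: int = 20) -> float:
    """Average seconds for one modular exponentiation with `bits`-bit operands."""
    # Force the top bit so operands really are `bits` long; modulus is made odd.
    modulus = secrets.randbits(bits) | (1 << (bits - 1)) | 1
    exponent = secrets.randbits(bits) | (1 << (bits - 1))
    base = secrets.randbits(bits - 1)
    start = time.perf_counter()
    for _ in range(rounds):
        pow(base, exponent, modulus)
    return (time.perf_counter() - start) / rounds

timings = {bits: time_modexp(bits) for bits in (1024, 2048)}
for bits, seconds in timings.items():
    print(f"{bits}-bit operands: {seconds * 1e3:.3f} ms per operation")
```

On most machines the 2048-bit case should come out several times slower than the 1024-bit case, which is why longer keys translate directly into more CPU time per handshake.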
SSL Offload Basics
If the HTTPS connection is terminated at the web application server, because HTTPS affords point-to-point security only, that server must decrypt the HTTPS payload before responding to the request, as shown here:
This means that while the payload is being decrypted, the web application server execution thread is unavailable for serving other web requests, potentially leading to thread pool exhaustion or at least longer response times.
The solution is to “offload” this processing effort to a dedicated piece of hardware (or software), either on the web server or (more effectively) in front of the web tier, as shown here:
The result is higher throughput from the web application servers, particularly if those servers are running on ARM/RISC hardware (such as the new HP ProLiant servers), and therefore less optimised than some for crypto processing.
Because the HTTPS tunnel has been terminated at the load balancer layer, the web application servers receive “plain” HTTP on port 80. In order to convey to the web servers that the original connection had been encrypted (say, for a login page), an additional HTTP header can be inserted by the load-balancer device, such as:
The web application can then distinguish secure and non-secure connections, even though it never sees HTTPS traffic directly.
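As a sketch of how a web application might make that distinction, the function below inspects a WSGI-style request environment. It assumes the load balancer sets the widely used `X-Forwarded-Proto` header and strips any copy sent by the client; the actual header name depends on the device and its configuration, and `is_secure_request` is a hypothetical helper, not part of any framework.

```python
def is_secure_request(environ: dict) -> bool:
    """Return True if the original client connection used HTTPS.

    `environ` is a WSGI-style dict, where HTTP headers appear as HTTP_* keys.
    """
    # Assumption: the load balancer sets X-Forwarded-Proto and removes any
    # client-supplied value, so the header can be trusted.
    forwarded_proto = environ.get("HTTP_X_FORWARDED_PROTO", "")
    if forwarded_proto:
        # Some proxy chains send a comma-separated list; the first entry
        # describes the client-facing connection.
        return forwarded_proto.split(",")[0].strip().lower() == "https"
    # No header present: fall back to the scheme of the direct connection.
    return environ.get("wsgi.url_scheme", "http") == "https"

offloaded = {"wsgi.url_scheme": "http", "HTTP_X_FORWARDED_PROTO": "https"}
plain = {"wsgi.url_scheme": "http"}
print(is_secure_request(offloaded), is_secure_request(plain))
```

A login handler could, for example, redirect to the HTTPS URL whenever this check fails. The header should only be trusted when requests can reach the web servers exclusively through the load balancer.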
SSL Offload on the Server
There are broadly five ways to offload SSL on the server:
- Software such as stunnel (http://www.stunnel.org/), Stingray Traffic Manager, and Kemp Virtual LoadMaster
- SSL Accelerator add-in cards for servers (using hardware encryption/decryption). There are interesting Open Source projects for SSL chips at opencores.org, and hybrid SSL devices from Freescale. PC Engines’ ALIX boards have on-board AES-128 crypto too.
- Dedicated hardware devices (more details below)
- Build your own SSL accelerator using Nginx – the original post on o3magazine is no longer around, but the discussion on Slashdot has some interesting background on SSL acceleration, and archive.org has most of the details.
- Use CPUs with Intel’s AES-NI instruction set (see below) – works only with a limited set of key sizes.
In practice, most enterprise SSL offload is handled by dedicated hardware devices known as web application accelerators (aka content switches or application delivery controllers, ADCs), such as the Citrix NetScaler, F5 BIG-IP, Kemp LoadMaster, and Blue Coat ProxySG – the third option in the list above.
These devices are examples of multi-layer switches: network devices which operate at several different layers of the OSI model, including layer 6, where SSL/TLS operate. These web ADCs can provide significant functionality and intelligence, for instance by optimising TCP requests from a client browser, compressing/uncompressing HTTP data, and offloading the SSL/TLS decryption workload.
Such devices become particularly crucial when HTTPS traffic needs to be terminated then re-encrypted for onward transmission to another server inside the (possibly untrusted) network; financial and military networks typically employ this strategy:
Hardware and Virtualisation for SSL Offloading
The performance of software implementations (the first option above) for high-traffic sites is doubtful. Certainly, the best software implementation of SSL encryption/decryption will almost always be substantially slower than the best hardware implementation. Therefore, either add-in cards (difficult to scale/manage) or dedicated hardware devices are likely to be the choice for large web sites.
The latest hardware SSL accelerators (such as the AX series from A10) offer an additional benefit: virtualisation. By running a special hypervisor atop the crypto-tuned hardware, multiple virtual appliances can be defined, with the benefit of isolating changes from each other, whilst retaining the benefit of direct access to specialist encryption/decryption hardware via the hypervisor.
SSL Offload on the Client
For mobile devices, reducing compute cycles leads to power savings. Therefore, if computationally expensive operations such as encryption/decryption can be performed in hardware on the device, the result will be extended battery life. Some devices now include dedicated crypto hardware support, such as Intel’s AES-NI (Advanced Encryption Standard New Instructions).
This “client-side” support for hardware crypto is still in its infancy, but seems likely to grow as the power-saving and speed advantages become apparent (and HTTPS yet more widely used).
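Whether a given machine exposes AES-NI can be checked from software. The sketch below assumes an x86 Linux system, where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo` and AES-NI appears as the `aes` flag; other platforms report this differently, and the helper names are made up for this example.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

def has_aes_ni(cpuinfo_text: str) -> bool:
    """True if the 'aes' flag (AES-NI on x86 Linux) is present."""
    return "aes" in cpu_flags(cpuinfo_text)

cpuinfo_path = Path("/proc/cpuinfo")  # exists on Linux only
if cpuinfo_path.exists():
    print("AES-NI available:", has_aes_ni(cpuinfo_path.read_text()))
```

Note that AES-NI accelerates only the AES symmetric cipher; the public-key part of the handshake is not affected by it.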
Beyond SSL Offload
Although offloading SSL to dedicated hardware-backed application accelerators helps, there are limits to the performance improvements achieved if other aspects of the web application are not optimised. This article over at HTTPWatch describing HTTPS performance tuning makes three excellent points to help improve application performance over HTTPS, and I’d add two more:
- Use HTTP 1.1 Connection Keep-Alive – this avoids the extra round-trips needed by the browser to set up/tear down the underlying TCP connection on every request. TLS also supports session resumption (the “abbreviated handshake”), which avoids repeating the full handshake when a client reconnects.
- Avoid mixed content warnings – the dreaded “This page contains both secure and non-secure items” dialog warning in older IE browsers. This is achieved by aligning the protocols used for content delivery (http/https), or by serving all requests over HTTPS.
- Use HTTPS-aware Content Delivery Networks or proxies – by delivering static content over HTTPS via CDNs or caching proxies, you reduce the workload on your servers and improve page load speed.
- Compress HTTPS content – because the cost of encryption/decryption depends on the length of the data, compressing the HTTPS data stream before encryption can help to reduce the compute cost.
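The effect of the last point can be sketched with Python's standard `zlib` module. The payload here is a hypothetical, deliberately repetitive HTML fragment; real pages will compress less dramatically, and in a real deployment the compression would normally be applied by the web server or load balancer (for example as gzip content encoding) before the data is encrypted.

```python
import zlib

# Hypothetical, repetitive HTML payload standing in for a real response body.
body = ("<li class='item'>Example row</li>\n" * 200).encode("utf-8")

# Compress before encryption: fewer bytes then need to be encrypted and sent.
compressed = zlib.compress(body, level=6)
restored = zlib.decompress(compressed)

print(f"original:   {len(body)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

Because encrypted data looks random and does not compress, the order matters: compression has to happen before encryption to have any benefit.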
The Future – SSL Proxies?
The value of web acceleration appliances was highlighted by the acquisition in December 2011 of Blue Coat Systems for $1.3 billion by a private equity firm. Blue Coat offers an SSL Proxy device, giving organisations the “power to define, enforce and audit intelligent policy controls over user/application interactions”; in other words, SSL Proxies can transparently “sniff” encrypted HTTPS traffic in real time.
In a future of “HTTPS everywhere”, it seems that HTTPS proxies will become more prevalent. Clearly, optimising the decrypt/encrypt of proxied traffic will be crucial, so dedicated hardware for SSL offload is likely to be with us for some time.