Pelicanux

Just A Few Random Words

Caching Static Content With Varnish on HTTP

According to varnish-cache, Varnish is a reverse proxy acting as an HTTP accelerator: it caches static content on the server side so that it is delivered faster to Web clients. Installing it is straightforward, as it is packaged for Debian and for most other Linux distributions. I first used it on my private server to dispatch HTTP requests to my virtual servers running as backends: Varnish intercepts HTTP connections and forwards each one to the appropriate backend. Here is a minimal configuration to get this working:

Redirecting client connections to the appropriate backend server

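# Varnish 3 syntax (req.request, req.backend, remove, return(lookup))
# Backend serving the blog
backend blog {
  .host = "hostname-of-my-awesome-blog";
  .port = "80";
}
# Backend for every other virtual host
backend default {
  .host = "hostname-of-another-awesome-blog";
  .port = "80";
}

sub vcl_recv {
  if (req.http.host == "this-awesome-blog.net") {
      # Rewrite the Host header so that the blog's virtual host matches
      set req.http.host = "hostname-of-my-awesome-blog";
      set req.backend = blog;
      # Replace any client-supplied X-Forwarded-For with the real client address
      remove req.http.X-Forwarded-For;
      set req.http.X-Forwarded-For = client.ip;
      # POST requests are not cached: hand the connection straight to the backend
      if (req.request == "POST") {
          return(pipe);
      }
      # Every other request is looked up in the cache first
      return(lookup);
  } else {
      # No return here: the built-in vcl_recv decides what happens next
      set req.backend = default;
  }
}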

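Requests whose Host header is this-awesome-blog.net are forwarded to the backend labeled blog, which runs on host hostname-of-my-awesome-blog. POST requests are piped straight to it, while all other requests are first looked up in the cache. Every other request goes to the default backend, hosted on the server whose hostname is hostname-of-another-awesome-blog.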
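To make the dispatch logic concrete, here is a small Python model of the vcl_recv above. It is only a sketch of the decision Varnish takes, not Varnish code. The function name route, the returned labels and the assumption that header names arrive already spelled "Host" and "X-Forwarded-For" are my own choices; Varnish itself matches header names case-insensitively.

```python
from typing import Dict, Tuple

# Values copied from the VCL above.
BLOG_HOST = "this-awesome-blog.net"
BLOG_BACKEND_HOST = "hostname-of-my-awesome-blog"


def route(headers: Dict[str, str], method: str,
          client_ip: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (backend label, action, rewritten headers) for one request.

    Hypothetical model of the vcl_recv above; header names are assumed
    to be normalized already (Varnish compares them case-insensitively).
    """
    out = dict(headers)
    if headers.get("Host") == BLOG_HOST:
        out["Host"] = BLOG_BACKEND_HOST     # set req.http.host = ...
        out["X-Forwarded-For"] = client_ip  # remove, then set X-Forwarded-For
        action = "pipe" if method == "POST" else "lookup"
        return "blog", action, out
    # Any other host: default backend, built-in vcl_recv picks the action.
    return "default", "builtin", out


if __name__ == "__main__":
    print(route({"Host": BLOG_HOST}, "GET", "203.0.113.7"))
    print(route({"Host": "elsewhere.example"}, "GET", "203.0.113.7"))
```

For instance, a POST to this-awesome-blog.net from 203.0.113.7 yields ("blog", "pipe", ...), with Host rewritten to hostname-of-my-awesome-blog and X-Forwarded-For set to 203.0.113.7.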

Note that I replaced the X-Forwarded-For HTTP header with the client IP address. More on that here. By default, Apache logs the source IP address of the incoming TCP connection, which, behind Varnish, is the address of the proxy. To log the client's address instead, take it from the X-Forwarded-For header with these configuration lines:

# Like "combined", but the first field is taken from X-Forwarded-For instead of %h (the proxy)
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined
LogLevel info
CustomLog   ${APACHE_LOG_DIR}/access.log varnishcombined
ErrorLog      ${APACHE_LOG_DIR}/error.log
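The effect of this LogFormat can be modeled in a few lines of Python. This is an illustrative sketch only. log_host_field is a hypothetical helper, and the peer address 127.0.0.1 assumes Varnish runs on the same machine as Apache. With %h and HostnameLookups Off (the default), Apache logs the TCP peer, that is, Varnish. With %{X-Forwarded-For}i it logs the raw header value, or "-" when the header is absent. Because the VCL above replaces X-Forwarded-For with client.ip, that raw value is a single address.

```python
from typing import Dict

VARNISH_PEER = "127.0.0.1"  # assumption: Varnish and Apache share a host


def log_host_field(headers: Dict[str, str], peer_ip: str, use_xff: bool) -> str:
    """Return the first field of an access-log line (hypothetical helper).

    use_xff=False mimics %h with HostnameLookups Off: the TCP peer address.
    use_xff=True mimics %{X-Forwarded-For}i: the raw header value, "-" if absent.
    """
    if not use_xff:
        return peer_ip
    return headers.get("X-Forwarded-For", "-")


if __name__ == "__main__":
    # Headers as Apache receives them once the VCL has set X-Forwarded-For.
    forwarded = {"X-Forwarded-For": "203.0.113.7"}
    print(log_host_field(forwarded, VARNISH_PEER, use_xff=False))  # proxy address
    print(log_host_field(forwarded, VARNISH_PEER, use_xff=True))   # client address
```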

Internals

Here is how to tell whether a page was served by Varnish from its cache (a hit) or fetched from the backend server (a miss), and, incidentally, how to add an extra HTTP header:

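# Tag each response with whether it came from the cache
sub vcl_deliver {
        if (obj.hits > 0) {
                set resp.http.X-Fuck = "HIT";    # served from the cache
        } else {
                set resp.http.X-Fuck = "MISS";   # fetched from the backend
        }
}

# PURGE reaches vcl_hit/vcl_miss only if vcl_recv returned lookup for it
# (as the blog branch above does). No ACL restricts PURGE here.
# PURGE on a cached object: evict it and report success
sub vcl_hit {
        if (req.request == "PURGE") {
                purge;
                error 200 "Purged";
        }
}

# PURGE on an object that is not cached: nothing to evict
sub vcl_miss {
        if (req.request == "PURGE") {
                purge;
                error 404 "Not in cache";
        }
}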

This is how to add any HTTP header. Note that many headers, X-Forwarded-For for instance, are not defined in https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, so HTTP clients and servers are not required to implement or understand them.
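The hit/miss tagging and the two PURGE outcomes can be mimicked by a toy in-memory cache, sketched here in Python. It is not how Varnish stores objects. The class TinyCache, its fetch callback and the header name X-Cache are hypothetical choices for the illustration. As with obj.hits, the first delivery after a fetch counts as a miss and later deliveries count as hits.

```python
from typing import Callable, Dict, Tuple


class TinyCache:
    """Toy cache: tags responses HIT/MISS and handles purges like the VCL above."""

    def __init__(self) -> None:
        # url -> (body, number of hits since the object was stored)
        self.store: Dict[str, Tuple[bytes, int]] = {}

    def get(self, url: str, fetch: Callable[[str], bytes]) -> Tuple[bytes, Dict[str, str]]:
        entry = self.store.get(url)
        if entry is None:
            # Miss: ask the backend and keep the object with a hit count of 0.
            body = fetch(url)
            self.store[url] = (body, 0)
            return body, {"X-Cache": "MISS"}
        body, hits = entry
        # Hit: serve the stored copy and bump its counter (like obj.hits).
        self.store[url] = (body, hits + 1)
        return body, {"X-Cache": "HIT"}

    def purge(self, url: str) -> Tuple[int, str]:
        # Mirrors vcl_hit / vcl_miss: 200 if something was evicted, 404 otherwise.
        if self.store.pop(url, None) is None:
            return 404, "Not in cache"
        return 200, "Purged"


if __name__ == "__main__":
    cache = TinyCache()

    def backend(url: str) -> bytes:
        return b"<html>" + url.encode() + b"</html>"

    print(cache.get("/blog/", backend)[1])  # {'X-Cache': 'MISS'}
    print(cache.get("/blog/", backend)[1])  # {'X-Cache': 'HIT'}
    print(cache.purge("/blog/"))            # (200, 'Purged')
    print(cache.purge("/blog/"))            # (404, 'Not in cache')
```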

Caching

Some responses can be cached (responses to GET and HEAD requests, for instance) and others can’t (typically responses to POST requests).
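Varnish 3 applies this rule in its built-in vcl_recv, which runs whenever a custom vcl_recv does not return, as in the else branch of the configuration above. Here is a simplified Python transcription of that built-in logic. The function name is hypothetical, and the restart and X-Forwarded-For handling are left out. Note that the blog branch above pipes POST requests itself, whereas the built-in logic passes them. It also pipes unknown methods such as PURGE.

```python
from typing import Dict

# Methods Varnish 3's built-in vcl_recv recognizes; anything else is piped.
KNOWN_METHODS = {"GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"}


def builtin_recv_action(method: str, headers: Dict[str, str]) -> str:
    """Simplified model of Varnish 3's default vcl_recv decision."""
    if method not in KNOWN_METHODS:
        return "pipe"    # unknown method: hand the connection to the backend
    if method not in ("GET", "HEAD"):
        return "pass"    # e.g. POST: fetch from the backend, never cache
    if "Authorization" in headers or "Cookie" in headers:
        return "pass"    # probably user-specific content
    return "lookup"      # candidate for a cache hit


if __name__ == "__main__":
    for m in ("GET", "POST", "PURGE"):
        print(m, builtin_recv_action(m, {}))
```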

Here is the HTTP exchange captured by Wireshark when I request a page of my blog that is already cached:

# Request
GET /blog/2013/11/18/caching-static-content-with-varnish-on-http/ HTTP/1.1
Host: www.pelicanux.net
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131030 Firefox/17.0 Iceweasel/17.0.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Referer: https://www.pelicanux.net/blog/categories/varnish/
If-Modified-Since: Sun, 29 Dec 2013 01:57:12 GMT
If-None-Match: "12fc-3204-4eea2a5494e00"
Cache-Control: max-age=0

# Response
HTTP/1.1 304 Not Modified
Server: Apache/2.2.22 (Debian)
Last-Modified: Sun, 29 Dec 2013 01:57:12 GMT
ETag: "12fc-3204-4eea2a5494e00"
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Type: text/html
Accept-Ranges: bytes
Date: Sun, 29 Dec 2013 03:17:52 GMT
X-Varnish: 1084465963
Age: 0
Via: 1.1 varnish
Connection: keep-alive

The client asks for the page only if it has been modified since Sun, 29 Dec 2013 01:57:12 GMT and only if its ETag no longer matches “12fc-3204-4eea2a5494e00”. Varnish answers on Apache’s behalf (note the Server: Apache/2.2.22 and Via: 1.1 varnish headers). The page has not changed since that date and its ETag still matches, so the status code is 304 Not Modified. The browser therefore displays the copy stored in its own cache, and the HTTP exchange stops here. Apache was not contacted: this is a cache hit. When Varnish has to request the content from Apache, it is a cache miss.
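The decision behind this 304 can be sketched in Python. This is a simplified model, not Varnish or Apache code. conditional_status is a hypothetical helper. It follows the precedence rule of RFC 7232, where If-None-Match, when present, makes the server ignore If-Modified-Since, and it does not handle weak ETags.

```python
from email.utils import parsedate_to_datetime
from typing import Dict


def conditional_status(request: Dict[str, str], etag: str, last_modified: str) -> int:
    """Status for a conditional GET/HEAD: 304 if the client's copy is still valid."""
    if_none_match = request.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence: If-Modified-Since is then ignored.
        # Simplification: weak validators (W/"...") are not handled.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200
    if_modified_since = request.get("If-Modified-Since")
    if if_modified_since is not None:
        # Not modified if the resource is no newer than the client's date.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200
```

With the If-None-Match and If-Modified-Since values of the captured request, and the ETag and Last-Modified values returned in the response, this function returns 304.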

Here are some cache-related HTTP headers:

  • Expires: date and time after which the cached response is considered stale and must be fetched again. It sets an upper limit on caching: the cache may still be refreshed earlier, but it will always be refreshed after that date.
  • Cache-Control: takes precedence over Expires (its max-age directive overrides Expires when both are present) and offers more features, such as no-cache or no-store.
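Here is a sketch of how a shared cache such as Varnish can turn these headers into a lifetime. This is a simplified Python model under my own assumptions: header values are well-formed, and the helper name freshness_lifetime is hypothetical. s-maxage wins over max-age, which wins over Expires minus Date. When none of them is present, Varnish falls back to its default_ttl parameter, represented here by None.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[int]:
    """Seconds a response may be served from a shared cache, or None if unspecified.

    Assumes well-formed header values; no error handling.
    """
    directives: Dict[str, str] = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    # s-maxage (shared caches only) overrides max-age, which overrides Expires.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            return max(0, int(directives[key]))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    # No explicit lifetime: Varnish would fall back to its default_ttl.
    return None
```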