<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[No Downtime Reality]]></title><description><![CDATA[The road to sysadmin glory!]]></description><link>https://blog.anine.io/</link><generator>Ghost 0.11</generator><lastBuildDate>Wed, 04 Oct 2023 18:52:24 GMT</lastBuildDate><atom:link href="https://blog.anine.io/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[VARNISHSTAT – VARNISH PERFORMANCE TUNING (PART II)]]></title><description><![CDATA[How to use varnishstat to achieve highest Varnish performance. Explains all important variables and what they are affected by.]]></description><link>https://blog.anine.io/varnishstat-varnish-performance-tuning-part-ii-2/</link><guid isPermaLink="false">47086a21-44e5-40da-b73b-9a42bdf8bf52</guid><category><![CDATA[varnish]]></category><category><![CDATA[performance]]></category><category><![CDATA[tuning]]></category><category><![CDATA[sysadmin]]></category><category><![CDATA[linux]]></category><category><![CDATA[kernel]]></category><category><![CDATA[varnishstat]]></category><dc:creator><![CDATA[Nenad Merdanovic]]></dc:creator><pubDate>Mon, 16 May 2016 20:31:33 GMT</pubDate><content:encoded><![CDATA[<h3 id="usingvarnishstat">Using varnishstat</h3>

<p>In <a href="https://blog.anine.io/varnish-performance-tuning-part-i/">Part I</a> of this series, we talked about basic variables that can be used to tune performance of Varnish.  In this part of the series, we will discuss tools that can help us understand what to optimize and by how much.</p>

<p>The single most important tool for tuning Varnish performance is varnishstat. It comes bundled with Varnish and exposes a number of counters and rates which, once interpreted, tell you which variables to tune further to get the most out of your Varnish installation. There are two ways to run varnishstat: with or without the ‘-1’ flag. Without the flag, you get a continuously updating display that shows you rates; with the flag, varnishstat prints a one-shot snapshot of the counters and exits. We will show only the important variables and explain some of them.</p>

<p>To better explain what each metric means and which ones matter most, we will divide them into the following groups:</p>

<ul>
<li>Client and backend related</li>
<li>Worker thread related</li>
<li>ESI related</li>
<li>Storage backend related</li>
</ul>

<h3 id="clientandbackendrelated">Client and Backend Related</h3>

<pre><code># varnishstat -1
client_conn        4234206     41.27   Client connections accepted  
client_drop        0       0.00    Connection dropped, no sess/wrk  
client_req        29233157    284.94  Client requests received  
cache_hit        32093887    312.82  Cache hits  
cache_hitpass        921         0.01    Cache hits for pass  
cache_miss         422706      4.12    Cache misses  
backend_conn        57122       0.56    Backend conn. success  
backend_unhealthy    0       0.00    Backend conn. not attempted  
backend_busy         0       0.00    Backend conn. too many  
backend_fail         0       0.00    Backend conn. failures  
</code></pre>
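<p>One derived number worth tracking is the cache hit ratio, computed from cache_hit and cache_miss. As a rough sketch (the counter values below are taken from the snapshot above; against a live instance you would pipe varnishstat -1 straight into awk):</p>

<pre><code># Derive the cache hit ratio from a saved "varnishstat -1" snapshot
snapshot='cache_hit        32093887    312.82  Cache hits
cache_miss         422706      4.12    Cache misses'

ratio=$(echo "$snapshot" | awk '
  $1 == "cache_hit"  { hits = $2 }
  $1 == "cache_miss" { misses = $2 }
  END { printf "%.2f", 100 * hits / (hits + misses) }')
echo "hit ratio: ${ratio}%"
</code></pre>

<p>For a mostly static site you would typically expect this to be well above 90%; if it is not, your TTLs and VCL usually deserve a closer look.</p>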

<p>The most important variables to watch are client_drop, backend_busy, backend_unhealthy and backend_fail. The first usually increases when you go over session_max or queue_max. The default values for those two variables are good enough; if you see client_drop increasing, look into other things (worker threads, queue sizes, backend speed, cache hit ratio, etc.) rather than blindly increasing them. The backend_busy counter indicates that you have reached the maximum number of connections to your backend; do not blindly raise that limit either, as you may overload your backend. The last two variables, backend_unhealthy and backend_fail, indicate either that your backend was declared unhealthy by Varnish due to failing health checks, or that there was a plain connection failure (a network issue, for example).</p>

<h3 id="workerthreadsrelated">Worker Threads Related</h3>

<pre><code># varnishstat -1
n_wrk               100         .       N worker threads  
n_wrk_create         2853            0.03    N worker threads created  
n_wrk_failed           0           0.00    N worker threads not created  
n_wrk_max               0           0.00    N worker threads limited  
n_wrk_lqueue           0           0.00    work request queue length  
n_wrk_queued           13614           0.13    N queued work requests  
n_wrk_drop             0           0.00    N dropped work requests  
</code></pre>
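<p>A simple health check is to make sure n_wrk_drop stays at zero. A minimal sketch, again parsing a saved snapshot (in practice a monitoring script would read varnishstat -1 directly):</p>

<pre><code># Warn once Varnish has started dropping queued work requests
snapshot='n_wrk_drop             0           0.00    N dropped work requests'

if echo "$snapshot" | awk '$1 == "n_wrk_drop" { exit ($2 != 0) }'; then
  status="OK: no dropped work requests"
else
  status="WARN: work requests dropped, review thread_pool_max and queue sizes"
fi
echo "$status"
</code></pre>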

<p>Worker thread related metrics give you insight into whether you have properly tuned your thread pools and their sizes. The n_wrk_max counter shows how many times you exhausted your thread pools and threads could not be created. The queue metric, n_wrk_lqueue, shows the current queue length (requests waiting for a worker thread to become available). Finally, n_wrk_queued and n_wrk_drop show how many times a request has been queued and how many times one was dropped due to exceeding the queue length.</p>

<h3 id="esirelated">ESI Related</h3>

<pre><code># varnishstat -1
esi_errors              0           0.00    ESI parse errors (unlock)  
esi_warnings            0           0.00    ESI parse warnings (unlock)  
</code></pre>

<p>If you are using Edge Side Includes [1], esi_errors and esi_warnings give you information about the validity of your ESI syntax. If you see them increasing, inspect the ESI markup returned by the backend and fix any errors found.</p>

<h3 id="storagebackendrelated">Storage Backend Related</h3>

<pre><code># varnishstat -1
n_lru_nuked            128154          .       N LRU nuked objects  
SMA.s0.c_req               1129966         10.98   Allocator requests  
SMA.s0.c_fail           128616             1.25    Allocator failures  
SMA.s0.g_bytes            4294824024      .       Bytes outstanding  
SMA.s0.g_space          143272          .       Bytes available  
SMA.Transient.c_req     24264             0.24    Allocator requests  
SMA.Transient.c_fail    0             0.00    Allocator failures  
SMA.Transient.g_bytes   27464              .       Bytes outstanding  
SMA.Transient.g_space   0              .       Bytes available  
</code></pre>
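<p>From g_bytes and g_space you can compute how full a storage is. A sketch using the s0 counters from the snapshot above:</p>

<pre><code># Percentage of s0 storage in use, from a saved snapshot
snapshot='SMA.s0.g_bytes            4294824024      .       Bytes outstanding
SMA.s0.g_space          143272          .       Bytes available'

pct=$(echo "$snapshot" | awk '
  $1 == "SMA.s0.g_bytes" { used = $2 }
  $1 == "SMA.s0.g_space" { free = $2 }
  END { printf "%.3f", 100 * used / (used + free) }')
echo "s0 is ${pct}% full"
</code></pre>

<p>A cache that sits near 100% full while n_lru_nuked keeps climbing is the classic sign that the -s size is too small for your working set.</p>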

<p>Here we have an example of the malloc storage backend; notice that there are two kinds of storage, s0 and Transient. Transient storage is used by Varnish to store short-lived objects. What Varnish considers short-lived is defined by the shortlived parameter – the value is in seconds and defaults to 10. Storage s0 is the storage defined with the -s flag and it holds all other objects. varnishstat reports failed allocations in SMA.s0.c_fail; these are usually a result of reaching the cache size limit. The g_bytes and g_space counters show you the amount of used and free space. One of the most important counters found in varnishstat is n_lru_nuked, which tells you how many objects were evicted from the cache by the LRU algorithm. If this number is increasing fast, consider raising your cache size.</p>

<h3 id="conclusion">Conclusion</h3>

<p>We have explained how varnishstat works and how it can be used to debug your Varnish installation. Hopefully this is enough for you to start using varnishstat and to understand where your Varnish setup might be bottlenecked or running sub-optimally.</p>

<p>[1] ESI specification – <a href="http://www.w3.org/TR/esi-lang">http://www.w3.org/TR/esi-lang</a></p>]]></content:encoded></item><item><title><![CDATA[VARNISH PERFORMANCE TUNING (PART I)]]></title><description><![CDATA[Learn all about tuning Varnish for maximum performance. Understand thread pool tuning, memory usage, storage engines, kernel parameters ]]></description><link>https://blog.anine.io/varnish-performance-tuning-part-i/</link><guid isPermaLink="false">edca8fd6-7741-48b6-b939-6c572bc7611d</guid><category><![CDATA[varnish]]></category><category><![CDATA[kernel]]></category><category><![CDATA[sysadmin]]></category><category><![CDATA[performance]]></category><category><![CDATA[linux]]></category><category><![CDATA[tuning]]></category><dc:creator><![CDATA[Nenad Merdanovic]]></dc:creator><pubDate>Sat, 14 May 2016 20:40:26 GMT</pubDate><content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>

<p>Varnish is a caching engine installed in front of a website to speed up delivery. It is used in front of many high-traffic sites to do just that without requiring a large number of backend servers.</p>

<p>Reading this article series will walk you through basic Varnish performance tuning as well as monitoring the crucial metrics. You will also learn how to use Varnish effectively with other popular software (like HAProxy and Nginx) and what the common pitfalls are. The last thing to keep in mind when building large scale infrastructures, like the ones managed by A-nine, is to never over-engineer or over-optimize a solution.</p>

<h3 id="varnishstorageengines">Varnish Storage Engines</h3>

<p>Varnish has three main storage engines: malloc, file and persistent. The last one is experimental and will likely be removed from future versions of Varnish, so we will not consider it.</p>

<p>The file based storage engine is, well, a file. mmap(2) is called to map the file into Varnish’s virtual memory. It is worth noting that the file backend is not persistent: every restart of Varnish drops the entire cache. File backed storage should be used when you don’t have enough main memory to cache all content and the speed of your IO subsystem exceeds the speed of content generation. For example, your content resides on slow storage and you want to keep the hot content cached on a smaller, but faster, IO system.</p>

<p>Malloc based storage allocates memory space for each object using the malloc(3) library call. This is the fastest type of cache, as memory access latency is a few orders of magnitude lower (and throughput higher) than even the fastest SSD drives.</p>

<p>Long story short, use malloc backend if the content set of interest fits into memory and file if it doesn’t. Storage backends are chosen with the -s parameter. For example: <br>
<code>-s malloc,12G</code></p>
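<p>For reference, the file backend takes a path and a size as well. The path and sizes below are illustrative examples, not recommendations:</p>

<pre><code># malloc backend: 12 GB of cache kept in memory
-s malloc,12G

# file backend: mmap'ed file on disk (path and size are examples)
-s file,/var/lib/varnish/storage.bin,80G
</code></pre>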

<h3 id="varnishthreads">Varnish threads</h3>

<p>The second most important thing to tune is Varnish threads. Varnish uses a few types of threads performing various tasks, but here we will focus on the most important ones – the cache-worker threads, which actually serve the HTTP requests that come in. There are three variables we will focus on here: thread_pools, thread_pool_min and thread_pool_max.</p>

<p>The first variable, thread_pools, is, as the name says, the number of pools that threads will be grouped into. Each pool will contain at least thread_pool_min and no more than thread_pool_max threads. thread_pools is set to ‘2’ by default and it is recommended you leave it like that; no noticeable performance improvement has been observed from raising it.</p>

<p>The second variable, thread_pool_min, defines the minimum number of threads that always need to be alive (per thread pool), even when idle. It is wise to keep a decent number of threads idle (we usually like to keep at least 30-40), as creating a thread is an expensive operation compared to having one sit idle. This way, if you have a sudden spike in traffic, you will have enough threads to handle the first hit while new ones are being spawned.</p>

<p>The third variable, thread_pool_max, defines the maximum number of threads per thread pool. It obviously needs to be high enough to accommodate your traffic and must be adjusted per workload. Usually you don’t want to go over 5000 threads, as noted in the documentation.</p>

<p>The last thing to consider is a variable called thread_pool_stack; we have had good experience setting it to 256k. Otherwise Varnish threads will use your system’s default stack size which, depending on the operating system, can waste quite a lot of memory.</p>
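<p>Putting the above together, a varnishd invocation might carry thread settings like these. The numbers are illustrative starting points to be adjusted for your workload, not recommendations:</p>

<pre><code>-p thread_pools=2 \
-p thread_pool_min=100 \
-p thread_pool_max=2000 \
-p thread_pool_stack=256k
</code></pre>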

<h3 id="iosystemtuning">IO system tuning</h3>

<p>The most important thing is to mount the Varnish working directory (where the shmlog is stored) on tmpfs: <br>
<code>tmpfs /usr/lib/varnish tmpfs rw,size=256M 0 0</code></p>

<p>In case you are using the file backend, make sure you set noatime on the file system where the storage file is kept. This prevents unneeded IO to update the file’s access times (which is useless for Varnish). Also make sure that your partitions are aligned, that your RAID arrays have decent chunk sizes (depending on your file size) and that the file systems are aligned to those chunk sizes.</p>
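<p>As a sketch, an fstab entry for a dedicated file-backend partition could look like this (the device and mount point are made up for the example):</p>

<pre><code>/dev/sdb1  /var/lib/varnish/storage  ext4  defaults,noatime  0 0
</code></pre>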

<h3 id="networkrelated">Network related</h3>

<p>Last, but not least, one must tune the network parameters of Varnish and the operating system (we will stick to Linux here). There is a lot of low-level thinking to this, but we will stick to the things that have the most effect. The single most important one is to properly size your listen queue (together with the sysctl described below). The listen queue size is passed to the listen(2) system call and limits the number of connections not yet accepted by the application (and of connections in the SYN_RECV state, in case it is smaller than tcp_max_syn_backlog).</p>

<p>Varnish can set the listen backlog size using the -p parameter as follows: <br>
<code>-p listen_depth=16383</code></p>

<p>We use 16383 because the number is first incremented by the kernel and then rounded up to the next power of two. There is also a sysctl that controls the maximum size of the listen backlog, net.somaxconn. You should set it higher than the listen depth, but be careful: it is stored as a 16-bit unsigned integer in the kernel, so the maximum value is 65535.</p>

<p>Other variables to consider are: net.ipv4.tcp_max_syn_backlog, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_max_tw_buckets, net.ipv4.ip_local_port_range and net.ipv4.tcp_syncookies. Going into kernel tuning details is beyond the scope of this article.</p>
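<p>As a starting point, a sysctl fragment along these lines is a reasonable sketch. The values are illustrative and should be validated under your own load, not copied blindly:</p>

<pre><code># /etc/sysctl.d/varnish.conf (example values only)
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 1024 65535
</code></pre>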

<p>That is all for part I. In the next blog post we will see how to interpret Varnish’s counters, and how to monitor and debug a running instance. Stay tuned!</p>