HAProxy: rate limits + TLS SNI
At work we have been using AWS Elastic Load Balancers for some time now and have found them simple to use and reliable. Unfortunately, the trade-off for that simplicity is a lack of features & control. The main issue we're facing is the need to implement basic rate-limiting controls on our web frontend to reduce the impact of abuse. I'm actually a little surprised that AWS doesn't offer some basic rate-limit functionality in ELB (maybe it's coming?). The other annoyance is having to provision a separate ELB instance for each of our SSL certificates, due to the lack of SNI support.
So I'm investigating the possibility of replacing our multiple ELB instances with a pair of HAProxy instances running on EC2. This post is a place to dump notes & thoughts on the process.
Goals
What I’m aiming for will have the following properties:
- high availability - both in terms of frontend (client facing) and backend (application server facing)
- frontend: auto-failover between public IP addresses assigned to HAProxy instances
- backend: auto-failover between application servers being proxied to
- rate limiting capability - layer 4 minimum, ideally up to layer 7
- TLS SNI - allow multiple SSL certificates to be served from the same IP address
Frontend high-availability
The plan is to use AWS Route 53 DNS health checks to provide a DNS-based failover mechanism. This can be done in one of two ways:
- active/standby configuration: Route 53 will return the primary IP (if healthy), otherwise it will return the secondary IP - sketched below
- multi-value configuration: Route 53 will return whichever IPs are considered healthy
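As a rough sketch of the active/standby option using the AWS CLI - the zone ID, hostname, IPs and health-check IDs below are all placeholder values:

aws route53 change-resource-record-sets --hosted-zone-id Z0000EXAMPLE --change-batch '{
  "Changes": [
    { "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com", "Type": "A",
        "SetIdentifier": "haproxy1", "Failover": "PRIMARY", "TTL": 60,
        "ResourceRecords": [ { "Value": "203.0.113.10" } ],
        "HealthCheckId": "11111111-1111-1111-1111-111111111111" } },
    { "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com", "Type": "A",
        "SetIdentifier": "haproxy2", "Failover": "SECONDARY", "TTL": 60,
        "ResourceRecords": [ { "Value": "203.0.113.20" } ],
        "HealthCheckId": "22222222-2222-2222-2222-222222222222" } }
  ]
}'

Route 53 will only start returning the SECONDARY record once the PRIMARY's health check fails; the low TTL keeps failover time reasonable.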
Rate limiting
If all we needed was basic rate limiting based on client IP address, the simplest solution would be a firewall out in front running iptables and its built-in rate-limiting functionality. That isn't suitable for our needs, as we need a) more intelligent rate-limiting capability and b) the ability to share rate-limit state between multiple frontend peers. HAProxy provides a solution to both of these needs.
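For comparison, the iptables-only approach would look something like this - a rough sketch using the hashlimit module, with arbitrary thresholds:

# drop new HTTPS connections from any single source IP above 10/s (burst 20)
iptables -A INPUT -p tcp --dport 443 -m conntrack --ctstate NEW \
    -m hashlimit --hashlimit-name https --hashlimit-mode srcip \
    --hashlimit-above 10/second --hashlimit-burst 20 -j DROP

Each firewall node keeps its own counters though, which is exactly limitation b).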
On a), HAProxy allows a rate limit to be applied to almost any aspect of a TCP or HTTP transaction. On b), sharing of rate-limit counters between HAProxy peers was added in HAProxy 1.6, with the caveat that it 'must be used for safe reload and server failover only'. For a pair of HAProxy nodes in a low-traffic scenario, I'm betting this will be 'good enough' for my HA needs.
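One way to sanity-check that peer syncing is working - assuming a stats socket is enabled in the global section (e.g. 'stats socket /run/haproxy/admin.sock'; the path here is an assumption) - is to dump the stick-table on each node and compare:

# run on both haproxy1 and haproxy2; entries should match once replicated
echo "show table https" | socat stdio /run/haproxy/admin.sock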
Config
The following are the relevant parts of haproxy.cfg. This isn't supposed to be any kind of 'production' config - it's just for testing purposes, pulled together from a number of external resources.
peers hapeers
    peer haproxy1 192.168.1.1:1024
    peer haproxy2 192.168.2.1:1024
frontend https
    # *.pem files are read from the directory '/etc/haproxy/ssl'.
    # The certificate will be matched against SNI, otherwise the first certificate will be used
    bind *:443 ssl crt /etc/haproxy/ssl/
    default_backend bk_one
    tcp-request inspect-delay 5s
    stick-table type ip size 200k expire 30s peers hapeers store gpc0
    # the backends increment the frontend's gpc0 (via sc0_inc_gpc0) on abuse, which we check here
    acl source_is_abuser src_get_gpc0 gt 0
    # don't track an abuser while they're being redirected to the rate-limit backend
    tcp-request connection track-sc0 src if !source_is_abuser
    tcp-request content accept if { req_ssl_hello_type 1 }
    # tell the backend that the client is using https
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    # send abusers to the rate-limit backend until their entry expires (30s, above)
    use_backend rate-limit if source_is_abuser
    use_backend bk_one if { ssl_fc_sni -i demo1.example.com }
    use_backend bk_two if { ssl_fc_sni -i demo2.example.com }
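A quick way to check the SNI matching once this is up is openssl s_client - substitute a real frontend address for the placeholder below:

# should print the demo1 certificate subject; change -servername to test demo2
openssl s_client -connect 203.0.113.10:443 -servername demo1.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject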
# mostly the same as the 'https' frontend, minus the SSL bits
frontend http
    bind *:80
    default_backend bk_one
    tcp-request inspect-delay 5s
    stick-table type ip size 200k expire 30s peers hapeers store gpc0
    # the backends increment the frontend's gpc0 (via sc0_inc_gpc0) on abuse, which we check here
    acl source_is_abuser src_get_gpc0 gt 0
    # don't track an abuser while they're being redirected to the rate-limit backend
    tcp-request connection track-sc0 src if !source_is_abuser
    # send abusers to the rate-limit backend until their entry expires (30s, above)
    use_backend rate-limit if source_is_abuser
    use_backend bk_one if { hdr(Host) -i demo1.example.com }
    use_backend bk_two if { hdr(Host) -i demo2.example.com }
backend bk_one
    balance roundrobin
    server app1 web.a:80 check
    server app2 web.b:80 check
    stick-table type ip size 200k expire 5m peers hapeers store conn_rate(30s),bytes_out_rate(60s)
    tcp-request content track-sc1 src
    # 10 connections is approximately 1 page load! Increase to suit
    acl conn_rate_abuse sc1_conn_rate gt 10
    acl data_rate_abuse sc1_bytes_out_rate gt 20000000
    # abuse is marked in the frontend so that it's shared between all sites
    acl mark_as_abuser sc0_inc_gpc0 gt 0
    tcp-request content reject if conn_rate_abuse mark_as_abuser
    tcp-request content reject if data_rate_abuse mark_as_abuser
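Tripping the limit is easy to test: a burst of requests from one client should exceed the 10-connections-per-30s threshold, after which the source is marked as an abuser and routed to the rate-limit backend. Something like this, with a placeholder hostname:

# the first few requests should return 200; once gpc0 is set, expect 503s
for i in $(seq 1 15); do
    curl -sk -o /dev/null -w '%{http_code}\n' https://demo1.example.com/
done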
backend bk_two
    [... same as bk_one, just using different backend servers ...]
backend rate-limit
    # custom .http file displaying a 'rate limited' message
    errorfile 503 /usr/share/haproxy/503-ratelimit.http
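HAProxy errorfiles are served verbatim, so the file needs to be a complete raw HTTP response, headers and all. A minimal version of 503-ratelimit.http might look like this (the wording is just an example):

cat > /usr/share/haproxy/503-ratelimit.http <<'EOF'
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>Rate limited</h1>
<p>You're sending requests too quickly. Please slow down and try again shortly.</p>
</body></html>
EOF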