HAProxy: rate limits + TLS SNI
At work we have been using AWS Elastic Load Balancers for some time now and have found them simple to use and reliable. Unfortunately, the trade-off for that simplicity is a lack of features & control. The main issue we're facing is the need to implement basic rate-limiting controls on our web frontend to reduce the impact of abuse. I'm actually a little surprised that AWS doesn't offer some basic rate-limit functionality in ELB (maybe it's coming?). The other annoyance is having to provision a separate ELB instance for each of our SSL certificates, due to the lack of SNI support.
So I'm investigating the possibility of replacing our multiple ELB instances with a pair of HAProxy instances running on EC2. This post is a place to dump notes & thoughts on the process.
Goals
What I’m aiming for will have the following properties:
- high availability - both in terms of frontend (client facing) and backend (application server facing)
- frontend: auto-failover between public IP addresses assigned to HAProxy instances
- backend: auto-failover between application servers being proxied to
- rate limiting capability - layer 4 minimum, ideally up to layer 7
- TLS SNI - allow multiple SSL certificates to be served from the same IP address
Frontend high-availability
The plan is to use AWS Route 53 DNS health checks to provide a DNS-based failover mechanism. This can be done in one of two ways:
- active/standby configuration: Route 53 will return the primary IP (if healthy), otherwise it will return the secondary IP - sketched below
- multi-value configuration: Route 53 will return whichever IPs are considered healthy
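As a rough sketch of the active/standby option using the AWS CLI - the zone ID, hostname, IPs and health-check IDs below are all placeholder values:

aws route53 change-resource-record-sets --hosted-zone-id Z0000EXAMPLE --change-batch '{
  "Changes": [
    { "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com", "Type": "A",
        "SetIdentifier": "haproxy1", "Failover": "PRIMARY", "TTL": 60,
        "ResourceRecords": [ { "Value": "203.0.113.10" } ],
        "HealthCheckId": "11111111-1111-1111-1111-111111111111" } },
    { "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com", "Type": "A",
        "SetIdentifier": "haproxy2", "Failover": "SECONDARY", "TTL": 60,
        "ResourceRecords": [ { "Value": "203.0.113.20" } ],
        "HealthCheckId": "22222222-2222-2222-2222-222222222222" } }
  ]
}'

Route 53 will only start returning the SECONDARY record once the PRIMARY's health check fails; the low TTL keeps failover time reasonable.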
Rate limiting
If all we needed was basic rate limiting based on client IP address, the simplest solution would be a firewall out in front running iptables and its built-in rate-limiting functionality. That isn't suitable for our needs, as we need a) more intelligent rate-limiting capability and b) the ability to share rate-limit state between multiple frontend peers. HAProxy provides a solution to both of these needs.
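For comparison, the iptables-only approach would look something like this - a rough sketch using the hashlimit module, with arbitrary thresholds:

# drop new HTTPS connections from any single source IP above 10/s (burst 20)
iptables -A INPUT -p tcp --dport 443 -m conntrack --ctstate NEW \
    -m hashlimit --hashlimit-name https --hashlimit-mode srcip \
    --hashlimit-above 10/second --hashlimit-burst 20 -j DROP

Each firewall node keeps its own counters though, which is exactly limitation b).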
On a), HAProxy allows a rate limit to be applied to almost any aspect of a TCP or HTTP transaction. On b), sharing of rate-limit counters between HAProxy peers was added in HAProxy 1.6, with the caveat that it 'must be used for safe reload and server failover only'. For a pair of HAProxy nodes in a low-traffic scenario, I'm betting this will be 'good enough' for my HA needs.
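One way to sanity-check that peer syncing is working - assuming a stats socket is enabled in the global section (e.g. 'stats socket /run/haproxy/admin.sock'; the path here is an assumption) - is to dump the stick-table on each node and compare:

# run on both haproxy1 and haproxy2; entries should match once replicated
echo "show table https" | socat stdio /run/haproxy/admin.sock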
Config
The following are the relevant parts of haproxy.cfg. This isn't supposed to be any kind of 'production' config - it's just for testing purposes, pulled together from a number of external resources.
peers hapeers
    peer haproxy1 192.168.1.1:1024
    peer haproxy2 192.168.2.1:1024
frontend https
    # *.pem files are read from the directory '/etc/haproxy/ssl'.
    # The certificate will be matched against SNI, otherwise the first certificate will be used
    bind *:443 ssl crt /etc/haproxy/ssl/
    default_backend bk_one
    tcp-request inspect-delay 5s
    stick-table type ip size 200k expire 30s peers hapeers store gpc0
    # the backends increment the frontend's gpc0 (via sc0_inc_gpc0) on abuse, which we check here
    acl source_is_abuser src_get_gpc0 gt 0
    # don't track an abuser while they're being redirected to the rate-limit backend
    tcp-request connection track-sc0 src if !source_is_abuser
    tcp-request content accept if { req_ssl_hello_type 1 }
    # tell the backend that the client is using https
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    # send abusers to the rate-limit backend until their entry expires (30s, above)
    use_backend rate-limit if source_is_abuser
    use_backend bk_one if { ssl_fc_sni -i demo1.example.com }
    use_backend bk_two if { ssl_fc_sni -i demo2.example.com }
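A quick way to check the SNI matching once this is up is openssl s_client - substitute a real frontend address for the placeholder below:

# should print the demo1 certificate subject; change -servername to test demo2
openssl s_client -connect 203.0.113.10:443 -servername demo1.example.com </dev/null 2>/dev/null | openssl x509 -noout -subject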
# mostly the same as the 'https' frontend, minus the SSL bits
frontend http
    bind *:80
    default_backend bk_one
    tcp-request inspect-delay 5s
    stick-table type ip size 200k expire 30s peers hapeers store gpc0
    # the backends increment the frontend's gpc0 (via sc0_inc_gpc0) on abuse, which we check here
    acl source_is_abuser src_get_gpc0 gt 0
    # don't track an abuser while they're being redirected to the rate-limit backend
    tcp-request connection track-sc0 src if !source_is_abuser
    # send abusers to the rate-limit backend until their entry expires (30s, above)
    use_backend rate-limit if source_is_abuser
    use_backend bk_one if { hdr(Host) -i demo1.example.com }
    use_backend bk_two if { hdr(Host) -i demo2.example.com }
backend bk_one
    balance roundrobin
    server app1 web.a:80 check
    server app2 web.b:80 check
    stick-table type ip size 200k expire 5m peers hapeers store conn_rate(30s),bytes_out_rate(60s)
    tcp-request content track-sc1 src
    # 10 connections is approximately 1 page load! Increase to suit
    acl conn_rate_abuse sc1_conn_rate gt 10
    acl data_rate_abuse sc1_bytes_out_rate gt 20000000
    # abuse is marked in the frontend so that it's shared between all sites
    acl mark_as_abuser sc0_inc_gpc0 gt 0
    tcp-request content reject if conn_rate_abuse mark_as_abuser
    tcp-request content reject if data_rate_abuse mark_as_abuser
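Tripping the limit is easy to test: a burst of requests from one client should exceed the 10-connections-per-30s threshold, after which the source is marked as an abuser and routed to the rate-limit backend. Something like this, with a placeholder hostname:

# the first few requests should return 200; once gpc0 is set, expect 503s
for i in $(seq 1 15); do
    curl -sk -o /dev/null -w '%{http_code}\n' https://demo1.example.com/
done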
backend bk_two
    [... same as bk_one, just using different backend servers ...]
backend rate-limit
    # custom .http file displaying a 'rate limited' message
    errorfile 503 /usr/share/haproxy/503-ratelimit.http
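HAProxy errorfiles are served verbatim, so the file needs to be a complete raw HTTP response, headers and all. A minimal version of 503-ratelimit.http might look like this (the wording is just an example):

cat > /usr/share/haproxy/503-ratelimit.http <<'EOF'
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>Rate limited</h1>
<p>You're sending requests too quickly. Please slow down and try again shortly.</p>
</body></html>
EOF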