Changes between Initial Version and Version 1 of syn-flood-defense-narrative


Timestamp:
Aug 20, 2013, 3:08:32 AM (8 years ago)
Author:
Ross

= Beating a SYN Flood Attack - Narrative =

On Wed. Aug 8th, 2013, [http://saharareporters.com Sahara Reporters] was hit by a massive DDoS attack.  This page is a narrative account of, and a how-to for, dealing with such an attack.  I will attempt to be as generic as possible to help others dealing with such a problem, but some things will also be May First/People Link specific.  Also, having never had to deal with an attack of this sort, I cannot confirm that the practices described here are best practices.  However, after nearly four days, many attempts, a bunch of [https://startpage.com/ Start Page] searches, and the work of three highly competent sys admins, we finally worked out a solution for this multi-pronged attack.

== Prong One: POST past Varnish ==

Sahara Reporters uses [https://www.varnish-cache.org/ varnish] as its primary caching proxy.  A well tested and effective caching proxy, varnish has served this fairly high traffic site well for a number of years.  We have faced a few attacks in the past and weathered them fairly well due to varnish's flexibility and ability to stand up to increased traffic.

The first problem encountered during this attack rendered the primary server (not the caching servers) virtually inoperable.  [https://www.apache.org/ Apache] serves the back-end content and, using an ingenious if obvious route around varnish, the attackers were able to force a dramatic increase in the number of requests made to the apache server.  Their simple method was to make POST requests to the site home page.

Looking at top on the apache server, we were able to determine the significance of the issue.  Normally this server has some 10-15 apache2 processes running, but top showed a list beyond counting.  ps gave better metrics:

{{{
ps -eFH | grep apache2
}}}

With this command we were able to determine the total number of processes running: well over 100, enough to make the server unresponsive to almost any activity.
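The count can also be taken directly instead of being read off a long listing.  What follows is a minimal sketch, assuming the worker processes are named `apache2` as they are on this server; `count_procs` is a hypothetical helper introduced here for illustration.  Matching on the bare command name also avoids counting the `grep` process itself.

```shell
#!/bin/sh
# Count processes whose command name matches exactly.
# Reads one command name per line on stdin, e.g. from: ps -e -o comm=
count_procs() {
    # -c prints the number of matching lines, -x requires a whole-line match.
    # grep exits non-zero when the count is 0, so mask that with "|| true".
    grep -c -x -- "$1" || true
}

# Live usage (assumption: the workers are named "apache2"):
#   ps -e -o comm= | count_procs apache2
```

Run in a loop or under `watch`, this gives a single number to compare against the usual 10-15.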
     18
Under this particular load, the varnish servers showed no negative consequences.  In fact, other than the significant load on the apache server, everything else seemed quite normal.

=== The Diagnostic Revelation ===

The initial confusion arose from the fact that it seemed as if varnish wasn't doing its job, because too many requests were being passed through to the apache server.  I searched for caching failures on each of the varnish servers, but nothing seemed obvious.  Hit rates were normal, around 85-95%, and there seemed to be no load problems on the varnish servers at all.  The only real symptoms seemed to be the struggling apache server and, eventually, a clear spike in traffic.  We monitor our bandwidth with [https://members.mayfirst.org/cacti/ cacti]; support team members can get access to this through keyringer.  You will be able to see massive traffic spikes in XO on both pianeta and avocet during this period.

Since our varnish servers are distributed, these spikes reflected traffic going directly to the apache server.  After talking with Sahara Reporters and confirming that they did not have any reason to expect increased traffic, we determined this to be a legitimate problem with the relationship between our varnish servers and the apache server.  Making sure to do the easiest and most obvious first, I restarted apache and varnish on all of the servers.

{{{
service apache2 restart
}}}

{{{
service varnish restart
}}}

No improvement at all.  Frustrating, but probably to be expected.  Next I checked the apache logs to investigate any unwanted traffic.

{{{
tail -f /var/log/apache.log | grep -v -E 'VARNISH_IP_1|VARNISH_IP_2'
}}}

'''VARNISH_IP_1 etc. should be the actual IP addresses, if you need to run this command.'''

This offered an overview of traffic not passing through the varnish servers.  Unfortunately, it did not show any meaningful traffic, certainly nothing that would have caused an overloaded server.  If you're using varnish as a proxy, you would not expect any connections from non-varnish servers.  In a way this was good news, because it demonstrated that the problem had to do with the varnish servers passing too many requests to the apache server, but why?
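A variation on the log check above is to tally the non-proxy clients rather than watch them scroll by.  This is only a sketch: it assumes the client address is the first field of each log line, and `tally_direct_clients` is a hypothetical helper, not something taken from our tooling.  It compares the first field exactly, whereas the `grep -v -E` pattern treats each `.` in an IP address as "any character" and matches anywhere in the line.

```shell
#!/bin/sh
# Tally requests per client IP, skipping known proxy addresses.
# $1: space-separated list of proxy IPs to ignore; log lines arrive on stdin.
tally_direct_clients() {
    awk -v skip="$1" '
        BEGIN { n = split(skip, a, " "); for (i = 1; i <= n; i++) proxy[a[i]] = 1 }
        !($1 in proxy) { count[$1]++ }          # first field is the client IP
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}

# Live usage (VARNISH_IP_1 etc. are placeholders for the real addresses):
#   tally_direct_clients "VARNISH_IP_1 VARNISH_IP_2" < /var/log/apache.log
```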
     44
Well, just to ease traffic to apache, the first step we took was to remove a number of varnish servers.  Since the problem was varnish, taking away any given server would reduce the number of calls to apache.  So we took out a third of our varnish servers, to no avail.  The number of requests was simply too high to handle.

The next step we took was to try to determine if any IP addresses might be overloading the varnish servers.  For this we used varnishtop:

{{{
varnishtop -i TxHeader -I '^X-Forwarded-For:'
}}}

which gave output something like this:
{{{
list length 19                                                  bouazizi

    39.91 TxHeader       X-Forwarded-For: 109.205.248.192
    27.94 TxHeader       X-Forwarded-For: 37.53.252.79
    22.93 TxHeader       X-Forwarded-For: 130.255.251.114
     9.97 TxHeader       X-Forwarded-For: 87.109.30.45
     4.98 TxHeader       X-Forwarded-For: 196.46.246.50, 217.212.230.234
     3.99 TxHeader       X-Forwarded-For: 151.245.10.171
     2.99 TxHeader       X-Forwarded-For: 49.231.103.138
     2.00 TxHeader       X-Forwarded-For: 93.186.23.81
     2.00 TxHeader       X-Forwarded-For: 192.168.88.6, 41.41.244.13
     2.00 TxHeader       X-Forwarded-For: 66.249.73.136
     1.99 TxHeader       X-Forwarded-For: unknown, 93.186.22.240
     1.99 TxHeader       X-Forwarded-For: unknown, 93.186.22.241
     1.00 TxHeader       X-Forwarded-For: 41.190.5.47
     1.00 TxHeader       X-Forwarded-For: 66.249.73.224
     1.00 TxHeader       X-Forwarded-For: 192.168.102.96, 46.65.52.130
     1.00 TxHeader       X-Forwarded-For: 93.186.22.241, 80.239.243.129
     1.00 TxHeader       X-Forwarded-For: 151.96.3.241
     1.00 TxHeader       X-Forwarded-For: 93.186.31.81
     1.00 TxHeader       X-Forwarded-For: 66.249.73.240
}}}
     77
Here you see three IP addresses with a significantly higher hit rate than any others.  During the actual attack, the list of rapidly requesting IP addresses was much longer.  Numbers this far out of proportion are, at the very least, a clue that the problem is likely an attack.  The next discovery was an "Ah Ha!" moment: the output of varnishncsa revealed the root of the problem.

This command:

{{{
varnishncsa
}}}

resulted in output something like this:
     87
{{{
139.194.226.35 - - [18/Aug/2013:03:00:09 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0"
139.194.226.35 - - [18/Aug/2013:03:00:09 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.420014; .NET CLR 3.5.420014; .NET CLR 3.0.420014"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 5.1; WOW64; U; Edition Grenada Local; ru) Presto/2.10.289 Version/12.07"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 5.1; U; Edition India Local; ru) Presto/2.10.289 Version/9.08"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 5.1; U; Edition Germany Local; ru) Presto/2.10.289 Version/5.00"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SLCC2; .NET CLR 2.0.045312; .NET CLR 3.5.045312; .NET CLR 3.0.045312"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/5.0 (Windows NT 5.1; rv:9.0) Gecko/20100101 Firefox/9.0"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 6.1; WOW64; U; Edition Russia Local; ru) Presto/2.10.289 Version/6.04"
139.194.226.35 - - [18/Aug/2013:03:00:10 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/5.0 (Windows NT 5.1; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0"
139.194.226.35 - - [18/Aug/2013:03:00:11 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SLCC2; .NET CLR 2.0.702355; .NET CLR 3.5.702355; .NET CLR 3.0.702355"
139.194.226.35 - - [18/Aug/2013:03:00:11 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.335818; .NET CLR 3.5.335818; .NET CLR 3.0.335818"
139.194.226.35 - - [18/Aug/2013:03:00:11 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.743546; .NET CLR 3.5.743546; .NET CLR 3.0.743546"
139.194.226.35 - - [18/Aug/2013:03:00:11 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.342248; .NET CLR 3.5.342248; .NET CLR 3.0.342248"
139.194.226.35 - - [18/Aug/2013:03:00:11 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SLCC2; .NET CLR 2.0.863776; .NET CLR 3.5.863776; .NET CLR 3.0.863776"
139.194.226.35 - - [18/Aug/2013:03:00:11 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.811412; .NET CLR 3.5.811412; .NET CLR 3.0.811412"
139.194.226.35 - - [18/Aug/2013:03:00:12 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SLCC2; .NET CLR 2.0.045312; .NET CLR 3.5.045312; .NET CLR 3.0.045312"
139.194.226.35 - - [18/Aug/2013:03:00:12 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 6.1; WOW64; U; Edition Russia Local; ru) Presto/2.10.289 Version/6.04"
139.194.226.35 - - [18/Aug/2013:03:00:12 -0400] "POST http://saharareporters.com/ HTTP/1.0" 200 837 "http://saharareporters.com/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0"
}}}
     110
The above output has been filtered to remove GET requests; the POST requests are the telling part of the equation.  To really determine if you're experiencing this problem, the appropriate command would be:

{{{
varnishncsa | grep POST
}}}
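To go one step beyond `grep`, the POST lines can be tallied by client and URL, which makes a flood against a single page stand out.  A minimal sketch, assuming the log line layout shown in the varnishncsa output above (client IP in the first field, the quoted method in the sixth, the URL in the seventh); `count_posts` is a hypothetical helper.

```shell
#!/bin/sh
# Count POST requests per (client IP, URL) pair from NCSA-style log lines on stdin.
count_posts() {
    # Field 6 is the opening quote plus the method, e.g. "POST ; field 7 is the URL.
    awk '$6 == "\"POST" { n[$1 " " $7]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Live usage: varnishncsa streams forever, so sample a fixed number of lines first:
#   varnishncsa | head -n 5000 | count_posts | head
```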
     116
Examining the varnishncsa output above, you can see that the originating IP address tried to POST to the homepage of the site.  Because of the nature of this site, no normal user would need to issue a POST request to the homepage.  Still, when varnish saw the POST request, it said, "Oh, it's a POST.  I don't deal with those!" and varnish asked apache to take over.

So for every POST request, apache got a request as well.  Technically, varnish was doing its job.  The solution?  Change varnish's job.  The first step, just to get the site running again, was to turn off all POST requests going to the site.  Borrowing from [https://linax.wordpress.com/2011/01/27/block-post-method-with-varnish-for-invalid-urls/ this site], we decided to simply block every POST request first.  So we added this directive to sub vcl_recv in our varnish configuration:

{{{
if ( req.request == "POST" ) {
      error 403 ": Requested Method is not supported by this server.";
}
}}}

Et Voila!!!  Once all the varnish servers had this directive up, the site once again started loading and apache calmed down to normal levels.  Whew, one problem solved.  Next, we added the acceptable POST pages, so site functionality could continue as normal.  Here's the final directive we used:

{{{
if ( req.request == "POST" ) {
  if ( req.url ~ "/user"
    || req.url ~ "/node/add"
    || req.url ~ "edit"
    || req.url ~ "comment"
    || req.url ~ "delete" ) {
       return (pass);
  } else {
      error 403 ": Requested Method is not supported by this server.";
  }
}
}}}

Now varnish would not pass any POST requests to the homepage and clog up the works of apache.  Apache was happy to go back to its old job, varnish was happy to have a new job, and I was happy to have done my job.  The only people who weren't happy were the attackers!!!
     144
=== The Diagnostic Duh! ===

All seemed well for the better part of the day after taking these steps.  Unfortunately, by the end of the evening, the attackers had made quick, though obvious, adjustments to their approach.  Rather than targeting the home page, they began sending POST requests to pages that did not throw 403 errors at them.  The apache server once again bogged down and we had to begin approaching the problem in a more targeted manner.

varnishncsa output looked more like this:
     150
{{{
39.52.217.81 - - [18/Aug/2013:03:51:40 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.667160; .NET CLR 3.5.667160; .NET CLR 3.0.667160"
39.52.217.81 - - [18/Aug/2013:03:51:40 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 6.1; WOW64; U; Edition United Kingdom Local; ru) Presto/2.10.289 Version/10.05"
39.52.217.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.612601; .NET CLR 3.5.612601; .NET CLR 3.0.612601"
42.118.204.24 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0"
173.245.221.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 5.1; U; Edition Grenada Local; ru) Presto/2.10.289 Version/5.03"
39.52.217.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.305161; .NET CLR 3.5.305161; .NET CLR 3.0.305161"
42.118.204.24 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SLCC2; .NET CLR 2.0.998117; .NET CLR 3.5.998117; .NET CLR 3.0.998117"
39.52.217.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 6.1; U; Edition Mongolia Local; ru) Presto/2.10.289 Version/6.04"
42.118.204.24 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.808502; .NET CLR 3.5.808502; .NET CLR 3.0.808502"
42.118.204.24 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 5.1; U; Edition Russia Local; ru) Presto/2.10.289 Version/12.08"
42.118.204.24 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 6.1; WOW64; U; Edition Bangladesh Local; ru) Presto/2.10.289 Version/10.07"
39.52.217.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.247061; .NET CLR 3.5.247061; .NET CLR 3.0.247061"
42.118.204.24 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/5.0 (Windows NT 5.1; rv:10.0) Gecko/20100101 Firefox/10.0"
39.52.217.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.452771; .NET CLR 3.5.452771; .NET CLR 3.0.452771"
39.52.217.81 - - [18/Aug/2013:03:51:41 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.667160; .NET CLR 3.5.667160; .NET CLR 3.0.667160"
42.118.204.24 - - [18/Aug/2013:03:51:42 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.269189; .NET CLR 3.5.269189; .NET CLR 3.0.269189"
173.245.221.81 - - [18/Aug/2013:03:51:42 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Opera/9.80 (Windows NT 6.1; WOW64; U; Edition Iran Local; ru) Presto/2.10.289 Version/7.02"
39.52.217.81 - - [18/Aug/2013:03:51:42 -0400] "POST http://saharareporters.com/user/login HTTP/1.1" 500 837 "http://saharareporters.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.456529; .NET CLR 3.5.456529; .NET CLR 3.0.456529"
}}}
     171
Since this is a drupal site, attacking user/login makes sense because users must send a POST request in order to log in.  The choice becomes: either don't let users log in, or allow attackers to make these POST requests.

Our first approach was to block POST requests only to the "/user/login" path, by changing the value in the varnish directive from `if ( req.url ~ "/user"` to `if ( req.url == "/user"`, mistakenly believing that this would solve the problem.  It only took a few hours to discover that such a solution would ultimately end with all POST pages being blocked, as the attackers continued to seek alternative paths.

==== Blocking IP addresses ====

Realizing that varnish might not be able to offer a complete solution to the problem, we began looking for other alternatives and reluctantly decided to begin blocking IP addresses.  This was not an easy decision, since keeping visitors from reaching the site is essentially what a DDoS attack does.  Such a step means potentially keeping legitimate traffic from reaching the site as well.  That is not what we want, but given the circumstances such a process seemed imperative.

Rather than being indiscriminate, we chose to block only those IP addresses sending POST requests from countries most likely to be the source of a botnet and least likely to speak the primary language of the site, and also those with the largest number of requests.  This resulted in a list of countries:

Russia, Taiwan, China, Vietnam, Hungary, Iran, Romania, Czech Republic, Belarus

These countries seemed to be the largest offenders.  We targeted both POST and GET requests from these countries by running the following two scripts:
     185
{{{
#!/bin/bash

# Take the next matching POST from varnishncsa, look up the client's country
# with whois, and ban the address if the country code is on the list.
while :; do
    a=$(varnishncsa | grep "POST http://saharareporters.com/user/login/ HTTP/1.1" -m 1 |
        grep -o -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
    echo "$a"
    b=$(whois "$a" | grep -i -m 1 country | grep -E 'UA|TW|HU|RU|VN|vn|IR')
    # b=$(whois "$a" | grep -i -m 1 country);
    if [ -n "$b" ]; then
        echo "$a"
        mf-ip-ban-address "$a"
    fi
done
}}}

The above script blocks POST requests coming from selected countries.  Ultimately, this seemed to produce fewer results than needed, so we switched to blocking GET requests from these countries as well.

{{{
#!/bin/bash

# Same loop for GET requests to the home page, with a longer country list.
while :; do
    a=$(varnishncsa | grep "GET http://saharareporters.com/ HTTP/1.0" -m 1 |
        grep -o -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
    # Look the country up once and reuse the result for both display and matching.
    c=$(whois "$a" | grep -i -m 1 country)
    b=$(printf '%s\n' "$c" | grep -E 'UA|TW|HU|RU|VN|vn|IR|RO|CZ|BY')
    # b=$(whois "$a" | grep -i -m 1 country);
    echo "$a -- $c"
    if [ -n "$b" ]; then
        echo "Banned -- $a"
        /usr/local/sbin/mf-ip-ban-address "$a"
    fi
done
}}}
     221
This script finds the IP address, checks whether it's from one of the designated countries, and then calls mf-ip-ban-address to ban it.  `mf-ip-ban-address` looks like this (we'll later change this script):

{{{
#!/bin/bash
# Log and then drop all traffic from a single IP address.
if [ "$#" -ne 1 ]; then
        echo    "You did not specify an IP address to ban
USAGE: $0 ip_address_to_ban"
        exit 1
fi
IP=$1

IPTABLES=/sbin/iptables
# LOG does not stop rule processing, so the packet goes on to the DROP rule.
"$IPTABLES" -A INPUT -s "$IP" -j LOG --log-ip-options --log-tcp-options --log-level debug --log-prefix=Banned:
"$IPTABLES" -A INPUT -s "$IP" -j DROP
}}}

Using this method, we managed to keep the offending IP addresses at bay and began reducing the number of requests passing through varnish to the apache server.  Whew!!!
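The country test in the scripts above matches two-letter codes anywhere on the whois line, so a stray "RU" or "IR" elsewhere in that line would also count as a hit.  A slightly stricter variant is sketched below, assuming whois output contains a `country: XX` line; `whois_country` and `in_list` are hypothetical helpers introduced for illustration, not part of our tooling.

```shell
#!/bin/sh
# Print the country code from the first "country:" line of whois output on stdin.
whois_country() {
    awk -F: 'tolower($1) ~ /^[ \t]*country[ \t]*$/ {
        gsub(/[ \t]/, "", $2)      # strip padding around the code
        print toupper($2)
        exit
    }'
}

# Succeed if $1 is a whole word in the space-separated list $2.
in_list() {
    [ -n "$1" ] || return 1
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Live usage inside the ban loop (ip holds the extracted address):
#   cc=$(whois "$ip" | whois_country)
#   in_list "$cc" "UA TW HU RU VN IR RO CZ BY" && mf-ip-ban-address "$ip"
```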
     239
=== Not So Fast SYN-ner ===

Banning by country calmed things down for the better part of a day, but by early Friday morning, all bets were off.  The attackers had switched their approach yet again.  Everything we had done remained in place, and the apache server (in fact, all the servers) seemed to be chugging along just fine.  Meanwhile the site would not load at all :-( .

[http://www.faqs.org/docs/linux_network/x-087-2-iface.netstat.html netstat] became our tool of choice.  First we relied on

{{{ netstat -net | wc -l }}}

just to find out how many connections were active.  Whoa!!!  It looked like tens of thousands, up to 200,000 at certain times.  That's a lot of concurrent connections and certainly more than we could handle or explain.  varnish continued doing its job, blocking POST requests.  [http://www.netfilter.org/projects/iptables/index.html iptables] continued to block well over a thousand IP addresses.  For all intents and purposes, everything seemed to be just fine, but perusing `netstat -net` showed a viciously high number of connections in the SYN_RECV state (thanks to [http://workingdirectory.net/ jamie] for noticing this).

It appeared that these determined attackers had decided to switch tactics and use a [https://en.wikipedia.org/wiki/SYN_flood SYN flood] attack.  Having never had to deal with this particular type of attack, we found the effects rather perplexing, especially behind varnish.  Everything looked like it was functioning correctly, but the site simply wouldn't load.  Restarting varnish resulted in the ability to load some pages for a short period of time, and then just an infinite stall.

At MF/PL we have a super sweet script written by [http://cmrg.fifthhorseman.net/wiki/dkg dkg] to check for open syns, [https://support.mayfirst.org/browser/puppet/modules/mayfirst/files/ip-utils/sbin/mf-ip-list-open-syns mf-ip-list-open-syns].  Apparently, the script was written back in 2003 to watch for potential attacks on the counter-convention website during [https://en.wikipedia.org/wiki/2004_Republican_National_Convention_protest_activity The Republican National Convention in 2004].

Running `mf-ip-list-open-syns` showed pages and pages of IP addresses, a good indication that a SYN flood was probably the attack we faced.
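For readers without access to that script, a rough equivalent can be put together from netstat output.  This is not dkg's script, only a sketch of the same idea: it assumes `netstat -ant` style columns (remote address in the fifth field, state in the sixth), and `syn_recv_by_ip` is a hypothetical name.

```shell
#!/bin/sh
# List remote addresses with half-open (SYN_RECV) connections, busiest first.
# Expects "netstat -ant" style lines on stdin.
syn_recv_by_ip() {
    awk '$6 == "SYN_RECV" {
             ip = $5
             sub(/:[0-9]+$/, "", ip)    # drop the trailing :port
             n[ip]++
         }
         END { for (ip in n) print n[ip], ip }' | sort -rn
}

# Live usage:
#   netstat -ant | syn_recv_by_ip | head -20
```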
     255
In our search for answers, we discovered a nifty one-line bash command that gave us a pretty clear sense of what we faced:

{{{
netstat -ant | grep 80 | awk '{print $6}' | sort | uniq -c | sort -n
}}}

It produced output something like this (numbers modified for example):

{{{
      1 LISTEN
      2 CLOSING
     30 FIN_WAIT2
     39 FIN_WAIT1
     42 LAST_ACK
    166 SYN_RECV
    226 ESTABLISHED
    634 TIME_WAIT
  34030 CLOSE_WAIT
}}}
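One caveat with the one-liner above: `grep 80` keeps any line that contains "80" anywhere, including remote ports, port 8080 and IP addresses with an 80 in them.  A tighter variant is sketched here, assuming the usual `netstat -ant` layout with the local address in the fourth field; `tcp_states_for_port` is a hypothetical helper name.

```shell
#!/bin/sh
# Count TCP connection states for one local port.
# $1: local port number; "netstat -ant" style lines arrive on stdin.
tcp_states_for_port() {
    awk -v port="$1" '
        # Only tcp/tcp6 lines whose local address ends in ":<port>".
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print n[s], s }
    ' | sort -n
}

# Live usage:
#   netstat -ant | tcp_states_for_port 80
```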
     275
We saw a huge number of CLOSE_WAIT connections.  CLOSE_WAIT, we would learn, means that the remote end has closed the connection but the local process handling it has not yet closed its side.  We used the following command to determine which process was holding the connections open:

{{{
~#  netstat -antp | grep CLOSE_WAIT | head -10
tcp        0  13140 216.66.23.43:80         109.160.88.5:3912       CLOSE_WAIT  29613/varnishd
tcp        0  12708 216.66.23.43:80         46.225.41.180:61010     CLOSE_WAIT  29613/varnishd
tcp        0  13140 216.66.23.43:80         109.160.88.5:3902       CLOSE_WAIT  29613/varnishd
tcp        0  13140 216.66.23.43:80         109.160.88.5:3883       CLOSE_WAIT  29613/varnishd
tcp        0  13140 216.66.23.43:80         109.160.88.5:3996       CLOSE_WAIT  29613/varnishd
tcp        1  12708 216.66.23.43:80         46.225.41.180:61131     CLOSE_WAIT  29613/varnishd
tcp        0  12240 216.66.23.43:80         171.4.214.125:23126     CLOSE_WAIT  29613/varnishd
tcp        0  12708 216.66.23.43:80         46.225.41.180:61129     CLOSE_WAIT  29613/varnishd
tcp        0  12780 216.66.23.43:80         41.43.168.155:56923     CLOSE_WAIT  29613/varnishd
tcp        0  13140 216.66.23.43:80         109.160.88.5:3908       CLOSE_WAIT  29613/varnishd
0
}}}
     292
As might be expected, varnish was responsible for all of the CLOSE_WAIT connections.  This seemed like progress: all we needed to do was figure out how to end all of the CLOSE_WAIT connections.  Easy, right?  I wish...

Perhaps I'm jumping ahead a little bit, because as soon as we discovered this was a SYN flood attack, we began researching how to resist it.  In almost every case we found references to two things:

 1. Turning on SYN cookies (tcp_syncookies).
 2. A set of iptables rules to mitigate SYN flood attacks.

==== Turn on SYN cookies ====

This is a fairly standard practice and can be done on a live system by modifying /proc/sys/net/ipv4/tcp_syncookies.  Check the current status with:

{{{
cat /proc/sys/net/ipv4/tcp_syncookies
}}}

If the result is "0", you can turn on syncookies with:

{{{
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
}}}
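A value written to /proc this way is lost at the next reboot; the persistent place for it is a line such as `net.ipv4.tcp_syncookies = 1` in /etc/sysctl.conf.  Below is a minimal sketch of adding or updating such a line without creating duplicates.  `set_sysctl_line` is a hypothetical helper, and the `key = value` layout it writes is an assumption about how you want the file formatted.

```shell
#!/bin/sh
# Set "key = value" in a sysctl-style file: replace the existing line for the
# key if there is one, otherwise append a new line.  Commented lines are kept.
set_sysctl_line() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
            BEGIN { FS = "=" }
            { name = $1; gsub(/[ \t]/, "", name) }   # key without padding
            name == k { if (!done) print k " = " v; done = 1; next }
            { print }
            END { if (!done) print k " = " v }
        ' "$file" > "$tmp"
    then
        cat "$tmp" > "$file"; rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}

# Live usage (as root); sysctl -p reloads /etc/sysctl.conf:
#   set_sysctl_line /etc/sysctl.conf net.ipv4.tcp_syncookies 1 && sysctl -p
```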
     313
We did not have tcp_syncookies enabled, so it seemed like a great and easy solution.  However, enabling it produced no noticeable improvement.  Even after a full reboot, this setting did not seem to reduce the SYN flood as it was implemented.  This is not to say it isn't important, and we will leave it enabled.

==== iptables resistance ====

From a number of different sources, we found a similar set of iptables rules to implement as general resistance to SYN flood attacks.  Below is one set, though we would end up implementing others as well.  This set creates a chain `syn-flood` that rate-limits the packets sent through it: up to 10 per second on average (after an initial burst of 50) are returned to the calling chain, and anything beyond that is logged and dropped.

{{{
iptables -N syn-flood
iptables -A syn-flood -m limit --limit 10/second --limit-burst 50 -j RETURN
iptables -A syn-flood -j LOG --log-prefix "SYN flood: "
iptables -A syn-flood -j DROP
}}}
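A user-defined chain only sees packets once a rule in a built-in chain such as INPUT jumps to it, and the listing above does not include that rule.  The sketch below prints the full set with such a jump added so it can be reviewed before anything is applied.  The final INPUT rule is an assumption about how the chain would normally be hooked in, not something recorded in our notes, and `print_syn_flood_rules` is a hypothetical helper.

```shell
#!/bin/sh
# Print (do not run) the iptables commands for a rate-limited SYN chain.
# $1: rate, e.g. 10/second    $2: burst size, e.g. 50
print_syn_flood_rules() {
    rate=$1 burst=$2
    cat <<EOF
iptables -N syn-flood
iptables -A syn-flood -m limit --limit $rate --limit-burst $burst -j RETURN
iptables -A syn-flood -j LOG --log-prefix "SYN flood: "
iptables -A syn-flood -j DROP
iptables -A INPUT -p tcp --syn -j syn-flood
EOF
}

# Review the output first; to apply it (as root), pipe it to a shell:
#   print_syn_flood_rules 10/second 50 | sh
```

Because rules are evaluated in order, the jump only has an effect if it comes before any rule in INPUT that already accepts the same packets.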
     326
Again, this method did not seem to produce any noticeable results.  Even after shutting down varnish, making sure all connections had terminated, and then restarting varnish, the number of CLOSE_WAIT connections piled up almost instantly.  Quite frustrating to say the least.

Meanwhile, we continued to ban IP addresses at an alarming rate, unable to tell with certainty whether or not these addresses were spoofed.  After hours of this approach, we could only periodically get varnish to serve content and then only for moments.

Throughout this process we tried numerous additional firewall mechanisms with limited results.  These iptables rules seemed hopeful:

{{{
iptables -A INPUT -p tcp --syn --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --syn -m limit --limit 1/s --limit-burst 4 -j ACCEPT
iptables -A INPUT -p tcp --syn -j DROP
}}}

Similar to the above chain, these rules test for SYN packets.  The `limit` match counts all matching packets together rather than per source IP, so the second rule accepts roughly one SYN per second overall (after a burst of 4) and the third drops the rest.  Note also that the first rule already accepts every SYN to port 80, so as listed, web traffic never reaches the limit at all.  For other ports these restrictions are more severe than the earlier 'syn-flood' chain.  Still little improvement.
     340
     341Next we dove into netfilter.  There exist numerous configuration options in /etc/sysctl.conf.  We configure these settings first:
     342{{{
     343net.ipv4.conf.all.rp_filter = 1
     344net.ipv4.conf.default.rp_filter = 1
     345}}}
     346
     347In reality, we had little knowledge of defending against these types of attacks and tried whatever we could to mitigate the problem.  Upon reflection, it seems that some of the sysctl settings will only make an impact on routers and not stand alone servers.
     348
     349One recommended setting is `net.netfilter.nf_conntrack_tcp_timeout_syn_recv=30`, which seems to reduce the amount of time a SYN request can remain open.  After trying to set this value, we discovered a that the servers were without [http://conntrack-tools.netfilter.org/ conntrack-tools].  After installing conntrack-tools `apt-get install conntrack`, after installing conntrack tools we needed to enable three modules to utilize it's capabilities.

{{{
modprobe nf_conntrack
modprobe nf_conntrack_ipv4
modprobe nf_conntrack_netlink
}}}

With the above modules loaded, we could run `conntrack -L` to see the current flow state of all connections, '''e.g.'''

{{{
~# conntrack -L | head -10
tcp      6 345409 ESTABLISHED src=199.87.167.202 dst=187.14.214.174 sport=80 dport=12703 packets=3 bytes=1893 [UNREPLIED] src=187.14.214.174 dst=199.87.167.202 sport=12703 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 320054 ESTABLISHED src=199.87.167.202 dst=54.236.252.74 sport=80 dport=37853 packets=1 bytes=52 [UNREPLIED] src=54.236.252.74 dst=199.87.167.202 sport=37853 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 401819 ESTABLISHED src=199.87.167.202 dst=37.8.76.60 sport=80 dport=30865 packets=1 bytes=1492 [UNREPLIED] src=37.8.76.60 dst=199.87.167.202 sport=30865 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 289081 ESTABLISHED src=31.207.246.124 dst=199.87.167.202 sport=3007 dport=80 packets=4 bytes=184 src=199.87.167.202 dst=31.207.246.124 sport=80 dport=3007 packets=1 bytes=44 [ASSURED] mark=0 secmark=0 use=2
tcp      6 261068 ESTABLISHED src=175.176.150.152 dst=199.87.167.202 sport=7286 dport=80 packets=2 bytes=88 src=199.87.167.202 dst=175.176.150.152 sport=80 dport=7286 packets=1 bytes=44 [ASSURED] mark=0 secmark=0 use=2
tcp      6 321533 ESTABLISHED src=199.87.167.202 dst=54.236.254.18 sport=80 dport=34831 packets=1 bytes=52 [UNREPLIED] src=54.236.254.18 dst=199.87.167.202 sport=34831 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 397643 ESTABLISHED src=199.87.167.202 dst=54.236.254.116 sport=80 dport=54920 packets=1 bytes=52 [UNREPLIED] src=54.236.254.116 dst=199.87.167.202 sport=54920 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 345481 ESTABLISHED src=199.87.167.202 dst=189.100.29.153 sport=80 dport=12262 packets=3 bytes=1815 [UNREPLIED] src=189.100.29.153 dst=199.87.167.202 sport=12262 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 345350 ESTABLISHED src=199.87.167.202 dst=197.160.90.202 sport=80 dport=28290 packets=1 bytes=604 [UNREPLIED] src=197.160.90.202 dst=199.87.167.202 sport=28290 dport=80 packets=0 bytes=0 mark=0 secmark=0 use=2
tcp      6 260548 ESTABLISHED src=175.176.150.152 dst=199.87.167.202 sport=65299 dport=80 packets=2 bytes=88 src=199.87.167.202 dst=175.176.150.152 sport=80 dport=65299 packets=1 bytes=44 [ASSURED] mark=0 secmark=0 use=2
0
}}}

Having conntrack installed proved a great boon for tracking what was happening on the server.  One of the advantages of conntrack is supposed to be more effective flow control per IP address.  For better or worse, I never found a specific mechanism for using conntrack in this way.  However, we were able to use it as a reference point for examining different types of connections.
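
As an illustration of that kind of inspection, here is a minimal sketch that tallies tracked TCP connections by state.  It assumes the `conntrack -L` output format shown above, where the state is the fourth field of each `tcp` line; the function name `count_conntrack_states` is made up for this example.

```shell
#!/bin/bash
# Sketch: tally TCP connection states from `conntrack -L` output on stdin.
# Assumes the state (ESTABLISHED, CLOSE_WAIT, ...) is field 4 of "tcp" lines.
count_conntrack_states() {
    awk '$1 == "tcp" { count[$4]++ }
         END { for (state in count) print count[state], state }' |
        sort -k1,1nr -k2,2
}

# Usage (as root):
# conntrack -L 2>/dev/null | count_conntrack_states
```

A sudden pile of CLOSE_WAIT or SYN_RECV entries stands out immediately in a summary like this.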

This would come in handy later on, but first we found another mechanism by which to thwart the attack.  It turned out that many of the Referer headers were bogus, looking like "stahoustoa.com".  Steve had the fabulous idea of using varnish to throw a 500 error on malformed Referer headers.

We then added these lines to our varnish configuration:

{{{
if (req.http.referer && req.http.referer !~ "^http") {
    error 500 ": Internal Server Error";
}
}}}
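
To get a feel for how many requests a rule like this would catch, the same test can be run against an access log.  This is a rough sketch under assumptions: it expects Apache/NCSA "combined" format lines on stdin, where the Referer is the second double-quoted field, and the function name `list_bogus_referers` is invented for the example.

```shell
#!/bin/bash
# Sketch: list Referer values that are present but do not start with "http",
# mirroring the VCL test above. Assumes combined log format on stdin.
list_bogus_referers() {
    # Splitting on double quotes makes field 4 the Referer; "-" means none sent.
    awk -F'"' '$4 != "" && $4 != "-" && $4 !~ /^http/ { print $4 }' |
        sort | uniq -c | sort -k1,1nr -k2,2
}

# Usage (the log path is an assumption):
# list_bogus_referers < /var/log/apache2/access.log | head
```

The output is a count per bogus Referer value, most frequent first.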

To our surprise, this allowed varnish to begin serving content again, and when we went to sleep the site was live once more.  By morning our hopes had again turned to horror.

In the end, it would be conntrack and iptables that did the heavy firewall lifting, as should probably be expected, since varnish had become incapable of closing its outgoing connections.  The big aha moment came with the idea of blocking outbound connections to the offending IP addresses.  As far as we could tell, our INPUT rules were not stopping the SYN handshake, so all of our IP blocking up to this point had little effect.

We'd mistakenly believed that dropping an IP address meant blocking an IP address, which on the whole is true.  The caveat, in our experience, was that our INPUT rules did not block the SYN part of the connection.

Using conntrack, we wrote a script that would find IP addresses with 20 or more CLOSE_WAIT connections and then block the outgoing response.  The magic single line turned out to be rather simple:

{{{
iptables -A OUTPUT -d $IP -j DROP
}}}

The script for using conntrack for this purpose looks like this:

{{{
#!/bin/bash

# This script finds IP addresses that are
# holding open multiple connections and
# calls mf-ip-delete-and-ban to block access
# from and to the IP address.
# It parses the output of conntrack -L, so
# conntrack is a dependency.

type conntrack >/dev/null 2>&1 || { echo >&2 "This script depends on conntrack but it's not installed.  Aborting."; exit 1; }

while :
do
    # Field 5 of each CLOSE_WAIT line is src=<ip>; ban any IP seen 20+ times.
    for i in $(conntrack -L |
        grep CLOSE_WAIT | awk '{print $5}' |
        cut -f2 -d'=' | sort | uniq -c |
        sort -n | awk '{if($1>=20)print $2;}')
    do  /root/mf-ip-delete-and-ban "$i"
    done
    sleep 10
done
}}}

And we modified `mf-ip-ban-address` to become `mf-ip-delete-and-ban`, which looks like this:

{{{
#!/bin/bash

# This script adds OUTPUT blocking and iptables
# delete to the standard mf-ip-ban-address script.
# iptables -D (delete) removes an existing matching
# rule from iptables before a new one is created.
# OUTPUT blocking tells iptables to disallow outgoing
# connections to the IP address.  This is useful
# for dealing with syn flood attacks.

if [ $# -ne 1 ]; then
        echo    "You did not specify an IP address to ban
USAGE: $0 ip_address_to_ban"
        exit 1
fi
IP=$1

IPTABLES=/sbin/iptables
$IPTABLES -D OUTPUT -d "$IP" -j DROP
$IPTABLES -A OUTPUT -d "$IP" -j DROP
printf 'banned output to -- %s\n' "$IP"
$IPTABLES -D INPUT -s "$IP" -j LOG --log-ip-options --log-tcp-options --log-level debug --log-prefix=Banned:
$IPTABLES -A INPUT -s "$IP" -j LOG --log-ip-options --log-tcp-options --log-level debug --log-prefix=Banned:
$IPTABLES -D INPUT -s "$IP" -j DROP
$IPTABLES -A INPUT -s "$IP" -j DROP
}}}

Notice a couple of changes from mf-ip-ban-address.  The first and most important change was adding OUTPUT dropping.  As soon as we began using this method, varnish could relinquish its open connections to the offending IP addresses, since it no longer needed to wait for a final ACK.

The second change was adding a delete line for each creation line.  When blocking a single offender by hand, this delete line may not be required.  But since we scripted IP blocking, one of the major side effects turned out to be a huge list of duplicate iptables rules.  Adding the -D switch meant deleting an existing identical entry before adding the current one.
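
An alternative to the delete-then-add pattern, sketched here as an assumption rather than something we used, is to ask iptables whether the rule already exists.  Newer iptables releases (1.4.11 and later, as far as I know) provide a `-C` (check) option for this.  The function name `ensure_rule` and the overridable `IPTABLES` variable are hypothetical.

```shell
#!/bin/bash
# Sketch: append an iptables rule only if an identical one is not present.
# Assumes an iptables that supports -C (check); IPTABLES may be overridden.
IPTABLES="${IPTABLES:-/sbin/iptables}"

ensure_rule() {
    local chain="$1"; shift
    # -C exits 0 when a matching rule exists, non-zero otherwise.
    if ! "$IPTABLES" -C "$chain" "$@" 2>/dev/null; then
        "$IPTABLES" -A "$chain" "$@"
    fi
}

# Usage (as root):
# ensure_rule OUTPUT -d "$IP" -j DROP
# ensure_rule INPUT  -s "$IP" -j DROP
```

Because nothing is deleted, there is also no brief window in which the address is unblocked between the delete and the add.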

And that's the story.  So far the site seems to be happily chugging along.

== Other Gotchas Encountered ==

Mistakes and oversights occurred on a few occasions during this process.  The first thing I, Ross, the author of this narrative, learned was:

'''Never do `service networking restart` on a machine for which you don't have console access.'''

The importance of this lesson continues to develop, as the provider http://wgwilkins.com has apparently stopped responding to support requests.  The server from that provider continues to linger in a non-networked state.

=== Watch your logs ===

The LOG rules in iptables generate excessive logging traffic to /var/log/kern.log, /var/log/syslog and /var/log/debug.  When banning thousands of IP addresses and logging those bans, you may want to either rotate those logs far more frequently or turn off logging to those files.  In a number of instances our /var partition filled up, adding unnecessary confusion about server behavior.  Especially in high-intensity situations, these additional concerns do not make life pleasant.
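
A small guard against that surprise is to check how full the log partition is while the ban script runs.  The sketch below is an assumption of mine, not part of our tooling: the function name `usage_percent` and the 90% threshold are arbitrary.

```shell
#!/bin/bash
# Sketch: report how full the filesystem holding a path is, as an integer percent.
usage_percent() {
    # df -P prints one POSIX-format line per filesystem; column 5 is "NN%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: warn when /var passes a threshold (90 is an arbitrary choice).
# [ "$(usage_percent /var)" -ge 90 ] && echo "WARNING: /var is nearly full" >&2
```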

=== iptables do not persist ===

It's easy to forget in the middle of debugging something like this that iptables rules do not, by default, persist across a server reboot.  In order to retain your rules, you'll need to do:

{{{
/sbin/iptables-save > /etc/iptables.up.rules
}}}

before rebooting.  And then:

{{{
/sbin/iptables-restore < /etc/iptables.up.rules
}}}

after rebooting.  There are ways to automate this; [http://www.debian-administration.org/articles/445 see the Debian Admin guide].
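
One common way to automate the restore on Debian-style systems is a small hook that ifupdown runs before bringing an interface up.  The following sketch writes such a hook; the function name `install_restore_hook` and the hook file name `iptables` are assumptions, while the rules path matches the one used above.

```shell
#!/bin/bash
# Sketch: write an ifupdown pre-up hook that reloads saved iptables rules.
# Assumes Debian's /etc/network/if-pre-up.d directory; names are illustrative.
install_restore_hook() {
    local hook_dir="${1:-/etc/network/if-pre-up.d}"
    local rules="${2:-/etc/iptables.up.rules}"
    # The hook itself is a two-line shell script.
    printf '#!/bin/sh\n/sbin/iptables-restore < %s\n' "$rules" > "$hook_dir/iptables"
    chmod +x "$hook_dir/iptables"
}

# Usage (as root), after saving rules with iptables-save:
# install_restore_hook
```

You still need to run `iptables-save` yourself whenever the rules change; the hook only restores whatever was last saved.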

=== De-duping iptables rules ===

In case you end up with a bunch of duplicate IP address entries in your iptables rules, here's one approach for de-duping them.

==== First create a duplicate IP list ====

This long one-liner builds a file of the IP addresses that appear in more than one DROP rule.

{{{
iptables -L -n | grep DROP | grep -o -E 'all +-- +[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | grep -o -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | grep -v '0.0.0.0' | sort | uniq -c | sort -n | awk '$1 > 1 {print $2}' > ~/duplicate-ip-table-entries.txt
}}}

==== Remove all duplicate entries and re-add them ====

The first one-liner removes every rule for each duplicated IP address; the second then re-adds a single set of rules per address.

{{{
for i in $(cat ~/duplicate-ip-table-entries.txt); do for ip in $(iptables -L -n | grep DROP | grep -o -F -w "$i"); do echo "$ip"; iptables -D INPUT -s "$ip" -j LOG --log-ip-options --log-tcp-options --log-level debug --log-prefix=Banned:; iptables -D INPUT -s "$ip" -j DROP; iptables -D OUTPUT -d "$ip" -j DROP; done; done;

for i in $(cat ~/duplicate-ip-table-entries.txt); do ./mf-ip-delete-and-ban "$i"; done
}}}

You'd need to modify the iptables rules to match the specific way you added them.  The above lines delete rules created by the `mf-ip-delete-and-ban` script listed above.
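
A different approach, offered only as a sketch, is to filter the output of `iptables-save` so that each rule line appears once per table, then feed the result back to `iptables-restore`.  It assumes the plain `iptables-save` format (no `-c` counters); the function name `dedupe_saved_rules` is hypothetical.  Review the filtered file before restoring it.

```shell
#!/bin/bash
# Sketch: drop repeated "-A ..." rule lines from iptables-save output on stdin.
# Assumes plain iptables-save format; the first copy of each rule is kept.
dedupe_saved_rules() {
    awk '/^\*/   { split("", seen) }          # new table: forget rules seen so far
         /^-A /  { if (seen[$0]++) next }     # skip exact repeats of a rule
                 { print }'
}

# Usage (as root):
# iptables-save | dedupe_saved_rules > /tmp/rules.deduped
# iptables-restore < /tmp/rules.deduped
```

Since iptables acts on the first matching rule, dropping later identical copies of a DROP rule should not change which packets are blocked.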

== Potentially Helpful links ==
 * https://www.linuxquestions.org/questions/linux-security-4/how-to-disconnect-established-connection-in-iptables-564900/#post2803287
 * http://pierre.linux.edu/2010/04/how-to-secure-your-webserver-against-syn-flooding-and-dos-attack/
 * http://linuxadministration.us/?p=23
 * http://www.cyberciti.biz/tips/linux-iptables-10-how-to-block-common-attack.html
 * http://www.cyberciti.biz/faq/linux-kernel-etcsysctl-conf-security-hardening/
 * http://lists.netfilter.org/pipermail/netfilter/2002-November/039855.html