Blocking SYN-Flood Attacks on macOS?

l008com

I've got a Mac web server that's been getting hit pretty regularly with `SYN` half-open attacks. I've looked into various ways to combat this. I've found a lot of general information, but not very much specific information.

The "easiest" approach seems to be tweaking tcp settings in `sysctl.conf`, but I'm having a hard time finding any kind of documentation of these settings. I probably want to increase the total number of available half-open connections, I probably want to reduce the timeout for those connections too, but this is a topic that's not very well documented.

Some of the things I've read suggest you shouldn't raise the half-open connection count limit too high because it uses up more memory. But how much memory can a half-open connection really consume? A few bytes? Even if it's 1 KB each, my server has 24 GB. Right now the server's limit is 512 half-open connections, but I feel like tens of thousands should be no sweat, unless there is some other factor I'm not aware of? Which is very possible because, again, so little documentation.

Moving on, another approach is `SYN` proxying/`SYN` cookies. These methods remember the details of a `SYN` request, then drop it, and re-open it if the corresponding `ACK` ever arrives. I don't really see how that is functionally different from just opening the connection directly; it seems like that is how TCP should work in the first place, so it would be inherently resistant to these `SYN` attacks. But that's beside the point...

So the easiest way to use `SYN` Proxying for me is to enable it in my `pf` firewall. But when I try, it doesn't seem to work properly. I did eventually read something that said synproxying in `pf` doesn't work properly on macOS. Another topic with very little documentation.

The other option is to use syncookies, enabled in `sysctl.conf`. I haven't tried this yet. Part of the reason is because both syncookies and synproxying always have disclaimers about how you shouldn't use it constantly, only when you are under attack etc. Well, ignoring the fact that once you are under attack, it is too late to enable syncookies, I also would really like some elaboration on that warning! That sounds very important. WHY would you not want to use proxying or cookies all the time? What are the downsides if I do?

The `SYN` half-open attack does not seem particularly complicated, so I'm surprised there aren't easier ways to mitigate it. To be honest, I'm surprised it wasn't all but eliminated with changes to the way TCP itself functions.
 
512 appears to be my system's limit for half-open connections. I have a script running that runs netstat when the server is unreachable and saves the output to a file I can check out later when the server is accessible again. There are also times when there are very few SYN_RCVD connections but many FIN_WAIT_1 or FIN_WAIT_2 connections, like a few hundred.
 
Could you put this behind a proxy server running FreeBSD or Linux? Modern versions of those can handle syn floods just fine.

If you're getting a lot of SYN floods, syncookies are what you want. The basic idea is that your server sends a SYN+ACK with a special sequence number, and when the client sends back an ACK: if the ACK doesn't match any current connection, the ack number plus the IPs and ports are hashed, and if the result matches, the connection goes straight to ESTABLISHED. Without that, when the limit of half-open (SYN_RCVD) connections is reached, it's generally FIFO, and if you get an ACK for a connection that's no longer in the list, too bad, so sad.
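Purely as illustration (a toy Python sketch, nothing like the real kernel code; the hash and names are made up), the cookie trick boils down to something like this:

Code:
import hashlib, struct, time

SECRET = b"per-boot secret"   # hypothetical; real stacks rotate this

def cookie(src_ip, src_port, dst_ip, dst_port, minute):
    # Hash the 4-tuple plus a coarse timestamp; the low 32 bits become the ISN.
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{minute}".encode()
    return struct.unpack(">I", hashlib.sha256(SECRET + msg).digest()[:4])[0]

def syn_ack_isn(src_ip, src_port, dst_ip, dst_port):
    # Sequence number to put in the SYN+ACK; note that no state is stored.
    return cookie(src_ip, src_port, dst_ip, dst_port, int(time.time() // 60))

def ack_looks_valid(src_ip, src_port, dst_ip, dst_port, ack_num):
    # The ACK acknowledges ISN+1, so check ack-1 against the last couple of minutes.
    isn = (ack_num - 1) & 0xFFFFFFFF
    now = int(time.time() // 60)
    return any(cookie(src_ip, src_port, dst_ip, dst_port, m) == isn for m in (now, now - 1))

Real implementations also have to cram the MSS (and sometimes other options) into a few bits of that number, which is part of why people warn against relying on pure syncookies all the time: when the cookie is all you have, things like window scaling and SACK that were negotiated in the original SYN get lost.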

Of course, if the bandwidth of the incoming syns is more than your server's incoming bandwidth, you're going to have problems, same if the syn+ack traffic you send is more than your outgoing.

Mac OS pulled the tcp stack (and a lot of other things) from FreeBSD, but doesn't tend to get updates, so it's probably not going to be the best performing.

What kind of connection rate for legit traffic are you expecting? and what syn flood rate are you seeing?
 
Could you put this behind a proxy server running FreeBSD or Linux? Modern versions of those can handle syn floods just fine.
In short, no. This is a colocated server and running a whole other server to feed this server isn't an option. Also, it's macOS; its networking stack is basically the same as BSD's, so the solutions to this problem on BSD are likely the same solutions that will work for macOS. Aside from the apparent bug in `pf`.

If you're getting a lot of SYN floods, syncookies are what you want. The basic idea is that your server sends a SYN+ACK with a special sequence number, and when the client sends back an ACK: if the ACK doesn't match any current connection, the ack number plus the IPs and ports are hashed, and if the result matches, the connection goes straight to ESTABLISHED. Without that, when the limit of half-open (SYN_RCVD) connections is reached, it's generally FIFO, and if you get an ACK for a connection that's no longer in the list, too bad, so sad.
Yeah I've read about syncookies, but everyplace I read about them, there are warnings that you should only use them when you are under attack. But I have yet to find any docs that explain in detail what the downsides to syncookies are and why you don't want to use it all the time. And also how much "attack" is enough to enable it.

I'm also confused as to how this is functionally different than a normal connection handshake in a situation where you set your max half open connections to some really high number. How much memory does a half open tcp connection take up? It can't possibly be very much. The server has 24 GB of RAM. It should be able to handle *pulling number out of ass* hundreds of thousands of half open connections without flinching, I would think.

Of course, if the bandwidth of the incoming syns is more than your server's incoming bandwidth, you're going to have problems, same if the syn+ack traffic you send is more than your outgoing.

Mac OS pulled the tcp stack (and a lot of other things) from FreeBSD, but doesn't tend to get updates, so it's probably not going to be the best performing.

What kind of connection rate for legit traffic are you expecting? and what syn flood rate are you seeing?

The DoS seems to be coming from simply filling the max number of half-open connections, 512 in this case. The attacks come in very random bursts. Sometimes every few hours, sometimes every few days. It's been going on like this for a long time. I wouldn't imagine the bandwidth is very high. The server can handle quite a bit and it's in a data center on a pair of very fast lines.

Also, this may be out of scope, but it would also be nice if I could run entirely independent TCP stacks on each ethernet port. That way the "attacks" would only take down the website they are hitting, and not the entire server including my remote access to it.
 
In short, no. This is a colocated server and running a whole other server to feed this server isn't an option. Also, it's macOS; its networking stack is basically the same as BSD's, so the solutions to this problem on BSD are likely the same solutions that will work for macOS. Aside from the apparent bug in `pf`.

If you can't change it, it's not worth arguing about, but... the thing is Apple took the BSD stack at a point in time (early 2000s IIRC). I looked at my notes from work, and FreeBSD 11 (December 2016) has much improved syn-flood handling compared to earlier versions, and that's most likely never coming to Mac OS. On the FreeBSD 10 systems, we were processing ~100k packets per second and the system was unusable; on 11, I was able to receive 2M pps (only sent out 600k SYN+ACKs though) and the system was otherwise fine. My notes don't say it, but I think I had run out of sources to send the SYNs from, and figured a 2 Gbps line-rate synflood was enough. Anyway --- you can't change it, we'll have to deal with it. :)

Yeah I've read about syncookies, but everyplace I read about them, there are warnings that you should only use them when you are under attack. But I have yet to find any docs that explain in detail what the downsides to syncookies are and why you don't want to use it all the time. And also how much "attack" is enough to enable it.

I'm also confused as to how this is functionally different than a normal connection handshake in a situation where you set your max half open connections to some really high number. How much memory does a half open tcp connection take up? It can't possibly be very much. The server has 24 GB of RAM. It should be able to handle *pulling number out of ass* hundreds of thousands of half open connections without flinching, I would think.

So, generally, people who use syncookies run them in what I would call an optimistic mode. Keep the normal table, but also run the hash to set the initial sequence number (the specific number doesn't really matter, it just needs to be hard to guess, to prevent spoofing). If the kernel is smart (which BSD wasn't always!), syncookies on unknown packets are only checked if the normal table has overflowed in the recent past. Anyway, suggestions down below.

The DoS seems to be coming from simply filling the max number of half-open connections, 512 in this case. The attacks come in very random bursts. Sometimes every few hours, sometimes every few days. It's been going on like this for a long time. I wouldn't imagine the bandwidth is very high. The server can handle quite a bit and it's in a data center on a pair of very fast lines.

Also, this may be out of scope, but it would also be nice if I could run entirely independent TCP stacks on each ethernet port. That way the "attacks" would only take down the website they are hitting, and not the entire server including my remote access to it.
I'm not sure if this is possible on Mac OS (I think it's possible on other OSes), but it's not always clear it would help; some of the TCP stack issues I've run into end up with loops in the kernel that might not be preemptable.

OK, here are some suggestions. I'm looking at the FreeBSD 12 sysctls, because that's the closest I've got to Mac OS. If some of these don't show up, do `sysctl -a net` and see if you can find something similar looking, or send me that output and we can figure it out together. For all of these, run `sysctl foo` to view, and `sysctl foo=X` to set.

net.inet.ip.maxfrags: this should be set low, but tends to be set to scale with memory size; I would set it to 64. This isn't related to syn floods, but if you also get DDoSed with fragmented packets, it'll help with that.
net.inet.tcp.msl: you can turn this down from 30 seconds to 10 seconds, and it helps clean out some TCP states faster.
net.inet.tcp.syncache.rexmtlimit: I suggest turning this down from 3 to 1 or 0; it's how many times to resend the SYN+ACK if you don't get an ACK. If the client didn't get your SYN+ACK, they'll resend the SYN anyway. Reducing this reduces the amount of work your server does for spoofed SYNs.
net.inet.tcp.syncache.cachelimit: this is the limit you're looking for. On my system it defaulted to 16k, but maybe it's 512 on yours? You can set it pretty high without much penalty, as long as you set the rexmtlimit from the previous line low.
net.inet.tcp.syncache.count: measure this in your netstat script. BTW, I hope you're using -n in your script to prevent DNS lookups; netstat on BSDs locks the TCP structures while it's running, so you don't want it to dawdle.
net.inet.tcp.syncookies: this should be 1 to enable syncookies in addition to the syncache (I don't think you need to fiddle with pf).
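For example (these are the FreeBSD names; treat them as guesses until you confirm they exist on your box):

Code:
sysctl net.inet.tcp.msl                          # view the current value
sudo sysctl net.inet.tcp.msl=10000               # set it until the next reboot
sudo sysctl net.inet.tcp.syncache.rexmtlimit=1
sudo sysctl net.inet.tcp.syncookies=1
# add lines like "net.inet.tcp.msl=10000" to /etc/sysctl.conf to persist across reboots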

Also check if kern.maxfiles and kern.ipc.maxsockets look appropriate. I seem to recall Mac OS having a pretty anemic default number of open files, but it's been a while since I ran stuff on it.

One other thing that can be pretty helpful... I don't know if you can do it with Mac, or pf, but on FreeBSD with ipfw, you can set rules to log to a virtual interface at a percentage, then run tcpdump on that interface to log say 1% or less of the packets; if you run that continuously (tcpdump -w XXXX -G XXX to rotate files every so often), when you get DDoSed, you can look at what was going on, and try to piece it together. If your traffic is low enough, you could just dump all the traffic, but that would have been a lot for me. I would also use -s 64 on the tcpdump to hopefully get all the tcp options, but not a lot of tcp payload (you generally don't need it, and probably don't want it logged).
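Something along these lines, with made-up paths and an hourly rotation; it grabs every SYN rather than a sampled percentage, which is fine if your normal traffic is modest:

Code:
tcpdump -i en1 -n -s 64 -G 3600 -w /var/log/pcap/syn-%Y%m%d-%H%M%S.pcap \
    'tcp[tcpflags] & tcp-syn != 0'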
 
Thanks, that reply was extremely useful! I have a bunch of followup questions before I start making changes to my config, but I don't have time to organize it all right now. I'll be back...
 
It is a real shame that the -d flag does not seem to work for any key at all.

net.inet.tcp.blackhole exists, is set to 0. Seems wise to set it to 1. This wasn't in your post but I came across it and it seems helpful.

kern.maxfiles exists and is set to 450,000. My server is in the secret-ish "high performance mode" that likely bumped this number up by a lot. My desktop computer right now has that value set to 49k for example.

net.inet.tcp.msl is currently set to 15000. I think it's safe to assume that's ms, so 15s. Should I potentially go even lower than 10s? I'm not 100% clear on what this is the exact timeout for.

kern.ipc.maxsockets does not exist. I do have the following similar sounding keys with their listed values:
kern.ipc.maxsockbuf 8388608
kern.ipc.somaxconn 1536

net.inet.ip.maxfrags does not exist, but I do have these similar sounding keys:
net.inet.ip.maxfragpackets 3072
net.inet.ip.fragpackets 0
net.inet.ip.maxfragsperpacket 128


net.inet.tcp.syncache.rexmtlimit does not exist, but I do have this, is it the same?
net.inet.tcp.rexmt_thresh 3

net.inet.tcp.syncache.cachelimit, net.inet.tcp.syncache.count, net.inet.tcp.syncookies do not exist, and I can't find anything similar.

Apparently El Capitan doesn't support syncache or syncookies by default. That's not good news for me. But it seems impossible that there isn't a setting in here somewhere that lets me set the half-open connection limit. I searched the output of sysctl -a net for 512 but none of the keys 'sound' like they are the right value. But how can I tell? There's no -d for any key. Googling these things brings you almost exclusively to dead ends. It's like random shots in the dark. Here are all of my ipv4 keys that are currently set to 512:
net.inet.ip.mcast.maxgrpsrc
net.inet.ip.dummynet.red_avg_pkt_size
net.inet.tcp.mssdflt
net.inet.tcp.rcvsspktcnt

For shits n' giggles, I'm going to attach the entire list of sysctl -a net keys. Pasting a list of ~500 keys to a forum isn't the best way to go about this, but there's so little documentation, I'm not sure what else to do?

sysctl -a net said:
net.local.stream.sendspace: 8192
net.local.stream.recvspace: 8192
net.local.stream.tracemdns: 0
net.local.dgram.maxdgram: 2048
net.local.dgram.recvspace: 4096
net.local.inflight: 0
net.inet.ip.portrange.lowfirst: 1023
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.hilast: 65535
net.inet.ip.forwarding: 1
net.inet.ip.redirect: 1
net.inet.ip.ttl: 64
net.inet.ip.rtexpire: 3600
net.inet.ip.rtminexpire: 10
net.inet.ip.rtmaxcache: 128
net.inet.ip.sourceroute: 0
net.inet.ip.accept_sourceroute: 0
net.inet.ip.gifttl: 30
net.inet.ip.subnets_are_local: 0
net.inet.ip.mcast.maxgrpsrc: 512
net.inet.ip.mcast.maxsocksrc: 128
net.inet.ip.mcast.loop: 1
net.inet.ip.dummynet.hash_size: 64
net.inet.ip.dummynet.curr_time: 0
net.inet.ip.dummynet.ready_heap: 0
net.inet.ip.dummynet.extract_heap: 0
net.inet.ip.dummynet.searches: 0
net.inet.ip.dummynet.search_steps: 0
net.inet.ip.dummynet.expire: 1
net.inet.ip.dummynet.max_chain_len: 16
net.inet.ip.dummynet.red_lookup_depth: 256
net.inet.ip.dummynet.red_avg_pkt_size: 512
net.inet.ip.dummynet.red_max_pkt_size: 1500
net.inet.ip.dummynet.debug: 0
net.inet.ip.fw.enable: 1
net.inet.ip.fw.autoinc_step: 100
net.inet.ip.fw.one_pass: 0
net.inet.ip.fw.debug: 0
net.inet.ip.fw.verbose: 0
net.inet.ip.fw.verbose_limit: 0
net.inet.ip.fw.dyn_buckets: 256
net.inet.ip.fw.curr_dyn_buckets: 256
net.inet.ip.fw.dyn_count: 0
net.inet.ip.fw.dyn_max: 4096
net.inet.ip.fw.static_count: 1
net.inet.ip.fw.dyn_ack_lifetime: 300
net.inet.ip.fw.dyn_syn_lifetime: 20
net.inet.ip.fw.dyn_fin_lifetime: 1
net.inet.ip.fw.dyn_rst_lifetime: 1
net.inet.ip.fw.dyn_udp_lifetime: 10
net.inet.ip.fw.dyn_short_lifetime: 5
net.inet.ip.fw.dyn_keepalive: 1
net.inet.ip.random_id_statistics: 0
net.inet.ip.random_id_collisions: 0
net.inet.ip.random_id_total: 0
net.inet.ip.sendsourcequench: 0
net.inet.ip.maxfragpackets: 3072
net.inet.ip.fragpackets: 0
net.inet.ip.maxfragsperpacket: 128
net.inet.ip.scopedroute: 1
net.inet.ip.adj_clear_hwcksum: 0
net.inet.ip.check_interface: 0
net.inet.ip.rx_chaining: 1
net.inet.ip.rx_chainsz: 6
net.inet.ip.input_perf: 0
net.inet.ip.input_perf_bins: 0
net.inet.ip.linklocal.in.allowbadttl: 1
net.inet.ip.random_id: 1
net.inet.ip.maxchainsent: 0
net.inet.ip.select_srcif_debug: 0
net.inet.ip.output_perf: 0
net.inet.ip.output_perf_bins: 0
net.inet.icmp.maskrepl: 0
net.inet.icmp.icmplim: 250
net.inet.icmp.timestamp: 0
net.inet.icmp.drop_redirect: 1
net.inet.icmp.log_redirect: 0
net.inet.icmp.bmcastecho: 1
net.inet.igmp.recvifkludge: 1
net.inet.igmp.sendra: 1
net.inet.igmp.sendlocal: 1
net.inet.igmp.v1enable: 1
net.inet.igmp.v2enable: 1
net.inet.igmp.legacysupp: 0
net.inet.igmp.default_version: 3
net.inet.igmp.gsrdelay: 10
net.inet.igmp.debug: 0
net.inet.tcp.rfc1644: 0
net.inet.tcp.mssdflt: 512
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 131072
net.inet.tcp.recvspace: 131072
net.inet.tcp.keepinit: 75000
net.inet.tcp.v6mssdflt: 1024
net.inet.tcp.ecn_timeout: 60
net.inet.tcp.clear_tfocache: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.blackhole: 0
net.inet.tcp.delayed_ack: 3
net.inet.tcp.tcp_lq_overflow: 1
net.inet.tcp.recvbg: 0
net.inet.tcp.drop_synfin: 1
net.inet.tcp.reass.overflows: 0
net.inet.tcp.slowlink_wsize: 8192
net.inet.tcp.maxseg_unacked: 8
net.inet.tcp.rfc3465: 1
net.inet.tcp.rfc3465_lim2: 1
net.inet.tcp.recv_allowed_iaj: 5
net.inet.tcp.doautorcvbuf: 1
net.inet.tcp.autorcvbufmax: 1048576
net.inet.tcp.lro: 0
net.inet.tcp.lrodbg: 0
net.inet.tcp.lro_startcnt: 4
net.inet.tcp.disable_access_to_stats: 1
net.inet.tcp.rcvsspktcnt: 512
net.inet.tcp.rexmt_thresh: 3
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.slowstart_flightsize: 1
net.inet.tcp.local_slowstart_flightsize: 8
net.inet.tcp.tso: 1
net.inet.tcp.ecn_initiate_out: 2
net.inet.tcp.ecn_negotiate_in: 2
net.inet.tcp.packetchain: 50
net.inet.tcp.socket_unlocked_on_output: 1
net.inet.tcp.rfc3390: 1
net.inet.tcp.min_iaj_win: 4
net.inet.tcp.acc_iaj_react_limit: 200
net.inet.tcp.doautosndbuf: 1
net.inet.tcp.autosndbufinc: 8192
net.inet.tcp.autosndbufmax: 1048576
net.inet.tcp.ack_prioritize: 1
net.inet.tcp.rtt_recvbg: 1
net.inet.tcp.recv_throttle_minwin: 16384
net.inet.tcp.enable_tlp: 1
net.inet.tcp.sack: 1
net.inet.tcp.sack_maxholes: 128
net.inet.tcp.sack_globalmaxholes: 65536
net.inet.tcp.sack_globalholes: 0
net.inet.tcp.fastopen_backlog: 300
net.inet.tcp.fastopen: 3
net.inet.tcp.fastopen_fallback_min: 10
net.inet.tcp.minmss: 216
net.inet.tcp.do_tcpdrain: 0
net.inet.tcp.pcbcount: 433
net.inet.tcp.tw_pcbcount: 104
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.rtt_min: 100
net.inet.tcp.rexmt_slop: 200
net.inet.tcp.randomize_ports: 0
net.inet.tcp.win_scale_factor: 3
net.inet.tcp.tcbhashsize: 16384
net.inet.tcp.keepcnt: 8
net.inet.tcp.msl: 15000
net.inet.tcp.max_persist_timeout: 0
net.inet.tcp.always_keepalive: 0
net.inet.tcp.timer_fastmode_idlemax: 10
net.inet.tcp.broken_peer_syn_rexmit_thres: 10
net.inet.tcp.tcp_timer_advanced: 3145
net.inet.tcp.tcp_resched_timerlist: 2289150
net.inet.tcp.pmtud_blackhole_detection: 1
net.inet.tcp.pmtud_blackhole_mss: 1200
net.inet.tcp.preconn_sbsz: 1024
net.inet.tcp.cc_debug: 0
net.inet.tcp.newreno_sockets: 0
net.inet.tcp.background_sockets: 0
net.inet.tcp.cubic_sockets: 412
net.inet.tcp.use_newreno: 0
net.inet.tcp.cubic_tcp_friendliness: 0
net.inet.tcp.cubic_fast_convergence: 0
net.inet.tcp.cubic_use_minrtt: 0
net.inet.tcp.lro_sz: 8
net.inet.tcp.lro_time: 10
net.inet.tcp.bg_target_qdelay: 100
net.inet.tcp.bg_allowed_increase: 8
net.inet.tcp.bg_tether_shift: 1
net.inet.tcp.bg_ss_fltsz: 2
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 196724
net.inet.udp.log_in_vain: 0
net.inet.udp.blackhole: 0
net.inet.udp.pcbcount: 122
net.inet.udp.randomize_ports: 1
net.inet.ipsec.def_policy: 1
net.inet.ipsec.esp_trans_deflev: 1
net.inet.ipsec.esp_net_deflev: 1
net.inet.ipsec.ah_trans_deflev: 1
net.inet.ipsec.ah_net_deflev: 1
net.inet.ipsec.ah_cleartos: 1
net.inet.ipsec.ah_offsetmask: 0
net.inet.ipsec.dfbit: 0
net.inet.ipsec.ecn: 0
net.inet.ipsec.debug: 0
net.inet.ipsec.esp_randpad: -1
net.inet.ipsec.bypass: 0
net.inet.ipsec.esp_port: 4500
net.inet.raw.maxdgram: 8192
net.inet.raw.recvspace: 8192
net.inet.raw.pcbcount: 1
net.inet.mptcp.enable: 1
net.inet.mptcp.mptcp_cap_retr: 2
net.inet.mptcp.dss_csum: 0
net.inet.mptcp.fail: 1
net.inet.mptcp.keepalive: 840
net.inet.mptcp.mpprio: 1
net.inet.mptcp.remaddr: 1
net.inet.mptcp.fastjoin: 1
net.inet.mptcp.zerortt_fastjoin: 0
net.inet.mptcp.rwnotify: 0
net.inet.mptcp.rtthist: 1
net.inet.mptcp.rtthist_thresh: 600
net.inet.mptcp.userto: 1
net.inet.mptcp.rto_thresh: 1500
net.inet.mptcp.use_peer: 1
net.inet.mptcp.peerswitchno: 3
net.inet.mptcp.probeto: 1000
net.inet.mptcp.probecnt: 5
net.inet.mptcp.dbg_area: 0
net.inet.mptcp.dbg_level: 0
net.inet.mptcp.pcbcount: 0
net.inet.mptcp.sk_lim: 16
net.inet.mptcp.delayed: 0
net.inet.mptcp.usesymptoms: 1
net.inet.mptcp.mp_preconn_sbsz: 1024
net.inet.mptcp.force_64bit_dsn: 0
net.inet.mptcp.rto: 3
net.inet.mptcp.nrto: 3
net.inet.mptcp.tw: 60
net.link.generic.system.ifcount: 7
net.link.generic.system.if_verbose: 0
net.link.generic.system.dlil_lladdr_ckreq: 0
net.link.generic.system.dlil_verbose: 0
net.link.generic.system.sndq_maxlen: 128
net.link.generic.system.rcvq_maxlen: 256
net.link.generic.system.rxpoll_decay: 2
net.link.generic.system.rxpoll_freeze_time: 1000000000
net.link.generic.system.rxpoll_sample_time: 10000000
net.link.generic.system.rxpoll_interval_time: 1000000
net.link.generic.system.rxpoll_interval_pkts: 0
net.link.generic.system.rxpoll_wakeups_lowat: 10
net.link.generic.system.rxpoll_wakeups_hiwat: 100
net.link.generic.system.rxpoll_max: 0
net.link.generic.system.rxpoll: 1
net.link.generic.system.if_bw_smoothing_val: 3
net.link.generic.system.if_bw_measure_size: 10
net.link.generic.system.dlil_input_threads: 3
net.link.generic.system.dlil_input_sanity_check: 0
net.link.generic.system.flow_advisory: 1
net.link.generic.system.delaybased_queue: 1
net.link.generic.system.hwcksum_in_invalidated: 0
net.link.generic.system.hwcksum_dbg: 0
net.link.generic.system.start_delayed: 0
net.link.generic.system.start_delay_disabled: 0
net.link.generic.system.hwcksum_dbg_mode: 0
net.link.generic.system.hwcksum_dbg_partial_forced: 0
net.link.generic.system.hwcksum_dbg_partial_forced_bytes: 0
net.link.generic.system.hwcksum_dbg_partial_rxoff_forced: 0
net.link.generic.system.hwcksum_dbg_partial_rxoff_adj: 0
net.link.generic.system.hwcksum_dbg_verified: 0
net.link.generic.system.hwcksum_dbg_bad_cksum: 0
net.link.generic.system.hwcksum_dbg_bad_rxoff: 0
net.link.generic.system.hwcksum_dbg_adjusted: 0
net.link.generic.system.hwcksum_dbg_finalized_hdr: 0
net.link.generic.system.hwcksum_dbg_finalized_data: 0
net.link.generic.system.hwcksum_tx: 1
net.link.generic.system.hwcksum_rx: 1
net.link.generic.system.tx_chain_len_count: 0
net.link.ether.inet.prune_intvl: 300
net.link.ether.inet.max_age: 1200
net.link.ether.inet.host_down_time: 20
net.link.ether.inet.arp_llreach_base: 30
net.link.ether.inet.arp_unicast_lim: 5
net.link.ether.inet.maxtries: 5
net.link.ether.inet.useloopback: 1
net.link.ether.inet.proxyall: 0
net.link.ether.inet.sendllconflict: 0
net.link.ether.inet.log_arp_warnings: 0
net.link.ether.inet.keep_announcements: 1
net.link.ether.inet.send_conflicting_probes: 1
net.link.ether.inet.verbose: 0
net.link.ether.inet.apple_hwcksum_tx: 1
net.link.ether.inet.apple_hwcksum_rx: 1
net.link.bridge.inherit_mac: 0
net.link.bridge.rtable_prune_period: 300
net.link.bridge.rtable_hash_size_max: 2048
net.link.bridge.txstart: 0
net.link.bridge.debug: 0
net.link.loopback.bw_sleep_usec: 10
net.link.loopback.bw_measure: 0
net.link.loopback.max_dequeue: 256
net.link.loopback.sched_model: 0
net.link.loopback.dequeue_sc: 0
net.link.iptap.total_tap_count: 0
net.link.iptap.log: 0
net.link.pktap.total_tap_count: 0
net.link.pktap.count_unknown_if_type: 0
net.link.pktap.log: 0
net.key.debug: 0
net.key.spi_trycnt: 1000
net.key.spi_minval: 256
net.key.spi_maxval: 268435455
net.key.int_random: 60
net.key.larval_lifetime: 30
net.key.blockacq_count: 10
net.key.blockacq_lifetime: 20
net.key.esp_keymin: 256
net.key.esp_auth: 0
net.key.ah_keymin: 128
net.key.prefered_oldsa: 0
net.key.natt_keepalive_interval: 20
net.inet6.ip6.forwarding: 0
net.inet6.ip6.redirect: 1
net.inet6.ip6.hlim: 64
net.inet6.ip6.maxfragpackets: 3072
net.inet6.ip6.accept_rtadv: 1
net.inet6.ip6.keepfaith: 0
net.inet6.ip6.log_interval: 5
net.inet6.ip6.hdrnestlimit: 15
net.inet6.ip6.dad_count: 1
net.inet6.ip6.auto_flowlabel: 1
net.inet6.ip6.defmcasthlim: 1
net.inet6.ip6.gifhlim: 0
net.inet6.ip6.kame_version: 2009/apple-darwin
net.inet6.ip6.use_deprecated: 1
net.inet6.ip6.rr_prune: 5
net.inet6.ip6.v6only: 0
net.inet6.ip6.rtexpire: 3600
net.inet6.ip6.rtminexpire: 10
net.inet6.ip6.rtmaxcache: 128
net.inet6.ip6.use_tempaddr: 1
net.inet6.ip6.temppltime: 86400
net.inet6.ip6.tempvltime: 604800
net.inet6.ip6.auto_linklocal: 1
net.inet6.ip6.prefer_tempaddr: 1
net.inet6.ip6.use_defaultzone: 0
net.inet6.ip6.maxfrags: 6144
net.inet6.ip6.mcast_pmtu: 0
net.inet6.ip6.neighborgcthresh: 1024
net.inet6.ip6.maxifprefixes: 16
net.inet6.ip6.maxifdefrouters: 16
net.inet6.ip6.maxdynroutes: 1024
net.inet6.ip6.fragpackets: 0
net.inet6.ip6.fw.enable: 1
net.inet6.ip6.fw.debug: 0
net.inet6.ip6.fw.verbose: 0
net.inet6.ip6.fw.verbose_limit: 0
net.inet6.ip6.scopedroute: 1
net.inet6.ip6.adj_clear_hwcksum: 0
net.inet6.ip6.input_perf: 0
net.inet6.ip6.input_perf_bins: 0
net.inet6.ip6.output_perf: 0
net.inet6.ip6.output_perf_bins: 0
net.inet6.ip6.select_srcif_debug: 0
net.inet6.ip6.select_srcaddr_debug: 0
net.inet6.ip6.select_src_expensive_secondary_if: 0
net.inet6.ip6.mcast.maxgrpsrc: 512
net.inet6.ip6.mcast.maxsocksrc: 128
net.inet6.ip6.mcast.loop: 1
net.inet6.ip6.only_allow_rfc4193_prefixes: 0
net.inet6.ip6.maxchainsent: 1
net.inet6.ipsec6.def_policy: 1
net.inet6.ipsec6.esp_trans_deflev: 1
net.inet6.ipsec6.esp_net_deflev: 1
net.inet6.ipsec6.ah_trans_deflev: 1
net.inet6.ipsec6.ah_net_deflev: 1
net.inet6.ipsec6.ecn: 0
net.inet6.ipsec6.debug: 0
net.inet6.ipsec6.esp_randpad: -1
net.inet6.icmp6.rediraccept: 0
net.inet6.icmp6.redirtimeout: 600
net.inet6.icmp6.nd6_prune: 1
net.inet6.icmp6.nd6_delay: 5
net.inet6.icmp6.nd6_umaxtries: 3
net.inet6.icmp6.nd6_mmaxtries: 3
net.inet6.icmp6.nd6_useloopback: 1
net.inet6.icmp6.nodeinfo: 3
net.inet6.icmp6.errppslimit: 500
net.inet6.icmp6.nd6_debug: 0
net.inet6.icmp6.nd6_accept_6to4: 1
net.inet6.icmp6.nd6_optimistic_dad: 63
net.inet6.icmp6.nd6_onlink_ns_rfc4861: 0
net.inet6.icmp6.nd6_prune_lazy: 5
net.inet6.icmp6.rappslimit: 10
net.inet6.icmp6.nd6_llreach_base: 30
net.inet6.icmp6.nd6_maxsolstgt: 8
net.inet6.icmp6.nd6_maxproxiedsol: 4
net.inet6.icmp6.prproxy_cnt: 0
net.inet6.mld.gsrdelay: 10
net.inet6.mld.v1enable: 1
net.inet6.mld.v2enable: 1
net.inet6.mld.use_allow: 1
net.inet6.mld.debug: 0
net.inet6.send.opstate: 0
net.inet6.send.opmode: 0
net.systm.kctl.autorcvbufmax: 262144
net.systm.kctl.autorcvbufhigh: 139904
net.systm.kctl.debug: 0
net.ppp.l2tp.nb_threads: 15
net.ppp.l2tp.thread_outq_size: 1024
net.ndrv_multi_max_count: 1024
net.route.verbose: 0
net.statistics: 1
net.statistics_privcheck: 0
net.stats.debug: 0
net.stats.sendspace: 2048
net.stats.recvspace: 8192
net.necp.drop_all_level: 0
net.necp.debug: 0
net.necp.pass_loopback: 1
net.necp.pass_keepalives: 1
net.necp.socket_policy_count: 2
net.necp.socket_non_app_policy_count: 1
net.necp.ip_policy_count: 1
net.necp.session_count: 4
net.netagent.debug: 5
net.netagent.registered_count: 1
net.netagent.active_count: 0
net.cfil.log: 3
net.cfil.debug: 1
net.cfil.sock_attached_count: 0
net.cfil.active_count: 0
net.cfil.close_wait_timeout: 1000
net.cfil.sbtrim: 1
net.pktmnglr.log: 3
net.classq.verbose: 0
net.classq.sfb.holdtime: 0
net.classq.sfb.pboxtime: 0
net.classq.sfb.hinterval: 0
net.classq.sfb.target_qdelay: 0
net.classq.sfb.update_interval: 0
net.classq.sfb.increment: 82
net.classq.sfb.decrement: 16
net.classq.sfb.allocation: 0
net.classq.sfb.ratelimit: 0
net.pktsched.verbose: 0
net.alf.loglevel: 55
net.alf.perm: 0
net.alf.defaultaction: 1
net.alf.mqcount: 0
 
It is a real shame that the -d flag does not seem to work for any key at all.

net.inet.tcp.blackhole exists, is set to 0. Seems wise to set it to 1. This wasn't in your post but I came across it and it seems helpful.
I'm ambivalent on this one; RSTs on closed ports are a 1:1 packet exchange, so it's not a likely DDoS reflection vector, and setting this can make debugging harder. OTOH, it also makes service restarts more seamless (if the server isn't listening, the SYNs get dropped instead of RSTed, and when the client retransmits, the server may be listening by then), and it makes port scanning harder, and if you get a lot of junky traffic, it would keep your outbound network cleaner.

kern.maxfiles exists and is set to 450,000. My server is in the secret-ish "high performance mode" that likely bumped this number up by a lot. My desktop computer right now has that value set to 49k for example.

net.inet.tcp.msl is currently set to 15000. I think it's safe to assume that's ms, so 15s. Should I potentially go even lower than 10s? I'm not 100% clear on what this is the exact timeout for.

15 seconds is probably fine. MSL is the estimate of the maximum time it takes for a packet to traverse the interwebs; 2 times MSL is used for the timeout on fully closed sockets, to allow for some delayed packets to come in without confusing everyone.

kern.ipc.maxsockets does not exist. I do have the following similar sounding keys with their listed values:
kern.ipc.maxsockbuf 8388608
kern.ipc.somaxconn 1536
maxsockbuf is the maximum socket buffer size; not so relevant.

FreeBSD sysctl -d says kern.ipc.somaxconn: Maximum listen socket pending connection accept queue size (compat)

I'm guessing it's not compat on MacOS; this is the maximum number of connections that have been three-way handshaked, ready for the server to accept them; the server sets the value with the second argument to listen. Also, man 2 listen on FreeBSD says

Note that before FreeBSD 4.5 and the introduction of the syncache, the
backlog argument also determined the length of the incomplete connection
queue, which held TCP sockets in the process of completing TCP's 3-way
handshake. These incomplete connections are now held entirely in the
syncache, which is unaffected by queue lengths. Inflated backlog values
to help handle denial of service attacks are no longer necessary.

If there's no evidence of the syncache, it's possible the TCP stack was taken from before FreeBSD 4.5 (January, 2002). If this is the case, bumping up that sysctl and setting a high backlog may be what you need to do to solve your issues! Hopefully, it's easy to change the listen call, but even if you're running a 3rd party binary with no source, there are ways to intercept and adjust system calls. :D You should be able to see the listen queue size with netstat -aLN, you should see something like

Code:
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address
tcp4  0/0/128                          127.0.0.1.6010
tcp6  0/0/128                          ::1.6010
tcp6  0/0/128                          *.443
tcp4  0/0/10                           127.0.0.1.25

It might look a little different, but the last number in the group confirms the size. If you don't set somaxconn and just set the backlog really high, you will get limited by the sysctl. If your listen queue size is 512, I think we've found the limit.
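Just to show where that last number comes from, a toy sketch (hypothetical port): the backlog argument you hand to listen() is what ends up as maxqlen, capped by kern.ipc.somaxconn.

Code:
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))   # hypothetical port
srv.listen(1024)              # the backlog: second argument to listen(2)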

I'd be surprised if Apple is still using a FreeBSD 4 era TCP stack, because it's old as heck, but they're also doing really good stuff with TCP in general. They've deployed MultiPath TCP on iOS and Mac OS, are doing a lot of good things for IPv6 deployment as well, and they've got working and fast path MTU blackhole detection on iOS and I assume Mac OS, and Android doesn't (grr!). But the evidence points this way, so. :)

net.inet.ip.maxfrags does not exist, but I do have these similar sounding keys:
net.inet.ip.maxfragpackets 3072
net.inet.ip.fragpackets 0
net.inet.ip.maxfragsperpacket 128
I would tweak maxfragpackets, it's probably the same thing. In FreeBSD before 11(?), fragment reassembly does a linear search on the whole array; 3k packets to search through isn't that awful, but my servers would auto-tune that up to like 300k, which was awful. I'd still lower it to 64 or something.

net.inet.tcp.syncache.rexmtlimit does not exist, but I do have this, is it the same?
net.inet.tcp.rexmt_thresh 3

That value may not be tunable then, I think tcp.rexmt_thresh is something else, I wouldn't touch it. F5's load balancer documentation (lol) says rexmt-thresh: Specifies the number of duplicate ACKs (retransmit threshold) to start fast recovery.

net.inet.tcp.syncache.cachelimit, net.inet.tcp.syncache.count, net.inet.tcp.syncookies do not exist, and I can't find anything similar.

Apparently El Capitan doesn't support syncache or syncookies by default. That's not good news for me. But it seems impossible that there isn't a setting in here somewhere that lets me set the half-open connection limit. I searched the output of sysctl -a net for 512 but none of the keys 'sound' like they are the right value. But how can I tell? There's no -d for any key. Googling these things brings you almost exclusively to dead ends. It's like random shots in the dark. Here are all of my ipv4 keys that are currently set to 512:
net.inet.ip.mcast.maxgrpsrc
net.inet.ip.dummynet.red_avg_pkt_size
net.inet.tcp.mssdflt
net.inet.tcp.rcvsspktcnt

Those are all not it. I did find http://newosxbook.com/bonus/vol1ch16.html which tells us what rcvsspktcnt and mssdflt (I knew that one already though) are.

For shits n' giggles, I'm going to attach the entire list of sysctl -a net keys. Pasting a list of ~500 keys to a forum isn't the best way to go about this, but there's so little documentation, I'm not sure what else to do?

Nothing in that big old list jumps out at me. I think the listen queue stuff above is the best bet. If that doesn't work, I'm almost intrigued enough to look at the Darwin sources for your version and see if I can figure out which FreeBSD it corresponds to, just to make sure.
 
...setting a high backlog may be what you need to do to solve your issues...
I agree that will probably help a lot, especially if I set it to some very large number. But we have not yet determined what the setting is for the number of in-progress connections, and the timeout for those in-progress connections, have we?


Also, I ran netstat -aLn (not sure if that was a case typo on your part, but I have no "N" option). The results are not what I expected...
netstat -aLn said:
Current listen queue sizes (qlen/incqlen/maxqlen)
Listen Local Address
0/0/999 *.31416
0/0/128 *.88
0/0/128 *.88
0/0/1024 *.3283
0/0/80 *.3306
0/0/100 *.587
0/0/100 *.587
0/0/100 *.25
0/0/100 *.25
0/0/128 *.36330
0/0/128 *.7396
0/0/128 *.993
0/0/511 *.443
0/0/511 *.80
0/0/1536 *.22
0/0/1536 *.22
0/0/1536 *.445
0/0/1536 *.445
0/0/1536 *.5900
0/0/1536 *.5900
0/0/1536 *.548
0/0/1536 *.548

Am I misinterpreting this, or does this mean that I seem to have separate queues for each port number? If so, that would explain where the 512 HTTP hanging connections come from. However, when my server is being hit, it becomes fully unreachable. If my limit was on a per-port basis, the SYN attack would only block access to the web server. So I'm not really sure what I'm seeing here. The deeper we dig, the more confused I'm getting about exactly what is going on.
 
Jesus, is this specifically an Apache setting? Is all this just a problem in my httpd.conf?

Although again, if that were the case, wouldn't the SYN flood just block access to my web server, leaving all other services unaffected?

http://httpd.apache.org/docs/2.2/mod/mpm_common.html#listenbacklog

I tried bumping ListenBacklog from 511 to 4096 just to see what would happen, but it has no effect on the result of netstat; it's still showing a 511 queue each for http and https.
 
It's going to be a while before I can give a good response; I'm going to grab kernel sources and see if that helps figure out what's going on. I think you're asking the right questions, though; if it's a per-socket backlog, it doesn't make a lot of sense that you'd see problems on other sockets. It's possible there's more than one thing going on though.

On netstat, sorry for the typo, you got what I meant. If you add that to your cron where you saw all the SYN_RCVD connections, you'll probably see the qlen grow too, at least if we're on the right track.

On the ListenBacklog setting, I seem to remember some confusing stuff with how your values get reflected. If you set it to 4096, but the sysctl max is 1536, you might end up with something totally different, because that makes sense. You might need to do a full stop/start of apache rather than a reload --- I don't remember if apache tries to inherit listen sockets across reloads rather than reopen them.
 
OK, I'm looking at http://opensource.apple.com/tarballs/xnu/xnu-3248.20.55.tar.gz which I think is El Capitan (looks like 10.11.2 https://opensource.apple.com/release/os-x-10112.html )

Looking at bsd/netinet, I'm pretty sure the tcp source here is mostly from some time in 2001, which predates the syncache. There's some stuff with FreeBSD dates from 2003/2004, but it's ip_divert, ip_dummynet and ip_fw2. The 10.15.6 source mentions the syncache, but only because they imported some new structures that reference it, not because they actually pulled it in.

However, I was poking around and saw this note:

Code:
                 * When kern.ipc.soqlencomp is set to 1, so_qlen
                 * represents only the completed queue.  Since we
                 * cannot let the incomplete queue goes unbounded
                 * (in case of SYN flood), we cap the incomplete
                 * queue length to at most somaxconn, and use that
                 * as so_qlen so that we fail immediately below.

So I wonder what kern.ipc.soqlencomp is set to, and if you can set it to 1, whether that helps. Possibly also tweaking somaxconn higher (maybe).
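i.e. something like (assuming the OID is actually exposed on your build):

Code:
sysctl kern.ipc.soqlencomp            # does it exist, and what is it set to?
sudo sysctl kern.ipc.soqlencomp=1     # if so, try the completed-queue-only behavior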

At this point, if I were you, I'd think about how to migrate this service to some other OS if possible. FreeBSD has a great networking stack, and always has, but a great network stack for 2001 isn't prepared for the network abuse of 2020.

Does your colo maybe offer a load balancer, you could stick your one host behind a load balancer in tcp mode and let it handle the SYN floods?
 
OK, so you should probably read (or skim!) the paper introducing the syncache (and syncookies) to FreeBSD: https://www.usenix.org/legacy/publications/library/proceedings/bsdcon02/lemon.html (PDF is most legible IMHO)

Section 4, Motivation describes what happens with unmodified FreeBSD 4.4 when syn-flooded, which should be pretty similar to more or less any version of OS X:

Initial tests were performed on the target machine using an unmodified 4.4-stable kernel while undergoing SYN flooding. The size of the listen socket backlog was varied from the default 128 entries to 1024 entries, as permitted by kern.ipc.somaxconn. The results of the test are presented in Figure 2.

In this test, with a backlog of 128 connections, 90% of the 2000 connections initiated to the target machine complete within 500ms. When the application specifies a backlog of 1024 connections in the listen() call, only 2.5% of the connections complete within the same time period.

The dropoff in performance here may be attributed to the fact that the sodropablereq() function does not scale. The goal of this function is to provide a random drop of incomplete connections from the listen queue, in order to insure fairness.

However, the queue is kept on a linear list, and in order to drop a random element, a list traversal is required to reach the target element. This means that on average, 1/2 of the total length of the queue must be traversed to reach the element; for a listen queue backlog of 1024 elements, this leads to an average of (3 * (1024/2))/2, or 768 elements traversed for each incoming SYN.

Profiling results show that in this particular case, the system spends 30% of its time in sodropablereq(), and subjectively, is almost completely unresponsive.

Based on that, I would say, don't fiddle with the kern.ipc thing I just said; and don't try to make your listen backlog bigger; actually set it smaller; maybe to something like 32 or 64. That may keep your machine responsive during a syn-flood; although legitimate clients won't have any chance at successfully connecting.
 
Based on that, I would say, don't fiddle with the kern.ipc thing I just said; and don't try to make your listen backlog bigger; actually set it smaller; maybe to something like 32 or 64. That may keep your machine responsive during a syn-flood; although legitimate clients won't have any chance at successfully connecting.

There isn't much point in keeping the server responsive if users can't connect to it. That behavior seems like a bug more than anything else. It seems like every possible way to fight a synflood is disabled on macOS. That seems nuts to me; there must be SOME solution here. Moving my server to a different platform may be a solution for the distant future, but for now I'm stuck on OS X and need to find a way to make it work. My data center does have some kind of protection service, but it's super expensive, so that's out. Among other things, maybe I'll see if I can get pf's synproxy to work. I'll also probably play around with kern.ipc.somaxconn anyway just to see what happens. Its default value is 1536 but I'm not sure if that counts since my other value has been set to 0.
 
...I was poking around and saw this note:

Code:
                 * When kern.ipc.soqlencomp is set to 1, so_qlen
                 * represents only the completed queue.  Since we
                 * cannot let the incomplete queue goes unbounded
                 * (in case of SYN flood), we cap the incomplete
                 * queue length to at most somaxconn, and use that
                 * as so_qlen so that we fail immediately below.

Where did you find that exactly? There seems to be a million different tars to download and look through :/
 
That note is in the xnu tgz I linked, under bsd/kern/ something.c; grep on soqlencomp and you'll find it. It's in the newconn or something call that bsd/netinet/tcp_input.c is doing to make a new syn_recv socket.

Hopefully synproxy works for you. Otherwise, terrible, but possibly workable ideas include userspace tcp or running a FreeBSD (or Linux, whatevs) vm and passing it the traffic before the Mac tcp stack, and run some lightweight proxy to the mac (maybe xhyve could work as the vm layer, maybe). Also bad, but not so much work, set a pf rule to rate limit incoming syns, possibly dynamically based on current backlog / cpu time in the kernel; depending on flooding behavior, maybe rate limit syns with tcp options to a lower rate than syns without options --- most oses will retry syns three times with options, then some more times without; if your flooder always sends syns with options, blocking only those could help for a while.
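For the rate-limit idea, a rough sketch; these are OpenBSD-style pf state options, and I make no promise that El Capitan's pf accepts them. Also note they're per source IP, so they help less against fully spoofed floods:

Code:
table <flooders> persist
block in quick from <flooders>
pass in quick proto tcp from any to any port { 80 443 } flags S/SA keep state \
    (max-src-conn-rate 50/10, overload <flooders> flush global)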

Keeping the system responsive in the face of a SYN flood, even if no new connections are possible on the flooded ports, means at least you can finish serving requests, and serve new requests on existing connections, and you can still ssh in to poke around and examine the system. It's pretty shitty, but it's better than if the system is basically locked solid until all the packets are handled.

This whole situation is shitty, but I guess there's some reasons why Apple abandoned OS X Server.
 
Going back a bit, the results of `netstat -aLn` are kind of unexpected. Even when the server is unreachable, I've only seen a handful of the 511 web slots full. It's never actually been full. Although all of the episodes I've logged lately have been a lot of FIN_WAIT_1/2 connections, not the solid stream of SYN_RCVD. Note that I did just find a full-blown syn flood incident where a regular netstat is full of SYN_RCVD and the netstat -aLn does in fact show 511/511/511 for https. So that's a SYN flood for sure. Is there also such a thing as a FIN_WAIT flood? Seems like a case of way-too-long timeouts if my system is flooded with FIN_WAITs?

Also note I was able to raise that 511 apache limit via httpd.conf, but it required a full server reboot. And even though I gave it 4096 as the backlog, it only went up to 1536, which is the default value for kern.ipc.somaxconn (which I have not messed with yet).

Also, I don't suppose you know of a source for much more detailed documentation of pf? The internet seems to be very slim on that, although it may just be that two-letter words are inherently hard to search for. I've only done relatively simple pf rules, but I wonder what else I could do that may help.
 
Also, is there a command that will instantly kill all tcp connections, either on an interface or just overall? Not shut down the interface but just fully clear it out to hopefully clear out lingering connections.
 
On *BSD, there's tcpdrop to unilaterally drop a connection from the kernel state; you'd have to loop through all the connections on netstat though. I don't know if that's available for Mac OS.
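If tcpdrop does exist there, a quick-and-dirty loop (IPv4 only, and it assumes the usual BSD netstat column layout) would be something like:

Code:
# drop everything stuck in a FIN_WAIT state
netstat -n -p tcp | awk '$6 ~ /FIN_WAIT/ {print $4, $5}' | \
    sed 's/\./ /4; s/\./ /7' | xargs -L1 tcpdrop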

I'm not familiar with pf, on FreeBSD, my boss liked ipfw, so that's what we used. A quick search around finds a FAQ and a man page. Although, those are probably both addressing current OpenBSD, which may have features that aren't available in Mac OS.

FIN_WAIT_1 would be your side closed the connection from ESTABLISHED or SYN_RECEIVED (can't tell which), and then you're waiting for an ACK to go to FIN_WAIT_2, or a FIN to go to CLOSING (still waiting for the ACK to your FIN); once you've seen a FIN and an ACK of your FIN, it should go to TIME_WAIT (2 * msl). I'm not sure if FIN_WAITs have timeouts or not. I know LAST_ACK (other side sent FIN, you acked, and closed, but never got your FIN acked) didn't have a timeout in FreeBSD until 2015, to much consternation (security advisory, which does have a workaround if you have tcpdrop); I have to assume that Mac OS would see that issue too, but I ran into it mostly sending large apks to mobile clients; if you're sending smaller files, it's probably less likely to trigger. The FreeBSD bug about LAST_ACK has some people mentioning TIME_WAIT too, so I suspect those states also don't have timeouts.

If you don't have tcpdrop, you might be able to spoof RSTs to clear out the states, although you may need to get seq/ack numbers, not sure if that's available with netstat. I found google's tcp_killer but that doesn't seem like it actually does the same thing, but may confirm tcpdrop isn't available.
 
Another related question: Do you have any ideas on how one might determine they are under a synflood attack, from a script? I know when I start dropping pings, but there can be other reasons you drop pings, like problems on the other end. Is there a command with machine-parseable output that maybe tells you the total number of connections in each state? Or even better, how's about a flag to output the result of netstat -n in XML :D
 
Try netstat -s to see protocol level stats, you can add -p tcp or whatever protocol to only get the ones that are relevant for you. Machine parsable output is in FreeBSD 11.0 (libxo), but yeah. For a given build of netstat, the output should always show up in the same order, so I'd just parse with regex like /(\d+) connection requests/ in order, etc. On my system I see
Code:
TCP connection count by state:
        0 connections in CLOSED state
        13 connections in LISTEN state
        0 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        4 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        0 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        0 connections in LAST_ACK state
        0 connections in FIN_WAIT_2 state
        5 connections in TIME_WAIT state
at the end of netstat -s -p tcp, but I think that's newish; you might have to parse the netstat -n output, which is more painful.
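If you do end up parsing the netstat -n output, a rough sketch of the state counting (assuming the usual BSD column layout, with TCP lines starting with tcp4/tcp6):

Code:
import collections, re, subprocess

out = subprocess.run(["netstat", "-n", "-p", "tcp"],
                     capture_output=True, text=True).stdout
states = collections.Counter(line.split()[-1]
                             for line in out.splitlines()
                             if re.match(r"tcp[46]", line))
print(states.get("SYN_RCVD", 0), "half-open;", states.get("FIN_WAIT_1", 0), "in FIN_WAIT_1")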
 
Hmmm my -s flag doesn't give me a summary like that unfortunately. `netstat -s -p tcp` is fairly parseable, but I hate to have a script do that much work when it's probably going to run every minute. I'll poke around with this more later.
 
Hmmm my -s flag doesn't give me a summary like that unfortunately. `netstat -s -p tcp` is fairly parseable, but I hate to have a script do that much work when it's probably going to run every minute. I'll poke around with this more later.
You can do better than parsing text, but only if you're willing to muck about with structs that are defined by the C code that reads and writes them. I can tell you at my last job we ran netstat -s -p tcp once a second and parsed it with perl, and it wasn't a big deal, but I understand it feels icky. I also briefly ran some code that would watch the stats once a minute or so, waiting for something bad to happen, then would tcpdump to find the bad connection and tcpdrop it; ran that for about a week or so, a few days to find the issue and fix it, and then a few more to not stagger kernel restarts. That was for a very fun bug in syncookies :D
 
I was able to parse the output pretty easily once I thought about it a bit. PHP has so many built in functions, it was pretty much a piece of cake.

Moving on, let's get back to `pf` if we could.

So if `synproxy` is an always-on feature, then what I need is a way to turn it on when needed, and turn it back off again when it's not.

For example, how could I disable a specific rule in my .conf, and then replace it? Or is there another way to do that?

Code:
pass in quick proto tcp from any to en1 port { 80 443 } flags S/SA keep state
<=>
pass in quick proto tcp from any to en1 port { 80 443 } flags S/SA synproxy state

I'd want to toggle between those two, but I'm not sure how I'd do something like that using pfctl. Is there a way to give a rule some kind of unique ID so that I can later dynamically remove that specific rule and replace it with a new one with the same ID, so I could keep going back and forth?
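Maybe pf anchors are the answer here? My guess (completely untested, assuming Mac OS pf supports anchors) would be something like:

Code:
# in /etc/pf.conf, replacing the current web rule:
anchor "webfront"

# normal mode (load this rule into the anchor):
echo 'pass in quick proto tcp from any to en1 port { 80 443 } flags S/SA keep state' | pfctl -a webfront -f -

# under attack (swap in the synproxy version):
echo 'pass in quick proto tcp from any to en1 port { 80 443 } flags S/SA synproxy state' | pfctl -a webfront -f -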
 
Ok I have officially enabled synproxy on the ethernet device that hosts the popular website, which is most likely where all of the floods are coming in. The server hasn't crashed or gone unreachable, so that's a plus. I'm monitoring the total number of SYN_RCVD and it's very reasonable. Question: Is there a way I can see, or at least count, how many SYNs `pf` is currently proxying? That would be helpful information; then I'd know if I were actually under attack (assuming the protection is working), and I'd be able to gauge the size of the attack(s).
 
Ok I have officially enabled synproxy on the ethernet device...

12 hours later, so far so good. No problems with the firewall, and my SYN_RCVD counts have been super low. The website is actually noticeably more responsive. And no takedowns.
 
Nice!

The interwebs says:

For more verbose output including rule counters, ID numbers, and so on, use:
pfctl -vvsr

I'd give that a try and see if it's helpful? Otherwise, I dunno.
 
Spoke too soon. Somehow I'm getting syn-flooded right through the firewall, not sure how that's even possible. I can check to confirm but I don't think they're coming in on the other device/ip
 
I don't get it. I have synproxy enabled on en1, and yet I had over 1000 SYN_RCVD connections on en1. It was like it was working, and then for no reason it just stopped proxying. Of course also, the limit is 4000, so the amount I had, while it was a lot, shouldn't have been enough to block tcp connections. I'm going to turn synproxy on on both ethernet cards just for the hell of it. I'm very confused about what's going on at this point.
 
Things keep getting worse. Now the firewall isn't working at all. That is, despite my ruleset, it doesn't seem to be blocking anything. I'm able to see open ports like 21 and 22, even though there are NO rules at all allowing traffic on those ports. I've removed the synproxy stuff but it's still not blocking anything at all. I'm also seeing open ports 554 and 7070. On both interfaces.
 
This is my entire ruleset. It's so simple, I can't see anywhere where things could be going haywire. At one point with this config, port 22 was wide open and I was able to SSH into my server right through the firewall. Now I'm getting open ports for all these services I'm not running (21, 554, 7070). I've contacted my data center to see if maybe those ports are being forwarded to another machine for some reason, and this is just a false problem.

Code:
set skip on lo0

table <VPN> const { 10.1.2.1/24 }
table <badhosts_a> persist
table <badhosts_b> persist

block in quick from <badhosts_a> to any
block in quick from <badhosts_b> to any

block in all

pass in quick proto tcp from any to 204.11.33.59 port { 25 80 443 587 993 } flags S/SA keep state
pass in quick proto tcp from any to 204.11.35.98 port { 80 443 } flags S/SA keep state
pass in quick proto { esp icmp } from any to any keep state
pass in quick from <VPN> to any flags S/SA keep state
pass in quick proto udp from any to 204.11.33.59 port { 500 1701 4500 } keep state
pass out proto { tcp, udp, icmp } from any to any keep state
 
I'm beginning to get the impression that synproxy is simply broken on Mac OS (10.11), and that I have absolutely no defense against the nonstop syn floods.
 
Ok here is an update. One thing I've learned is that the command I've been using to reload my ruleset is not working.
Code:
pfctl -F rules -f /etc/pf.conf

It appears to work. And the changes I make *ARE* reflected by running
Code:
pfctl -sr
But they don't actually WORK until a reboot. So that invalidates a lot of the testing I have previously done.

Today, I broke my primary port-opening rules into three chunks: the mail ports, and then the two web ports for each IP. I threw `synproxy` on both of the web server lines, and when the server came up, the web server was unreachable. Mail worked, VPN worked, but all websites running on both IPs were completely unreachable. Seems synproxy doesn't actually work at all. I'm going to do a little more testing but I'm not expecting anything different.

Also, I just ran yet another test where I only enabled synproxy on one of the IPs, rebooted, and consistently, the websites on the synproxy'd rule were fully unreachable. The websites on the regular rule (keep state) worked just fine.
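One more thing on my list to test, and this is just a guess: flushing states as well as rules when I reload, since existing state entries keep matching until they expire, which might be part of why my changes only seem to take effect after a reboot:

Code:
pfctl -F states          # drop existing state entries
pfctl -f /etc/pf.conf    # then load the ruleset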
 
So I'm trying to think of how I should move forward. There are lots of options but none of them are great.

One that I think you mentioned above was the idea of running BSD in a virtual machine, routing inbound traffic to that, then forwarding that traffic, after BSD does the handshaking, on to the server's 'real' OS. Is such a thing possible though? Handshaking happens at such a low level; don't the connections already have to be made before they can be redirected to the virtual machine? In other words, wouldn't a syn flood still choke up the host machine and not the guest machine? And how would you set up that kind of back and forth forwarding, assuming it is possible somehow?

My server has 24 GB of RAM and I think 8 CPU cores. And it's just a web server. High traffic but still, it has plenty of resources to spare, so this may actually be a viable solution.
 
Why not just run the webserver entirely in a vm on a modern BSD?
There isn't really any benefit to doing that. If that can be done, meaning if the TCP handshakes can be forwarded in full to the VM without clogging up the host, then using BSD as a 'filter' should work just as well as running the full server in there. But there's no reason to run the whole server in there. My apache/mariadb/php etc are all up to date and run great. I don't see the upside to moving them into a VM when I can just keep running them directly on the hardware?


Also, I was thinking: if this is something I could do, with VMWare for example, then it should also be possible to direct all incoming connections to a different program. So basically a 3rd party syncache program should be able to run between the firewall and the server apps, caching the handshakes? That doesn't seem like a particularly outrageous configuration, but I haven't seen any such "add-on", 3rd party syn-flood fighting apps, only stuff fully built into misc OSes?


Lastly, back to `pf` and synproxy...
I've been playing around with it a bit. Checking `pfctl -si` I can see that the counter for `synproxy` is in-fact increasing. And with `netstat`, I can see that my connections attempts (attempts to load a web page in a browser) are showing up as SYN_RCVD connections but they never move past that. This is making me think there may just be some simple problem with my config. Somehow I'm blocking ACKs or something like that, when using synproxy. But in a way that I am not blocking with an otherwise identical config, that uses `keep state` instead.
 
Yes, you could potentially do this with a program, if there's a way to get in between the ethernet stack and the tcp stack. I saw some references to divert sockets in the XNU source, but possibly with ipfw only and not pf. ipfw is deprecated in mac os, but as we've seen the tcp stack is a shambles, so like whatever :p Your pf ruleset is small, it wouldn't take much to rewrite it in ipfw, if that were a requirement. That said, I haven't seen any programs to do this. There's this https://github.com/LTD-Beget/syncookied but from reading their blog, it looked like it requires the backend server be Linux so it doesn't have to keep state on established connections and rewrite the seq/ack (maybe I misread though). If you look at it, and the software seems viable, there's probably some way to get packets to a program, and then inject them back into the local kernel stack.

OK, so setting up the VM and getting packets to it. I'd pull a FreeBSD 12.1 VM image from here

(I'm using X.98 for your protected IP and Y.59 for your regular IP, in case you want to edit out IPs at some point)

Option 1, if it works in your hosting: set up the VM with a (virtual) ethernet card bridged to your external interface, configure the VM for X.98, deconfigure that on your mac; have the VM listen on X.98:80, X.98:443, forward (with haproxy or whatever) to Y.59:81 and Y.59:444; configure apache and pfctl appropriately, and firewall in the VM; done deal.

Option 2: we've got to forward the packets from mac os to the VM. Set up the VM with an internal network; let's say the host is 192.168.0.1 and the guest is 192.168.0.2. Configure the guest to also have X.98 on lo0, and 192.168.0.1 as the default route. Remove your existing 80,443 rule, and add pf rules like

pass in quick proto tcp from any to X.98 port { 80 443 } route-to 192.168.0.2 no state
pass in quick on INTERNAL_INTERFACE proto tcp from X.98 port { 80 443} to any route-to YOUR_GATEWAY_IP no state

have the VM listen on X.98:80, X.98:443 and forward to 192.168.0.1:80 and 443.

Option 2 is a bit trickier to set up, because we haven't set up a way for the guest to get access to the interwebs. You could set up NAT while you install haproxy, or fetch the packages/source on the mac and copy them into the guest.

I'd recommend haproxy to pass the traffic, because it's pretty bullet proof (especially single threaded), and easy to configure. If your total established session count is low, config can be as simple as:

Code:
frontend https
        bind X.98:443
        mode tcp
        default_backend real_https

frontend http
        bind X.98:80
        mode tcp
        default_backend real_http

backend real_https
        mode tcp
        server real_https 192.168.0.1:443

backend real_http
        mode tcp
        server real_http 192.168.0.1:80

A slight change to the config is required if option 1 is viable. If you want to have a lot of connections (I'd say more than 10k), it makes sense to set up port ranges, and you might want to add more IPs or listening ports on the server side. You probably also want to send the original IPs with proxy protocol, adding send-proxy-v2 (or send-proxy) on the server lines here, and some additional configuration on the server side. Note, I've configured haproxy to just be a tcp forwarder; there's no need to do anything fancy, and tcp forwarding avoids any potential problems with http parsing or TLS certificate management. I would configure the guest with a single CPU, and double check that haproxy runs single-threaded (I debugged some issues with multithreaded haproxy, and I think it's good now, but why risk it if you don't need to).

I'm not 100% sure if mac os pf supports route-to though; it's available in ipfw as forward IP; we could switch you out to ipfw if needed.

Either of these two options should work OK. In either case, the mac TCP stack won't manage state for the external connections, so you won't fill up with SYN_RECEIVED etc, it's just passing the packets through; and the pf rules for option 2 don't need state either. Option 2 might need ICMPs forwarded for best results, but modern TCP stacks are built around the reality that ICMP gets blocked even when it shouldn't, so forwarding ICMPs is an optimization, not a requirement.

Also, if you can get pfctl to change rules without a reboot, option 2 is less disruptive/easier to test.
 
Before I get involved with the mess that running a separate VM will be, I want to make sure I exhaust all other, simpler options. With that in mind, did you see the last part of my previous message? It looks like synproxy is half-working; it appears as though the firewall is just somehow blocking responses to SYNs so they are getting stuck in SYN_RCVD. Getting synproxy working would be the preferred solution due to its simplicity, even if it doesn't work 100%. I can enable it on the specific ports that are getting hit and eke by. But synproxy is surprisingly poorly documented on the internet, and pf in general doesn't have nearly as much info around as I would have expected.
 