Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#5438 closed Bug/Something is broken (fixed)

keys.mayfirst.org not accepting key updates

Reported by: andrei
Owned by: Daniel Kahn Gillmor
Priority: Medium
Component: Tech
Keywords: keys.mayfirst.org zimmermann.mayfirst.org
Cc:
Sensitive: no

Description

0 asm@host:~$ gpg --keyserver-options debug --send-keys 0xCAE6B5E0E425277B
gpg: sending key 0xCAE6B5E0E425277B to hkp server zimmermann.mayfirst.org
gpgkeys: curl version = libcurl/7.21.0 GnuTLS/2.12.16 zlib/1.2.3.4 libidn/1.15
* About to connect() to zimmermann.mayfirst.org port 11371 (#0)
*   Trying 209.234.253.170... * Connection timed out
* couldn't connect to host
* Closing connection #0
gpgkeys: HTTP post error 7: couldn't connect to host
gpg: keyserver internal error
gpg: keyserver send failed: keyserver error
2 asm@host:~$ ping -c1 keys.mayfirst.org
PING keys.mayfirst.org (209.234.253.170) 56(84) bytes of data.
64 bytes from zimmerman.mayfirst.org (209.234.253.170): icmp_req=1 ttl=64 time=0.093 ms

--- keys.mayfirst.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.093/0.093/0.093/0.000 ms
0 asm@host:~$

Change History (9)

comment:1 Changed 7 years ago by Ross

Owner: set to Daniel Kahn Gillmor
Status: new → assigned

comment:2 Changed 7 years ago by Daniel Kahn Gillmor

I'm working on this now; there may be further interruption while i try to fix the problem. i think i have a lead on what the issue is and a workaround, though.

comment:3 Changed 7 years ago by andrei

Resolution: fixed
Status: assigned → feedback

thanks, it seems to be fixed now.

comment:4 Changed 7 years ago by Daniel Kahn Gillmor

Status: feedback → closed

The problem behind the failures appears to be a pretty serious bug (a possible DoS vulnerability) in SKS, which i've now documented and sent to the sks-devel mailing list.

I found that the MF/PL backup servers at sunset park were actually accidentally triggering this DoS on zimmermann since they were competing for bandwidth with a lot of data transfer for backups.

The workaround is to ensure that an HTTP reverse proxy (i'm using nginx) handles all the direct network traffic, and sks's HKP services are effectively isolated from any network access by listening only on the loopback address.
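A minimal sketch of what such a setup might look like, not the actual configuration on zimmermann. The public address and port are taken from the debug output in the description; the use of sks's `hkp_address` setting and of a single catch-all `location` are assumptions made for illustration. By default nginx reads the client's request itself before handing it to the upstream, so a slow client ties up nginx rather than sks.

```nginx
# Assumed sks side (in sksconf): bind HKP to loopback only, e.g.
#   hkp_address: 127.0.0.1
#   hkp_port: 11371

# Assumed nginx side (inside the http {} context):
server {
    # Public HKP endpoint; address and port as seen in the ticket's debug output.
    listen 209.234.253.170:11371;
    server_name keys.mayfirst.org;

    location / {
        # Forward every request to the sks instance listening on loopback.
        proxy_pass http://127.0.0.1:11371;
        proxy_set_header Host $host;
    }
}
```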

This should mean that keys.mayfirst.org is more responsive than it has been recently, but it may also mean that sks on zimmermann ends up dealing with a lot more traffic (requests that would have been delayed or timed-out otherwise).

Please re-open this ticket or open a new one if you find keys.mayfirst.org unduly sluggish or unresponsive.

comment:5 Changed 7 years ago by Jamie McClelland

Thanks dkg!

zimmermann doesn't have a lot of data to back up and most of it should not change - any ideas why that was causing such competition?

jamie

comment:6 Changed 7 years ago by Jamie McClelland

Possibly related: #5455.

comment:7 in reply to:  5 Changed 7 years ago by Daniel Kahn Gillmor

Replying to https://id.mayfirst.org/jamie:

zimmermann doesn't have a lot of data to back up and most of it should not change - any ideas why that was causing such competition?

I think my initial description was unclear (either that, or i'm misunderstanding your question).

Fannie is connected to the 'net via a pipe that is saturated with (general) backup traffic. That is, the bottleneck is near fannie, and unrelated to zimmermann.

When fannie reaches out to zimmermann for a keyserver update, the packets for that HKP session get squeezed in between the massive volumes of backup traffic contending for the small pipe. This means fannie's HKP requests sometimes get delayed (packets dropped or re-sent) even in the middle of a request. Since zimmermann's sks instance blocks from the moment a request begins until that request has been processed and answered, fannie's network congestion results in a delay/denial of service on zimmermann, even though zimmermann's connection to the 'net is uncongested.

Does this make sense?
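The effect described above can be illustrated with a toy timing model. This is only a sketch, not SKS code: it assumes a server that handles one request at a time and stays occupied from the moment a connection is accepted, and it compares that with a buffering proxy that hands the server only complete requests. The names `Request`, `direct` and `behind_proxy`, and all the numbers, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: float     # time (s) at which the client connects
    transfer: float    # time (s) the client needs to deliver the full request
    processing: float  # time (s) the server needs once the request is complete


def direct(requests):
    """Completion times when clients talk straight to a one-at-a-time server.

    The server is occupied for the whole transfer plus the processing time,
    so one slow client holds up everyone queued behind it.
    """
    free_at = 0.0
    done = {}
    for r in sorted(requests, key=lambda r: r.arrival):
        start = max(free_at, r.arrival)
        free_at = start + r.transfer + r.processing
        done[r.name] = free_at
    return done


def behind_proxy(requests):
    """Completion times when a proxy buffers each request concurrently.

    The server only sees requests that have fully arrived, so it is
    occupied for the processing time alone.
    """
    free_at = 0.0
    done = {}
    for r in sorted(requests, key=lambda r: r.arrival + r.transfer):
        ready = r.arrival + r.transfer
        start = max(free_at, ready)
        free_at = start + r.processing
        done[r.name] = free_at
    return done


if __name__ == "__main__":
    # Illustrative numbers only: one congested client, one healthy client.
    reqs = [
        Request("slow_client", arrival=0.0, transfer=30.0, processing=0.1),
        Request("fast_client", arrival=1.0, transfer=0.05, processing=0.1),
    ]
    print("direct:      ", direct(reqs))
    print("behind proxy:", behind_proxy(reqs))
```

In the direct case the fast client is answered only after the slow client's 30-second transfer has finished; behind the proxy it is answered about a second after the start, while the slow client is no worse off.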

comment:8 Changed 7 years ago by Daniel Kahn Gillmor

As noted in #3758, i've just disabled nginx's caching entirely, hoping to reduce the turnaround time between uploading new keys and having them be fetchable.

If we notice zimmermann hogging CPU on that host, we should think about re-enabling the cache (maybe with a shorter TTL?)

comment:9 Changed 6 years ago by Daniel Kahn Gillmor

Keywords: zimmermann.mayfirst.org added; zimmermann removed
