
Version 8 (modified by Jamie McClelland, 9 years ago)

--

Debug Server to Server connections

Our servers often need to ssh into each other to carry out various tasks. Most commonly:

  • Each server has to rsync, over ssh, to its designated backup servers
  • Each server copies data to jojobe, our nagios server, so we can get alerts if anything is amiss

These connections are configured using the monkeysphere, specifically:

  • Each server generates an OpenPGP key and corresponding authentication subkey
  • Each server runs an ssh agent via runit (/etc/sv/ssh-agent-root) that keeps the authentication subkey loaded in memory so it can use it to access remote servers
  • Each remote server is configured with the User IDs of the root OpenPGP keys that should be able to access it.

Unfortunately, sometimes things go wrong and servers are not able to connect to each other.

In these examples I refer to the "connecting" server and the "target" server to distinguish between the two.

Here are the top causes of failure and their remedies. Note: you may need to repeat the first remedy after fixing the problem with one of the later steps, because a failed connection sometimes seems to kill the ssh-agent.

  • The connecting server does not have SSH_AUTH_SOCK set. This is set in ~/.profile. If you are debugging, you may need to source this file. If you are scripting, be sure to manually set the environment variable: SSH_AUTH_SOCK=/root/.ssh-agent-socket/sock. Test on the connecting server with:
    echo $SSH_AUTH_SOCK
    
    If you get no output, source the ~/.profile file:
    source ~/.profile
    
    and try again.
  • Something went wrong with ssh-agent on the connecting server. Check whether the socket file exists. If it does not, stop and restart the service on the connecting server:
    ls -l /root/.ssh-agent-socket
    sv stop ssh-agent-root
    sv start ssh-agent-root
    
  • The target server does not have the latest version of the connecting server's OpenPGP key. Fix on the target server: refresh the key, reload the credentials, and test:
    monkeysphere-authentication refresh-keys <username>
    monkeysphere-authentication update-users <username>
    cat /var/lib/monkeysphere/authorized_keys/<username>
    

Note: The last cat command must print the connecting server's key. If it does not, the connection will never work.
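The checks described so far can be collected into a few shell functions. This is only a sketch: the function names are hypothetical, the socket path is the one given on this page, and the key comparison assumes you have copied one line of `ssh-add -L` output from the connecting server to the target server.

```shell
#!/bin/sh
# Sketch of the checks above. Function names are hypothetical;
# the default socket path is the one given on this page.

# Connecting server: succeed if the agent socket exists.
check_agent_socket() {
    sock="${1:-/root/.ssh-agent-socket/sock}"
    if [ -S "$sock" ]; then
        echo "ok: agent socket $sock exists"
    else
        echo "missing: no agent socket at $sock (restart ssh-agent-root)"
        return 1
    fi
}

# Connecting server: ask the agent which keys it holds.
# ssh-add -l exits 0 if keys are loaded, 1 if the agent holds none,
# and 2 if the agent cannot be contacted.
check_agent_keys() {
    SSH_AUTH_SOCK="${1:-/root/.ssh-agent-socket/sock}" ssh-add -l
    case $? in
        0) echo "ok: agent has keys loaded" ;;
        1) echo "empty: agent is running but holds no keys"; return 1 ;;
        *) echo "unreachable: cannot contact the agent"; return 2 ;;
    esac
}

# Target server: succeed if the public key line given as $2 appears in the
# authorized_keys file given as $1. Only the base64 key blob (second field)
# is compared, so differing comments do not matter.
authorized_keys_has_key() {
    file="$1"
    blob=$(printf '%s\n' "$2" | awk '{ print $2 }')
    [ -n "$blob" ] && grep -qF -- "$blob" "$file"
}
```

On the connecting server you might run `check_agent_socket && check_agent_keys`. On the target server, something like `authorized_keys_has_key /var/lib/monkeysphere/authorized_keys/<username> "<line from ssh-add -L>"` shows whether the key made it into the file.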

  • The connecting server has not published the latest version of its key. Fix on the connecting server: determine the keyid of the server's secret key, and then publish it:
    gpg --list-secret-key
    gpg --keyserver keys.mayfirst.org --send-key <keyid>
    
    Then, refresh the key on the target server (see above).
  • The connecting server's OpenPGP key is expired. Fix on the connecting server: extend it:
    mf-gpg-extend-root-expiration
    
    (This will also publish it.) Then refresh the key on the target server (see above).
  • The connecting server's key has not been certified by an allowed key (or the certification has expired). Fix: On the connecting server, refresh the key's certifications:
    gpg --recv-key <keyid>
    
    Then, list the certifications:
    gpg --check-sigs <keyid>
    
    Then, on the target server, see if any of them match the allowed certifiers:
    monkeysphere-authentication list-id-certifiers
    
    If not, get someone on the allowed list to sign the key, then run the step for ensuring the target server has the latest version of the connecting server's OpenPGP key.
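To tell quickly whether expiry is the problem, the colon-delimited listing from gpg can be filtered for secret keys whose expiration date has passed. The following is a minimal sketch, not part of our tooling: the function name is hypothetical, and it assumes that field 7 of each `sec` record holds the expiration time in seconds since the epoch, as GnuPG 2.x prints it.

```shell
#!/bin/sh
# Sketch: print the key IDs of expired secret keys.
# Reads `gpg --list-secret-keys --with-colons` output on stdin.
# Assumption: field 5 is the key ID and field 7 is the expiration time
# in seconds since the epoch; an empty field 7 means the key never expires.
# The optional first argument overrides the current time (useful for testing).
list_expired_secret_keys() {
    now="${1:-$(date +%s)}"
    awk -F: -v now="$now" \
        '$1 == "sec" && $7 != "" && ($7 + 0) < (now + 0) { print $5 }'
}
```

On the connecting server this could be run as `gpg --list-secret-keys --with-colons | list_expired_secret_keys`. Any key ID it prints is a candidate for mf-gpg-extend-root-expiration.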