Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#9848 closed Bug/Something is broken (fixed)

postfix NOQUEUE

Reported by: https://id.mayfirst.org/erq
Owned by: https://id.mayfirst.org/jamie
Priority: Urgent
Component: Tech
Keywords: mx1-email
Cc: ross@…
Sensitive: no

Description

While trying to receive email, mx1 is recording lines like these in the mail.log file:

Jul 30 08:01:46 mx1 postfix/smtpd[18372]: NOQUEUE: reject: RCPT from gil.mayfirst.org[216.66.23.48]: 451 4.3.5 Server configuration problem; from=<erq@mayfirst.org> to=<enrique@laneta.apc.org> proto=ESMTP helo=<gil.mayfirst.org>
...
Jul 30 08:04:32 mx1 postfix/smtpd[19628]: NOQUEUE: reject: RCPT from mail-ob0-f172.google.com[209.85.214.172]: 451 4.3.5 Server configuration problem; from=<enrroquez@gmail.com> to=<enrique@laneta.apc.org> proto=ESMTP helo=<mail-ob0-f172.google.com>

Could you please help me diagnose what is wrong?

Thanks, Enrique

Change History (6)

comment:1 Changed 4 years ago by https://id.mayfirst.org/dskallman

  • Cc ross@… added
  • Owner set to https://id.mayfirst.org/jamie
  • Status changed from new to assigned

Looping Jamie & Ross in.

comment:2 Changed 4 years ago by https://id.mayfirst.org/erq

Hi (I just got to the office). I think people may be losing email, since the error messages are still appearing.

comment:3 Changed 4 years ago by https://id.mayfirst.org/jamie

I'm looking at this now... the messages should be queuing up so they will be delivered once we sort it out.

comment:4 Changed 4 years ago by https://id.mayfirst.org/jamie

  • Resolution set to fixed
  • Status changed from assigned to closed

The problem is that postgrey was killed by oom-killer this morning at 6:49 am (NY time). I just re-started it and mail seems to be flowing again.
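One way to check that mail really is flowing again is to count the "451 4.3.5 Server configuration problem" rejects in mail.log and confirm that no new ones appear after the restart. Here is a minimal sketch in Python, assuming the log lines have the same shape as the ones quoted in the description. The function names are only illustrative.

```python
import re

# Matches postfix/smtpd reject lines shaped like the ones quoted in this ticket.
REJECT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\spostfix/smtpd\[\d+\]:\s"
    r"NOQUEUE: reject: RCPT from (?P<client>\S+): (?P<code>\d{3}) (?P<dsn>[\d.]+) "
    r"(?P<reason>[^;]+);"
)

def parse_reject(line):
    """Return a dict of fields for a NOQUEUE reject line, or None if it does not match."""
    m = REJECT_RE.match(line)
    return m.groupdict() if m else None

def count_config_rejects(lines):
    """Count rejects with SMTP code 451 and status 4.3.5."""
    n = 0
    for line in lines:
        fields = parse_reject(line)
        if fields and fields["code"] == "451" and fields["dsn"] == "4.3.5":
            n += 1
    return n
```

To use it, pass an open log file, for example `count_config_rejects(open("mail.log"))`, where the path is whatever your syslog setup writes to.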

comment:5 Changed 4 years ago by https://id.mayfirst.org/erq

Thanks Jamie. Could you provide more information about oom-killer? Sorry, I'm not familiar with it.

comment:6 Changed 4 years ago by https://id.mayfirst.org/jamie

Yes - sorry for the short-hand. "oom-killer" stands for out of memory killer. When the server runs out of memory, it will start killing processes to free up memory. I'm not entirely sure how it chooses a process to kill, but I think it's based on a combination of a process using a lot of memory and one that doesn't seem critical to the functioning of the server.
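On Linux, the kernel exposes the value it uses to rank candidates in /proc/<pid>/oom_score, and a higher score means the process is more likely to be chosen. The sketch below, in Python, lists processes by that score. The function name and the proc_root parameter are illustrative, and it assumes a Linux /proc filesystem that provides the oom_score and comm files.

```python
import os

def read_oom_scores(proc_root="/proc"):
    """Return (oom_score, pid, name) tuples, likeliest OOM victim first."""
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            # The process exited or is not readable; skip it.
            continue
        scores.append((score, int(entry), name))
    # Highest score first; ties are broken by pid.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return scores
```

Printing the first few entries of `read_oom_scores()` on the server shows which processes are currently most exposed. A process's score can be adjusted by writing to /proc/<pid>/oom_score_adj (range -1000 to 1000), though whether that is appropriate for postgrey here is a separate decision.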

The answer is usually to figure out why there was a surge in memory usage and try to limit it. Given the time (and your earlier references to rdiff-backup), I suspect a combination of two things. First, the backup was using a lot of disk I/O, which slows down the server in general. Second, there was a lot of traffic to web sites (given the hour, probably search bots). Since each apache process takes longer to respond (thanks to the disk I/O), apache needs to start more and more processes to meet the demand, which then eats up the memory.

Right now your max apache clients is, I think, 150 (based on line 105 of /etc/apache2/apache2.conf).

You could lower that to 125 by adding the file /etc/apache2/conf.d/maxclients.conf with the contents:

MaxClients 125

The downside is that if you hit 125 clients without using up the memory, apache will make further connections wait in a queue, and will refuse them once that queue is full.
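A rough way to pick the number is to divide the memory you can spare for apache by the typical size of one apache process. The Python sketch below shows the arithmetic. The function name and all the figures are hypothetical, not measured on this server, so substitute your own numbers.

```python
def estimate_max_clients(total_mb, reserved_mb, per_child_mb):
    """Estimate how many apache children fit in memory.

    total_mb:     RAM on the server
    reserved_mb:  memory to leave for everything else (mail, database, OS)
    per_child_mb: typical resident memory of one apache child
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_mb - reserved_mb
    if available <= 0:
        return 0
    return available // per_child_mb

# Hypothetical figures, not measured on this server.
print(estimate_max_clients(total_mb=4096, reserved_mb=1024, per_child_mb=24))  # prints 128
```

If the estimate comes out well below the current setting, that supports lowering MaxClients. If it comes out higher, the memory surge probably has another cause.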
