Opened 3 years ago

Last modified 3 years ago

#11243 assigned Bug/Something is broken

Mail problems on mx1

Reported by: https://id.mayfirst.org/erq
Owned by: https://id.mayfirst.org/jamie
Priority: Urgent
Component: Tech
Keywords: mx1 mail amavis
Cc:
Sensitive: no

Description

Jaime, hello and good morning

Early this morning I found connection errors on port 10024 (an error from the amavis antivirus).

So I stopped the mail server, deleted the database files, and then restarted it. A few minutes later the problem came back.

I don't understand the exact cause. Could another running process, such as the backup, be affecting amavis?

I have to disconnect now and won't be able to come back to look into this until this afternoon. Please, can you help us?

Thanks, with a hug, Enrique

Change History (3)

comment:1 Changed 3 years ago by https://id.mayfirst.org/erq

  • Owner set to https://id.mayfirst.org/jamie
  • Status changed from new to assigned

comment:2 Changed 3 years ago by https://id.mayfirst.org/jamie

I restarted amavis last night, and I see that it has crashed again. The problem is memory: when the server runs out of memory, the kernel kills amavis.

I found a lot of times yesterday when the out-of-memory killer was invoked:

0 mx1:~/tickets/6706# zgrep oom-kill /var/log/syslog.1.gz |wc -l
121
0 mx1:~/tickets/6706#
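
To see when the kills cluster (for example, around the backup window), the same log can be bucketed by hour. This is only a sketch: the function name oom_per_hour is made up for illustration, and it assumes the traditional syslog timestamp layout ("Mon DD HH:MM:SS") in the first three fields.

```shell
# Hypothetical helper: count OOM-killer log lines per hour.
# Reads syslog lines on stdin.
# Assumption: fields 1-3 are the traditional syslog timestamp
# "Mon DD HH:MM:SS"; other timestamp formats need a different split.
oom_per_hour() {
    grep oom-kill |
    awk '{
        split($3, t, ":")                       # t[1] is the hour
        n[$1 " " $2 " " t[1] ":00"]++           # key: "Mon DD HH:00"
    }
    END { for (h in n) print n[h], h }' |
    sort -k2                                    # order by timestamp text
}
```

It would be run as: zcat /var/log/syslog.1.gz | oom_per_hour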

I think the solution is to reduce memory usage on mx1 or increase the amount of allocated memory (or both).

To find out what processes are using the most memory on the server, I've written a new utility called mf-memory-profile which simply runs this command:

ps -e -o rss,cmd | awk '{print $2 " " $1}' | awk '{a[$1]+=$2;}END{for(i in a)print a[i]" "i;}'  | sort -n

At the moment it shows this at the tail end:

16604 /usr/sbin/named
28112 /usr/bin/python
32448 /usr/sbin/spamd
38848 cleanup
59640 /usr/sbin/mysqld
64712 spamd
74488 smtpd
277320 /usr/sbin/clamd
480944 amavisd
1099624 /usr/sbin/apache2

Which suggests that apache is the biggest memory user.
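
The numbers above are resident set sizes, which ps reports in kilobytes. As a rough sketch (not the actual mf-memory-profile source beyond the command quoted above), the same per-command sum can be written as a function that reads ps output without a header line and prints MiB. The name mem_by_command is hypothetical. Keep in mind that summing RSS counts shared pages once per process, so the total for a forking server like apache is likely an overestimate.

```shell
# Hypothetical helper: sum RSS per command name and print it in MiB.
# Reads "RSS COMMAND [ARGS...]" lines on stdin, one process per line,
# with no header line. Only the first word of the command is used as
# the key, as in the original pipeline.
mem_by_command() {
    LC_ALL=C awk '{ kb[$2] += $1 }
        END { for (c in kb) printf "%.1f MiB %s\n", kb[c] / 1024, c }' |
    LC_ALL=C sort -n
}
```

It would be run as: ps -e -o rss=,args= | mem_by_command | tail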

I just changed the MaxClients setting for Apache from 150 to 100 - which means that if a web site starts getting a lot of traffic, apache will refuse more connections.

This means we are sacrificing apache connections to try to keep mail running, which I think is the right step to take.
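
To get a feel for what the new limit buys us, the average size of the current apache processes can be multiplied by MaxClients. This is a back-of-the-envelope sketch only: the function name estimate_apache_ceiling is hypothetical, and because RSS includes pages shared between apache children, the result is an upper bound rather than a measurement.

```shell
# Hypothetical helper: estimate worst-case apache memory for a given
# MaxClients value.
# Usage: estimate_apache_ceiling MAXCLIENTS
# Reads one RSS value (in KiB) per line on stdin.
estimate_apache_ceiling() {
    awk -v max="$1" '{ sum += $1; n++ }
        END {
            if (n == 0) { print "no processes found"; exit }
            avg = sum / n
            printf "%d processes, avg %d KiB, ceiling at MaxClients=%d: %d MiB\n",
                   n, avg, max, avg * max / 1024
        }'
}
```

It would be run as: ps -C apache2 -o rss= | estimate_apache_ceiling 100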

Jaime is on vacation for the next week. When he returns, I think we should prioritize moving the laneta email users as soon as possible.

comment:3 Changed 3 years ago by https://id.mayfirst.org/erq

Thanks a lot, Jamie, for taking care of these issues affecting mx1 server users, mostly members who use laneta.apc.org.

Later tonight I will send a message to all of them reporting on the recent problems, the measures taken, and the next steps in the process of moving off mx1.

Thanks again, Enrique
