Opened 4 years ago

Closed 4 years ago

#8433 closed Bug/Something is broken (fixed)

Email connections issues for Rutgersaaup.org

Reported by: https://id.mayfirst.org/slam Owned by: https://id.mayfirst.org/srevilak
Priority: Urgent Component: Tech
Keywords: roundcube, email, rose.mayfirst.org, courier, imap Cc: jamie@…
Sensitive: no

Description

Hi.

Three of my rutgersaaup.org members have email-related issues.

  1. User tries to log in to Roundcube, enters username and password, and is dumped back to the login screen. This happens up to ten times in a row some days; at other times there's no issue at all for days.
  2. User has a problem with sent messages appearing in drafts or remaining in the outbox even after being sent (while also appearing in the sent folder), the computer freezing in the middle of composing emails, and other strange app behaviors.
  3. The last user is just reporting "problems" with email.

I know this is pretty vague but I thought some obvious answer might occur to you, like a known problem on rose or something. Meanwhile I'll keep tracking down more details. My own access to rutgersaaup.org emails is perfect. If you have any other suggestions for narrowing down this issue let me know. Thanks!

Change History (54)

comment:1 Changed 4 years ago by https://id.mayfirst.org/dskallman

  • Keywords roundcube email added
  • Owner set to https://id.mayfirst.org/dskallman
  • Status changed from new to assigned

Hi Scott,

Have you looked at what browser & OS is being used? Also, are the emails being accessed only by webmail, or are they using an email client too? Lastly, how many messages are in the inbox? If it's large, it can be slow and cause issues at times.

Outside of that, not sure what else to share without more for us to look into. If there is a specific email to look at we can see if there are any specific issues with that account.

Thanks,

Dana

comment:2 Changed 4 years ago by https://id.mayfirst.org/slam

User 1 is on Firefox on MacOS and does not use an email client. She's the one with the need to login multiple times.

User 2 is on Windows with MS Office 2013, so her email client is Outlook.

User 3 is not responding to my emails...

I'll get back with info on inbox counts. User 1, however, has a new account, so I expect it has fewer than 200-300 messages.

comment:3 Changed 4 years ago by https://id.mayfirst.org/dskallman

Do you know what version of MacOS & what version of Firefox? And if so, are you able to test that for them?

What version of Outlook? Can you confirm if the server settings are set to mail.mayfirst.org?

Yeah, inbox size is likely not an issue.

comment:4 Changed 4 years ago by https://id.mayfirst.org/slam

Are there logs of email access attempts? As site admin where do I find them?

My user just reported she attempted to login to Roundcube eight times and failed. The ninth time was successful. Same user and pass each time.

I have not been able to replicate this issue myself.

Site: rutgersaaup.org Username: swolf Access attempt: Feb 27 2014 between 9:30 and 10:30 AM.

~slam

comment:5 Changed 4 years ago by https://id.mayfirst.org/dskallman

  • Owner changed from https://id.mayfirst.org/dskallman to https://id.mayfirst.org/srevilak

We can see if messages have been delivered, but this is logging in. I'm looping in Steve, who manages our Roundcube instance, to see if he has any ideas on what it could be.

comment:6 Changed 4 years ago by https://id.mayfirst.org/srevilak

Hello Slam,

With regard to User 1, my first suggestion would be to have them clear their browser cache, and any mayfirst.org cookies. If that doesn't help,

  • I gather that user 1's username is "swolf". I'll see if I can find logging that mentions the failed access attempt.
  • If clearing cache doesn't make a difference for user 1, could you ask the person to try https://roundcube.dev.mayfirst.org? roundcube.dev is our "development" roundcube server; it happens to be running a release candidate for the next version of roundcube. I'm interested in knowing if the login failures appear in one (or both) versions of roundcube.

If you can provide user 2's username, I'll look into that as well.

Steve

comment:7 Changed 4 years ago by https://id.mayfirst.org/slam

Hi,

User 1 is swolf. Clearing caches didn't help. I'm very interested in your login logs, since that will tell me more if it's at her end or on the MF server.

User 2 is karent42

I am adding User 4 to the complaints list. Username cstanford is having the same issues with RoundCube logins. She sends me the following error message from her browser:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, apache@mayfirst.org and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

cstanford also sends me the following info:

"I am having problems quite frequently with getting access to my work email through the Mayfirst Roundcube remote access. Sherry told me that changing her password a while back worked to make things better. About that same time, Galina helped me change my password; it is a robust one. I don't know that that has anything to do with my issue. This type of internal server error happens a lot. Sometimes I can go to "sent" email and, then, go back to the inbox and then my messages are there and I can read and answer. Sometimes even this fails."

This appears to be an MF issue now that I have two users with the same issue, on two different laptops, connecting from two different IP addresses.

comment:8 Changed 4 years ago by https://id.mayfirst.org/srevilak

Slam,

Thanks for the info; I'll let you know what I find.

Steve

comment:9 Changed 4 years ago by https://id.mayfirst.org/slam

Any news Steve? Users are still complaining about poor access to Roundcube.

From user cstanford

Hi Scott, I am having a lot of problems accessing email through the Roundcube remote access. I was having that same problem I mentioned of not being able to see emails in my inbox, so I logged out and waited a couple of minutes before logging in again. This time I got the "internal server error" so that I could not get in at all. That happened today at 3:55pm. I had switched from Firefox to Chrome, thinking it might work better with a different browser. This is the error message:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, apache@… and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

comment:10 Changed 4 years ago by https://id.mayfirst.org/slam

  • Priority changed from Medium to High

Just found a second message from user swolf 3:35 PM March 10 2014

swolf: can't get into my email again, now.

comment:11 Changed 4 years ago by https://id.mayfirst.org/slam

  • Priority changed from High to Urgent

None of their staff have been able to access email for more than a minute for the last hour.

Just tried both Roundcube and Horde myself and I have no access.

Bumping this up to Urgent.

comment:12 Changed 4 years ago by https://id.mayfirst.org/slam

As per your instructions I had both Cathy and Sherry attempt to log in to Horde and the Roundcube.dev test site as well. No-go for both. From Cathy:

Hi Scott, Using Chrome, I tried using the new version of Roundcube being developed. This time I was trying from my desktop computer (rather than my laptop, which I used with the previous error message notice I sent to you earlier today). First issue: the system said it was not a secure https:// site, but I decided to go ahead anyway, when the system asked me if I wanted to proceed. I got the same error code and I am unable to login at all:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, apache@… and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Apache/2.2.16 (Debian) Server at roundcube.dev.mayfirst.org Port 443

comment:13 Changed 4 years ago by https://id.mayfirst.org/dskallman

  • Cc jamie@… added

Linking Jamie to ticket, so he's in the loop too. I think this issue is related to #8635 and related tickets from last week.

comment:14 Changed 4 years ago by https://id.mayfirst.org/jamie

I don't think this is related to #8635. However, it might be related to #7091 (which would be caused by disk contention on julia). For the user slam, I do see some TIMEOUTs and LOGIN failed between 4:00 pm and 5:00 pm New York time.

I just banned two IP addresses that were doing what appear to be dictionary login attempts against two sites on julia, which seems to have made an impact. We'll have to continue monitoring julia tomorrow to make sure we don't see more resource contention.

Thanks for your patience with this - I know how frustrating it is to not have access to your email!!

jamie

comment:15 Changed 4 years ago by https://id.mayfirst.org/srevilak

Hello Slam,

Thanks for the nudge, and I apologize for not following up on this sooner.

Here's one of the errors from swolf

[Mon Mar 10 16:04:52 2014] [error] [client 64.19.x.y] FastCGI: comm with server "/srv/roundcube-php" aborted: idle timeout (30 sec), referer: https://roundcube.mayfirst.org/
[Mon Mar 10 16:04:52 2014] [error] [client 64.19.x.y] FastCGI: incomplete headers (0 bytes) received from server "/srv/roundcube-php", referer: https://roundcube.mayfirst.org/

Basically, that's a timeout (or network communications) error. Roundcube is an imap client program that runs on a machine called stallman, and it makes imap calls to rose.mayfirst.org, which is where @rutgersaaup.org mail is hosted.

I tried correlating this with stuff happening on rose, and found the following:

Mar 10 16:02:37 rose imapd: 60 maximum active connections.
Mar 10 16:04:01 rose imapd: 60 maximum active connections.
Mar 10 16:05:05 rose imapd: 60 maximum active connections.
Mar 10 16:06:36 rose imapd: 60 maximum active connections.

In other words, there were so many active imap connections that rose just said "nope, no more", and you saw internal server errors as a result.

Let me see if it's feasible to increase imapd's connection limit on rose.

comment:16 Changed 4 years ago by https://id.mayfirst.org/srevilak

For reference, a little histogram of imapd "maximum active connections" (the third column is hour of the day)

0 rose:/var/log# grep "60 maximum active connections" /var/log/mail.err | cut -f1 -d: | sort | uniq -c
      9 Mar 10 11
      2 Mar 10 12
     12 Mar 10 14
     28 Mar 10 15
     22 Mar 10 16
      3 Mar 10 18
0 rose:/var/log#

We hit max connections 62 times between 14:00 and 16:59.

This also happened

  • 13 times on March 7th
  • 31 times on March 5th
  • 32 times on March 4th
  • 5 times on March 3rd
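
The per-day totals above could be produced with a small variation on the same pipeline. This is only a sketch: the function name max_conn_per_day is made up for illustration, and it assumes syslog-style lines like the ones quoted in comment 15.

```sh
#!/bin/sh
# Hypothetical helper (not part of any May First tooling): count imapd
# "maximum active connections" log lines per day.
# Reads syslog-style lines such as
#   Mar 10 16:02:37 rose imapd: 60 maximum active connections.
# from stdin and prints "<count> <month> <day>", one line per day.
max_conn_per_day() {
    awk '/maximum active connections/ { n[$1 " " $2]++ }
         END { for (d in n) print n[d], d }' |
        # Month names sort alphabetically, days numerically, so the order
        # is only chronological within a single month.
        sort -k2,2 -k3,3n
}
```

Typical use would be something like `max_conn_per_day < /var/log/mail.err`, run as a user who can read the mail logs.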

comment:17 Changed 4 years ago by https://id.mayfirst.org/slam

Srevilak, this is awesome news. I was just reading your old threads on #7091 and once I realized the local hosting order server could have an impact on mail service, I immediately figured the problem was with Rose. We've had a lot of issues being hosted on Rose.

I would not mind moving to a less impacted server, or upgrading to a VPS. I've already discussed a VPS with the Union and they are prepared to move in that direction. If a VPS will relieve some of the access issues we've been having with mail and web I will push for this change sooner rather than later.

Thanks for getting back. It wasn't a crisis until people could not log on at all.

Two questions: what's a MOSH? And, why are the IMAP requests maxed? Could these be tracked back to the dictionary attacks Jamie mentions?

comment:18 Changed 4 years ago by https://id.mayfirst.org/srevilak

Rose has been pretty busy lately. It sounds like you've been seeing load-related issues, and moving to a VPS should definitely alleviate that.

A "MOSH" is our standard virtual server configuration for member hosting orders. No one really remembers what the acronym stands for, but it stuck with us. :)

I believe the dictionary attack was targeting a wordpress instance on rose (http rather than imap). However, dictionary attacks can have the effect of slowing down an entire machine, and that may have contributed to a backlog of imap connections. I'm not sure.

That said, some of our moshes have higher imapd connection limits.

Jamie: as a short term measure, how would you feel about making this change to rose?

diff --git a/manifests/nodes/production/rose.pp b/manifests/nodes/production/rose.pp
index 6908fe0..e06e88b 100644
--- a/manifests/nodes/production/rose.pp
+++ b/manifests/nodes/production/rose.pp
@@ -9,7 +9,9 @@ node "rose.mayfirst.org"  {
     backup_rsync_target => "ali.mayfirst.org",
     caching_dns_ips => [ "216.66.22.48", "216.66.23.36" ]
   }
-  class { "m_mosh": } 
+  class { "m_mosh": 
+    courier_imap_maxdaemons => 80 
+  } 
   m_monkeysphere::publish_server_keys { rose: }
   m_gpg::publish_user_key { "root": keyserver => $mfpl_keyserver }

comment:19 Changed 4 years ago by https://id.mayfirst.org/jamie

Hi all - I agree, Steve, with your puppet changes - please push those out. And thanks for catching the max imap processes being reached - I missed it in my analysis.

As for MOSH - it doesn't stand for anything. We chose it after realizing that we couldn't describe what a MOSH is with just a few words, so we gave up and used a fun-to-pronounce name.

As for the dictionary attacks - I just created #8669 as a possible way to address them more systematically. I'm a bit pressed this week - I'm hoping someone else on the support team might have time to take a crack at it.

comment:20 Changed 4 years ago by https://id.mayfirst.org/jamie

And, at the moment at least, it looks like julia is behaving quite well, so I'm hoping that stopping those dictionary attacks has resolved the problem.

comment:21 Changed 4 years ago by https://id.mayfirst.org/srevilak

Jamie,

Thanks for weighing in. I'll push a new puppet tag this evening; hoping to start ~ 7:30pm.

Steve

comment:22 Changed 4 years ago by https://id.mayfirst.org/srevilak

Pushed puppet tag mfpl-puppet-2.16. Cron should pick this up within the hour.

comment:23 Changed 4 years ago by https://id.mayfirst.org/srevilak

Change deployed.

0 rose:~# grep MAXDAEMONS= /etc/courier/imapd
MAXDAEMONS=80
0 rose:~# 

comment:24 Changed 4 years ago by https://id.mayfirst.org/srevilak

  • Keywords rose.mayfirst.org courier imapd added

comment:25 Changed 4 years ago by https://id.mayfirst.org/srevilak

  • Keywords imap added; imapd removed

comment:26 Changed 4 years ago by https://id.mayfirst.org/slam

Another login error today. Account swolf time approx 12:20 PM +/- 15 min. Can you check and see if we're still maxing out IMAP on Rose?

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, apache@mayfirst.org and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Apache/2.2.16 (Debian) Server at roundcube.mayfirst.org Port 443

comment:27 Changed 4 years ago by https://id.mayfirst.org/ross

Hi slam,

Looking at the logs, it appears that the problem was a civicrm configuration issue:

[Thu Mar 13 14:48:14 2014] [error] [client 209.51.180.30] client denied by server configuration: /home/members/rutgersaaupaft/sites/rutgersaaup.org/web/sites/default/files/civicrm/ConfigAndLog/CiviCRM.18432da265af2a2d3e685be540257438.log
[Thu Mar 13 14:48:14 2014] [error] [client 209.51.180.30] client denied by server configuration: /home/members/rutgersaaupaft/sites/rutgersaaup.org/web/sites/default/files/civicrm/upload/xaq_fec25603e624dfbb4c70c131638abe44.csv
[Thu Mar 13 14:48:14 2014] [error] [client 209.51.180.30] client denied by server configuration: /home/members/rutgersaaupaft/sites/rutgersaaup.org/web/sites/default/files/civicrm/upload/sqlImport.duplicates
[Thu Mar 13 14:48:14 2014] [error] [client 209.51.180.30] client denied by server configuration: /home/members/rutgersaaupaft/sites/rutgersaaup.org/web/sites/default/files/civicrm/upload/xaz_36a60b3977cb42992d73ab3da4e18f2e.csv
[Thu Mar 13 14:48:14 2014] [error] [client 209.51.180.30] Attempt to serve directory: /home/members/rutgersaaupaft/sites/rutgersaaup.org/web/sites/default/files/

I don't understand what the problem was here, but it does seem to be somehow related to a configuration issue.

~/ross

comment:28 Changed 4 years ago by https://id.mayfirst.org/slam

Hi Ross,

Are these logs from the IMAP server, or from the rutgersaaup account?

How would a misconfigured CiviCRM stop someone from logging into roundcube via Firefox?

Those errors do worry me. I'll try to see if I can source them; I'm just unclear how they are related to webmail.

Also what time zone are those stamps in? It's not 14:48 yet in NYC.

comment:29 Changed 4 years ago by https://id.mayfirst.org/ross

Ah, that's my mis-interpretation, slam. I didn't realize the error was from webmail.mayfirst.org. I doubt the civicrm errors would have caused this problem. It does seem that rose has something of a cpu bottleneck. I think we need to allocate more cpu cores to rose. I've now done this on its host, but we'll need to schedule a reboot for rose.

comment:30 Changed 4 years ago by https://id.mayfirst.org/slam

Interesting about those CiviCRM errors too. Those files are all old imports from a week ago. I have no idea why Drupal or CiviCRM would be trying to serve those files, but Apache is correct to deny serving them if it's to the public. I have no idea why CiviCRM would be trying to access those files now; no one is trying to work with the imported csv files this week.

comment:31 Changed 4 years ago by https://id.mayfirst.org/slam

Thanks for the reconfig on Rose.

Could you also check the IMAP logs around that time and see if we are still maxing out? See https://support.mayfirst.org/ticket/8433#comment:15

comment:32 Changed 4 years ago by https://id.mayfirst.org/ross

Good catch...It does look like max connections were reached, and it also seems that the configuration changes that Steve made did not take for some reason (note: the timestamps here are in UTC, I believe):

0 rose:/var/log# grep "maximum active connections" mail.log
Mar 13 15:52:38 rose imapd: 60 maximum active connections.
Mar 13 15:53:55 rose imapd: 60 maximum active connections.
Mar 13 15:56:58 rose imapd: 60 maximum active connections.
Mar 13 15:58:02 rose imapd: 60 maximum active connections.
Mar 13 15:59:06 rose imapd: 60 maximum active connections.
Mar 13 16:01:55 rose imapd: 60 maximum active connections.
Mar 13 16:02:57 rose imapd: 60 maximum active connections.
Mar 13 16:04:25 rose imapd: 60 maximum active connections.
Mar 13 16:06:21 rose imapd: 60 maximum active connections.
0 rose:/var/log# 

I see MAXDAEMONS=80 in /etc/courier/imapd so it appears that the settings file got modified. I'm restarting postfix and courier-imap, courier-imap-ssl and courier-pop3* to see if this deploys the changed configuration file.

comment:33 Changed 4 years ago by https://id.mayfirst.org/jamie

Hm. It seems to me that the configuration change did take, but we have now reached 80 connections:

0 rose:~# ps -eFH | grep imap | grep -v root | wc -l
81
0 rose:~#

I ran mf-imap-usage-report and found one user with 27 open connections from 11 different IP addresses. This seems to have been a problem before on julia (see #6282). Then it was a verizon connection, this one is a sprint connection. Alas - there is no solution in that ticket.

I'm going to follow up with an email to the user in question to see if they can modify their email configuration to reduce their usage of connections. In the meantime, I am temporarily increasing the max daemons to 120.

jamie
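
The ps one-liner above gives only the total. A per-user breakdown along the same lines might look like the sketch below. This is not mf-imap-usage-report (its source is not shown in this ticket); the function name is hypothetical, and it assumes each logged-in Courier session appears as a process named imapd owned by the mailbox user, which is what the grep -v root filter above suggests.

```sh
#!/bin/sh
# Hypothetical helper, NOT the real mf-imap-usage-report.
# Reads "USER COMMAND" pairs on stdin and prints the users owning the most
# non-root imapd processes as "<count> <user>", highest count first.
# $1 is how many users to show (default 5).
top_imap_users() {
    awk '$2 == "imapd" && $1 != "root" { n[$1]++ }
         END { for (u in n) print n[u], u }' |
        sort -k1,1nr -k2,2 |
        head -n "${1:-5}"
}
```

It could be fed with `ps -e -o user= -o comm= | top_imap_users 5`, which works for a non-privileged user because it does not read the mail logs.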

comment:34 Changed 4 years ago by https://id.mayfirst.org/jamie

See also #4747.

comment:35 Changed 4 years ago by https://id.mayfirst.org/slam

Jamie, thanks for that first command in comment 33. That's something I can run as a user to help diagnose the problem on my own.

Tomorrow I'll spot check Rose's IMAP connections and report back if it nears 120.

comment:36 Changed 4 years ago by https://id.mayfirst.org/jamie

Great - you can run the ps command as a non-privileged user exactly as written; however, to run mf-imap-usage-report, you'll have to specify the full path: /usr/local/sbin/mf-imap-usage-report.

Also, I spoke to the user with the most connections and he is working on re-configuring his mail clients to use fewer connections.

jamie

comment:37 Changed 4 years ago by https://id.mayfirst.org/jamie

oh ... sorry, mf-imap-usage-report counts numbers of open IMAP connections, but won't tell you the connections made in the last hour and the IP addresses (it will report no connections in the last hour because you don't have read-access to the mail logs). Nonetheless, it does give you a break down of who is connected and how many connections.

comment:38 follow-up: Changed 4 years ago by https://id.mayfirst.org/slam

  • Priority changed from Urgent to Medium

Thanks Jamie! /usr/local/sbin/mf-imap-usage-report works for me and is another great tool I can use to troubleshoot at the user level and assist the MF sysadmins.

Q1: is this report for the current server (Rose) only or for mail.mayfirst.org?

Q2: It says "Top five users with the most currently open connections and the IPs they connected from in since 1 hour ago." but also reports 4 bjwalker ( no logins in last hour). Seems contradictory? What is it measuring exactly?

bjwalker is one of my users — I can look into her configuration. I suspect that once they started having issues they bumped up the imap connection frequency to try to resolve it.

Q3: What do you recommend for an IMAP mail check frequency?

comment:39 in reply to: ↑ 38 Changed 4 years ago by https://id.mayfirst.org/jamie

Replying to https://id.mayfirst.org/slam:

Thanks Jamie! /usr/local/sbin/mf-imap-usage-report works for me and is another great tool I can use to troubleshoot at the user level and assist the MF sysadmins.

Q1: is this report for the current server (Rose) only or for mail.mayfirst.org?

It's for the server it is being run on. So, when run on rose, it only counts connections open for rose.

Q2: It says "Top five users with the most currently open connections and the IPs they connected from in since 1 hour ago." but also reports 4 bjwalker ( no logins in last hour). Seems contradictory? What is it measuring exactly?

That's because the command expects you to be root, in which case it searches our mail logs to report the IP addresses the users have connected from. Since you are not running it as root, you don't have access to the logs, so it mistakenly thinks there are no logins in the last hour.

bjwalker is one of my users — I can look into her configuration. I suspect that once they started having issues they bumped up the imap connection frequency to try to resolve it.

4 connections is not a big deal - I wouldn't worry about it.

Q3: What do you recommend for an IMAP mail check frequency?

I think the check frequency is not as important as the number of cached processes. In Thunderbird, the default is 5 and you can lower it to 1. As long as you are only polling your Inbox (as opposed to automatically filtering messages to other boxes), having just one process works just fine and should limit you to at most one active IMAP connection.

hope that helps,

jamie

comment:40 Changed 4 years ago by https://id.mayfirst.org/slam

Reopening; Rose has maxed out all 80 connections again and our staff can't get into email.

0 rutgersaaup@rose:~$ /usr/local/sbin/mf-imap-usage-report 

Max Daemons allowed in /etc/courier/imapd: 80
Current daemons running: 81

Top five users with the most currently open connections
and the IPs they connected from in since 1 hour ago.

egrep: /var/log/mail.log: Permission denied
4 renee ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
5 slam-glocal ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
7 dana-glocal ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
8 devinnycp ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
10 cmdln ( no logins in last hour)
0 rutgersaaup@rose:~$ 
0 rutgersaaup@rose:~$ ps -eFH | grep imap | grep -v root | wc -l
80

comment:41 Changed 4 years ago by https://id.mayfirst.org/slam

  • Priority changed from Medium to Urgent
  • Summary changed from Problems with email at rutgersaaup.org to IMAP connections maxed on Rose

comment:42 Changed 4 years ago by https://id.mayfirst.org/ross

  • Resolution set to fixed
  • Status changed from assigned to closed

After restarting courier-imap the number of connections seems to have dropped to a reasonable level. Just as a precaution, I've increased rose's max daemons to 100 because I didn't see any evidence that a single user was hogging all the connections. Though it does seem that glocal folks and Devin are eating up more than their fair share :-).

comment:43 Changed 4 years ago by https://id.mayfirst.org/slam

Sorry to keep bothering you all. IMAP is still flakey on Rose, folders are slow to sync, and some of my messages take hours to arrive.

imapd is reporting 80 max connections still.

There are 88 connections at 9 PM, so I expect the afternoon peak was over 100 (though I can't check the logs for that).

0 rutgersaaup@rose:~$ /usr/local/sbin/mf-imap-usage-report 

Max Daemons allowed in /etc/courier/imapd: 80
Current daemons running: 88

Top five users with the most currently open connections
and the IPs they connected from in since 1 hour ago.

egrep: /var/log/mail.log: Permission denied
4 orgup ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
5 hazparito ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
5 phaklon ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
6 cmdln ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
14 liam ( no logins in last hour)

I did reduce my Thunderbird IMAP connections to 2 (I do filtering), but I also have my Android email client and MacOS and Android note apps, all of which are dependent on IMAP.

We really need a permanent solution for this problem.

I'd be happy to move my personal account to another server if that would take some load off Rose. I have hosting orders on Chelsea and Ossie. But I can't move my Rutgers Union client off Rose until we get an MF VPS.

comment:44 Changed 4 years ago by https://id.mayfirst.org/ross

  • Resolution fixed deleted
  • Status changed from closed to assigned

I'm looking at the logs slam and seeing the following, in addition to the max connections error:

Mar 26 18:04:54 rose spamd[5721]: prefork: server reached --max-children setting, consider raising it
Mar 26 18:04:55 rose spamd[5721]: prefork: server reached --max-children setting, consider raising it
Mar 26 18:04:55 rose spamd[5721]: prefork: server reached --max-children setting, consider raising it
Mar 26 18:04:56 rose spamd[5721]: prefork: server reached --max-children setting, consider raising it
Mar 26 18:04:57 rose spamd[5721]: prefork: adjust: 3 idle children more than 2 maximum idle children. Decreasing spamd children: 20540 killed.
Mar 26 18:04:57 rose spamd[5721]: prefork: adjust: 3 idle children more than 2 maximum idle children. Decreasing spamd children: 4262 killed.
Mar 26 18:04:57 rose spamd[5721]: prefork: adjust: 3 idle children more than 2 maximum idle children. Decreasing spamd children: 808 killed.

I wonder if spamassassin might be slowing down mail delivery.

comment:45 Changed 4 years ago by https://id.mayfirst.org/ross

The spamd issue could certainly help explain some of the reports of increased spam. I went ahead and increased the child limit manually from 5 to 10. We'll see if this makes a difference. I did this by changing the line in /etc/default/spamassassin

OPTIONS="--create-prefs --max-children 10 --helper-home-dir"

And in addition I added MAXDAEMONS and MAXPERIP settings to /etc/courier/imapd-ssl. It's not clear that such a thing is necessary, but I read a random report that it improved one person's performance and the config file says "Go ahead and add em'"...so I did. I plan to discuss this with jamie in the morning.

~/ross

comment:46 Changed 4 years ago by https://id.mayfirst.org/slam

Rose is maxed out again, my users can't access their email, and we have a CiviMailing that is stuck in the queue, so 5,000 people who are supposed to get a union campaign action request are not getting their message.

0 rutgersaaup@rose:~/rutgersaaup.org/web/sites$ /usr/local/sbin/mf-imap-usage-report 

Max Daemons allowed in /etc/courier/imapd: 100
Current daemons running: 105

Top five users with the most currently open connections
and the IPs they connected from in since 1 hour ago.

egrep: /var/log/mail.log: Permission denied
5 madeofpeople ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
6 bjwalker ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
7 josswinnweb ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
11 cmdln ( no logins in last hour)
egrep: /var/log/mail.log: Permission denied
14 calum ( no logins in last hour)

Can we get a permanent solution to this Rose IMAP overload? Maybe migrate some users? I'm trying to get rutgersaaup.org off Rose, and that might help. VPS is ready but we need assistance migrating.

comment:47 Changed 4 years ago by https://id.mayfirst.org/ross

Hi slam,

Given taggert's suggestion about iptables, I've gone ahead and implemented a couple of iptables rules that should keep the number of connections from any single IP address under control. Here are the rules:

0 rose:~# iptables -A INPUT -p tcp --syn --dport 143 -m connlimit --connlimit-above 6 -j REJECT --reject-with tcp-reset
0 rose:~# iptables -A INPUT -p tcp --syn --dport 25 -m connlimit --connlimit-above 6 -j REJECT --reject-with tcp-reset
0 rose:~# iptables -A INPUT -p tcp --syn --dport 993 -m connlimit --connlimit-above 6 -j REJECT --reject-with tcp-reset

They say: do not allow more than 6 connections from the same IP address on ports 143, 25, or 993. I think this should be safe for our mail relay servers, but it keeps the max number of connections from others to a reasonable number.

~/ross
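
Before relying on a per-IP cap like this, it may help to see which clients would actually run into it. The sketch below lists remote addresses holding more than a given number of established connections to one local port. The function name is made up for illustration, and it assumes input in the column layout printed by `netstat -tn` (proto, recv-q, send-q, local address, foreign address, state).

```sh
#!/bin/sh
# Hypothetical helper: find remote IPs that exceed a per-IP connection limit.
# Reads `netstat -tn` style lines on stdin.
# $1 = local port to inspect, $2 = limit (IPs with MORE than this are shown).
# Prints "<count> <ip>", highest count first.
ips_over_limit() {
    awk -v port="$1" -v limit="$2" '
        # Keep established TCP connections whose local address ends in :port
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
            ip = $5
            sub(/:[0-9]+$/, "", ip)   # strip the remote port
            n[ip]++
        }
        END { for (ip in n) if (n[ip] > limit) print n[ip], ip }' |
        sort -k1,1nr -k2,2
}
```

For example, `netstat -tn | ips_over_limit 993 6` would show the addresses that a limit of 6 on port 993 would affect. Clients sharing a NAT address show up as one IP here, just as they would to connlimit.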

comment:48 Changed 4 years ago by https://id.mayfirst.org/jamie

Hey Ross - courier has a MAXPERIP setting (set to 20). And you can set that in puppet (see viewsic). Unless that is not working, I think it would be a better way to limit than using iptables.

We really need a limit per user though, or (as dovecot provides) limit per user-ip combination (that way we don't penalize people who are sharing a NAT address).

comment:49 Changed 4 years ago by https://id.mayfirst.org/slam

  • Summary changed from IMAP connections maxed on Rose to Email connections issues for Rutgersaaup.org

comment:50 Changed 4 years ago by https://id.mayfirst.org/slam

OK. So we've switched to a VPS and we STILL have problems connecting. I was told a switch to a VPS would solve this connection issue, but it hasn't. Lewis shows only 5 of 60 connections in use.

Rutgersaaup.org users are having intermittent connection problems to email. Outlook, webmail, and other apps are affected.

I just tested this with user swolf at 4:10. Using her name and password I was able to log in once to webmail.mayfirst.org. The next two connections met with a failure message: "Login failed because your username or password was entered incorrectly."

User deniseb using Outlook. Password box pops up intermittently, but entering the password only works sometimes. Access can be denied for an hour or more, and then it will work for a short while. Errors include "mayfirst.org is in offline mode," and "server not connected" (not exact errors). Server is mail.rutgersaaup.org:993 SSL is on.

User raet using Outlook. Password pops up constantly. Has been able to get on sometimes, but only after the password window pops up again. Cannot send email out. Outgoing server is mail.mayfirst.org:587. Authentication on, secure authentication off.

User bjwalker using Outlook. Is receiving email, but the password dialog pops on her desktop and her phone repeatedly. Has no problem sending mail. No other errors reported. Incoming and outgoing email settings are correct.

Just noting we are now over two months on this issue.

Basically I am on a phone call right now with some really pissed off people trying to get the info we need to hunt down this mail server problem.

comment:51 follow-up: Changed 4 years ago by https://id.mayfirst.org/jamie

This should be working more reliably now.

I discovered that on gil (but not on paulo), the cron daemon was not running. It doesn't seem to have been running since April 2, which means that no updates to the user accounts were being made. Because rutgers was moved to lewis after that date, gil was still directing login attempts to rose (hence the intermittent problems, since paulo was doing it properly).

I'm not sure why cron failed.
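
A stopped cron daemon is easy to miss because nothing fails loudly. One way to catch it is a periodic check, run from a monitoring host rather than from cron itself, that reports expected daemons that are absent from the process list. This is a sketch only: the function name is hypothetical, and it assumes the daemon's process name is cron, as it is on Debian.

```sh
#!/bin/sh
# Hypothetical helper: report which expected daemons are not running.
# Reads one process name per line on stdin (e.g. from `ps -e -o comm=`)
# and prints each name given as an argument that does not appear.
missing_daemons() {
    procs=$(cat)
    for name in "$@"; do
        # -x: whole-line match, -F: treat the name as a fixed string
        printf '%s\n' "$procs" | grep -qxF -- "$name" || echo "$name"
    done
}
```

For example, `ps -e -o comm= | missing_daemons cron` prints cron when the daemon is not running and nothing otherwise.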

comment:52 in reply to: ↑ 51 Changed 4 years ago by https://id.mayfirst.org/srevilak

  • Resolution set to fixed
  • Status changed from assigned to feedback

Slam, has Rutgersaaup.org mail been working alright since Jamie's fix in Comment 51?

comment:53 Changed 4 years ago by https://id.mayfirst.org/slam

  • Resolution fixed deleted
  • Status changed from feedback to assigned

I've had no issues or complaints myself or from users.

Thanks again Jamie!

comment:54 Changed 4 years ago by https://id.mayfirst.org/jamie

  • Resolution set to fixed
  • Status changed from assigned to closed

Great, glad things are working better now.
