Opened 11 years ago

Closed 11 years ago

#187 closed Bug/Something is broken (fixed)

Spam capture ratios have dropped significantly: Switch to debian volatile

Reported by: alfredo Owned by: Jamie McClelland
Priority: Urgent Component: Tech
Keywords: Cc:
Sensitive: no

Description

The ratio of spam being caught by SA (on at least Chavez) has dropped significantly in the last week or so. I used to capture about 50 to 60 spam messages out of approximately 100 messages during my first login. I am now capturing about 10 to 15 every day. That 25 percent of the past ratio stays the same for all my sessions during the day.

Is there something we dropped during the upgrades that could have this effect?

Change History (14)

comment:1 Changed 11 years ago by Jamie McClelland

Status: new → assigned
Summary: Spam capture ratios have dropped significantly → Spam capture ratios have dropped significantly: Switch to debian volatile

I've noticed this as well, seeing much more undetected spam.

I just changed the title. Debian volatile is a debian repository for things like spamassassin - it ensures that we get the most up-to-date versions available. Last I checked (a while ago), it wasn't available for Etch (our version of Debian) - but now it is. And - I imagine our spamassassin has fallen out of date.

This should be done on our three shared servers (viewsic, chavez, and malcolm) as well as leslie (the list server).

comment:2 Changed 11 years ago by Jamie McClelland

Priority: High → Urgent

Boosting to urgent - so it's on my plate for tomorrow.

comment:3 Changed 11 years ago by alfredo

We should let members know we've identified and addressed this issue. Suggestions?

comment:4 Changed 11 years ago by Jamie McClelland

Bad news on this front.

I added the following lines to /etc/apt/sources.list on malcolm, viewsic, chavez, and leslie:

# Added by jamie 2007-10-30 to bring in most recent spamassassin and clamav
deb http://volatile.debian.org/debian-volatile etch/volatile main
deb http://volatile.debian.org/debian-volatile etch/volatile-sloppy main

And was able to pull in new versions of tzconfig and clamav. However, there are no newer versions of spamassassin in Debian volatile.

I checked the mailing list and found these two messages (which seem to be the latest on the subject) - the last one from 10 days ago:

http://lists.debian.org/debian-volatile/2007/10/msg00008.html and http://lists.debian.org/debian-volatile/2007/10/msg00014.html

Seems as though the person responsible for uploading the files to debian volatile has their hands full.

Here's where we are in terms of versioning:

Backports vs. Volatile? Seems like volatile is a better bet because they manage a smaller number of packages and seem to have tighter connection to the security team. However, I think I should install spamassassin from backports until it makes its way to volatile. Any objections?

comment:5 Changed 11 years ago by Jamie McClelland

I just installed spamassassin on chavez from backports. I had the following errors when restarting:

0 jm@chavez:plugins$ sudo /etc/init.d/spamassassin start
Starting SpamAssassin Mail Filter Daemon: [30973] warn: Subroutine new redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 69.
[30973] warn: Subroutine _get_images redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 194.
[30973] warn: Subroutine image_named redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 231.
[30973] warn: Subroutine image_count redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 247.
[30973] warn: Subroutine pixel_coverage redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 263.
[30973] warn: Subroutine image_to_text_ratio redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 279.
[30973] warn: Subroutine image_size_exact redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 301.
[30973] warn: Subroutine image_size_range redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 317.
[30973] warn: Subroutine result_check redefined at /usr/share/perl5/Mail/SpamAssassin/Plugin/ImageInfo.pm line 344.
[30973] warn: netset: cannot include 127.0.0.1/32 as it has already been included
spamd.
0 jm@chavez:plugins$

The ImageInfo errors I fixed by removing all references to the local copy of ImageInfo that we were using because it previously was not included in the default package. I fixed the netset error by removing 127.0.0.1/32 from the trusted_networks line in local.cf (apparently it is included by default now).

In addition, I removed our cron job that runs sa-update on a daily basis because that option is now available via /etc/default/spamassassin.

I'm now working on doing these steps on malcolm, viewsic, and leslie.

comment:6 Changed 11 years ago by Jamie McClelland

Resolution: fixed
Status: assigned → closed

I just finished this upgrade on malcolm, viewsic, and leslie.

I'm not sure whether to keep us on backports or volatile. I'm closing this ticket because I think the problem is solved.

comment:7 Changed 11 years ago by alfredo

Resolution: fixed
Status: closed → reopened

No spam is being flagged on Chavez today. Looks like this fix didn't take or something else is wrong. :( We'll need to issue an advisory for all members on this as we, hopefully, get it fixed.

comment:8 Changed 11 years ago by Jamie McClelland

Can you double check that Alfredo? I've had dozens of messages tagged and relatively few spam messages come through. As far as I can tell this is fixed.

When you say no spam is being tagged I get the impression that none of your email has the X-Spam-Flag: YES header in it. Is that what you mean?

comment:9 Changed 11 years ago by alfredo

Out of the 150 or so emails I've received today thus far, one has been flagged as spam - at 3:25 pm today. Nothing else. Obviously, this is very abnormal behavior. SA is definitely tagging everything, but the weight it is giving messages simply isn't keeping spam out of my email account. There is no way that one message out of 150 (or even more) in a 24 hour period is normal, functional spam filtering.

If this is unique to me, then we can live with it but my hunch is it has to be affecting many other people.

comment:10 Changed 11 years ago by Jamie McClelland

Hm - I wonder if your bayes or autowhite list settings have gotten borked.

When you look at the headers of the undetected spam messages, it will list the names of the tests that were flagged. If you see a lot of BAYES_00 or BAYES_10 and/or AUTOWHITELIST it may mean that your personal settings got screwed up somehow.

In that case: Can you try ssh'ing into chavez with the user alfredo (and use your email password).

Then - try moving your .spamassassin folder to .spamassassin.off:

mv .spamassassin .spamassassin.off

And see if that makes a difference. If it doesn't you can always move it back in there.
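
To see how often those test names show up across a batch of messages, the headers can be tallied with a short script. This is only a rough sketch, not something running on our servers: it assumes the messages are available as raw text and that each X-Spam-Status header carries the usual tests= list. The function names are made up for illustration.

```python
import re
from collections import Counter
from email import message_from_string


def rules_hit(status):
    """Return the rule names listed after tests= in an X-Spam-Status value."""
    # The list ends at the next key=value pair (e.g. autolearn=) or at the
    # end of the header. Folded headers put newlines and tabs in the list.
    match = re.search(r"tests=(.*?)(?=\s+\w+=|$)", status, re.S)
    if not match:
        return []
    names = [name.strip() for name in match.group(1).split(",")]
    return [name for name in names if name and name != "none"]


def count_rules(raw_messages):
    """Tally how often each rule appears across raw RFC 822 messages."""
    counts = Counter()
    for raw in raw_messages:
        status = message_from_string(raw).get("X-Spam-Status", "")
        counts.update(rules_hit(status))
    return counts
```

A high count for BAYES_00 or AWL on messages that are clearly spam would point at the per-user databases rather than the server-wide rules.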

comment:11 Changed 11 years ago by alfredo

After noticing the BAYES_00 entry in the line listing the tests checked on a lot of emails, I ran the mv command as suggested. I see no impact yet, but we'll leave it this way overnight and I'll report on what's what tomorrow afternoon.

comment:12 Changed 11 years ago by Jamie McClelland

Ok - let's see how that works. When bayes_auto_learn is enabled (as it is by default) and an incoming message scores less than .1 (by default), spamassassin will feed it into its bayesian system as ham. If the score is more than 12 (by default), it will feed it into its bayesian system as spam.

The bayesian system breaks the message into small "tokens" (usually word-size pieces of the message) and, if the message is marked as spam, it will associate those tokens with spam. If the message was tagged as ham, it will associate them with ham.

Then, when a new message arrives, one of the spamassassin tests is to compare the new message with the ham and spam tokens. Based on that comparison, it gives a percentage chance that the new message is spam. So BAYES_00 means that, based on the tokens, there's a 0% chance that the message is spam. BAYES_99, on the other hand, means there's a 99% chance that it's spam. And - there are other percentages in between. Each percentage either adds points to the message (if it's above 50%) or removes points.
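
The mechanism described above can be sketched roughly as follows. This is a toy model to show the idea, not how spamassassin is actually implemented: the real auto-learner applies further conditions, and the real bayes test combines token probabilities in a more sophisticated way than the plain average used here. The two thresholds are the defaults quoted above; the names auto_learn_as and ToyBayes are made up for illustration.

```python
from collections import Counter

# Defaults quoted above; in SpamAssassin they are set by
# bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def auto_learn_as(score):
    """Decide how a scored message would be fed to the Bayes database."""
    if score < HAM_THRESHOLD:
        return "ham"
    if score > SPAM_THRESHOLD:
        return "spam"
    return None  # scores in between are not learned at all


class ToyBayes:
    """Toy token store: counts how many ham/spam messages held each token."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn(self, text, label):
        """Associate every distinct token in text with the given label."""
        tokens = set(text.lower().split())
        (self.spam if label == "spam" else self.ham).update(tokens)

    def spam_probability(self, text):
        """Average the per-token spam ratios; 0.5 if no token is known."""
        ratios = []
        for token in set(text.lower().split()):
            seen_spam, seen_ham = self.spam[token], self.ham[token]
            if seen_spam + seen_ham:
                ratios.append(seen_spam / (seen_spam + seen_ham))
        return sum(ratios) / len(ratios) if ratios else 0.5
```

The point to notice is the feedback loop: whatever auto_learn_as returns for one message changes what spam_probability says about the next one, so a message learned under the wrong label skews later results.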

My hunch is one of the following:

  • The bayesian system underwent some kind of change between upgrades that renders our existing tokens as far less useful.
  • Spammers have figured out how to write spam messages that manipulate common token databases
  • ahh.... I'm posting this message because I hope it's useful. But, I just discovered the real problem which I'll post in a separate message...

comment:13 Changed 11 years ago by Jamie McClelland

The problem seems to be a fix I put in place several months ago.

A while back, abh noticed that messages sent from one MFPL member to another MFPL member were getting tagged as spam. This is pretty bad. After careful examination, we learned that the Dynablock rule was pushing the message over the limit. The Dynablock rule detects if a message is being sent by a computer on a dynamic IP. Unfortunately, spamassassin has no way of detecting the difference between a computer that injects a message directly into our mail server from a dynamic IP (which should get flagged) and either:

  • A message injected via horde or squirrelmail (which include the dynamic IP of the computer posting the message) OR
  • A message injected via port 587/tls - as any of our members who use thunderbird, etc. will do.

There's a good page here:

http://wiki.apache.org/spamassassin/DynablockIssues

That presents these various problems.

I fixed the problem, rather poorly, by following the advice on that page and adding the LOCAL_AUTH_RCVD headers. That subtracts 20 points for messages that say TLS and chavez in them (I didn't include the authenticated part because webmail won't have it). Not clever, but it would require a concerted spam attack targeting us to get messages through.

Or...

It would require us to start using tls for all smtp sessions, which we did a few months ago (#32).

Because of #32, any message sent to viewsic, for example, and then from viewsic to chavez will have 20 points subtracted. Since Alfredo (and I) have a lot of aliases that are delivered to one computer and then forwarded to another, we get a lot of messages like this. Since 20 points are subtracted, spam in these messages not only comes through undetected, but is put through auto-learn as ham as well!
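
To make the arithmetic concrete, here is a small sketch. The required score of 5.0 and the auto-learn ham threshold of 0.1 are spamassassin defaults; the starting score of 14.0 used below is a made-up example, and the function name is hypothetical. Real spamassassin computes its auto-learn score slightly differently from the final score, so treat this as an approximation.

```python
REQUIRED_SCORE = 5.0           # SpamAssassin's default required_score
HAM_AUTOLEARN_BELOW = 0.1      # default auto-learn-as-ham threshold
LOCAL_AUTH_RCVD_SCORE = -20.0  # our local rule, as described above


def outcome(base_score, relayed_via_tls_from_our_server):
    """Return (final_score, flagged_as_spam, auto_learned_as_ham)."""
    score = base_score
    if relayed_via_tls_from_our_server:
        # The local rule fires on any message relayed between our servers.
        score += LOCAL_AUTH_RCVD_SCORE
    return score, score >= REQUIRED_SCORE, score < HAM_AUTOLEARN_BELOW
```

With these numbers, outcome(14.0, True) gives a final score of -6.0: the message is not flagged, and it is low enough to be auto-learned as ham. The same message without the relay hop, outcome(14.0, False), stays at 14.0 and is flagged.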

Still mulling over options...

comment:14 Changed 11 years ago by Jamie McClelland

Resolution: fixed
Status: reopened → closed

This problem should now be fixed.

I took the following steps on chavez to test:

  • After much searching and head scratching, I figured out how to add a report to my email headers that tells me which rules were assigning how many points. The trick is to edit your ~/.spamassassin/user_prefs file and add:
    add_header all Report _REPORT_
    
  • I then sent myself an email via sasl authenticated smtp and noted that I was getting penalized for sending from a dynamic IP (sorbs test) but that our LOCAL_AUTH_RCVD hack was compensating for it by subtracting points. This is as I expected.
  • I then added this line to the postfix main.cf file:
    smtpd_sasl_authenticated_header = yes
    
  • After reloading postfix, spamassassin stopped penalizing me for the dynamic IP address.
  • I further tested the following scenarios and both had appropriate spam test reports
    • Sent email via Horde on chavez to me (on chavez)
    • Sent email via Horde on malcolm to me (on chavez)
  • In the course of mucking around, I discovered that we no longer need to set the trusted_networks setting in spamassassin - it is done automatically.
  • Lastly, I removed our hack LOCAL_AUTH_RCVD header and tested all scenarios again and it seems to be working like a charm.
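
One way to confirm the postfix setting took effect is to look for the marker it adds to the Received header of a test message. A minimal sketch, assuming the "(Authenticated sender: NAME)" wording that postfix uses when smtpd_sasl_authenticated_header is enabled; the function name is made up for illustration.

```python
import re

# Postfix adds "(Authenticated sender: NAME)" to its Received header when
# smtpd_sasl_authenticated_header = yes.
AUTH_MARKER = re.compile(r"\(Authenticated sender: ([^)\s]+)\)")


def authenticated_sender(received_header):
    """Return the SASL login named in a Received header, or None."""
    match = AUTH_MARKER.search(received_header)
    return match.group(1) if match else None
```

If this returns None for a message sent over authenticated smtp, the setting has not been picked up and spamassassin has nothing to tell an authenticated client apart from a direct-to-server sender on a dynamic IP.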

Conclusions:

  • Adding the postfix header most certainly fixed the problem for people who are sending via authenticated smtp
  • I'm not sure why horde users are no longer getting punished - that may have been fixed somewhere else in spamassassin
  • In any event - we were able to fix the problem by adding a standard config line in postfix and removing a hack from spamassassin

Alfredo: if you are still getting undetected spam, can you add the header preference I described above to your user_prefs file and then try to determine which rule is causing you to go over? The first few times I tested this I found that sending from my own email address was causing points to be added! If something like that is happening to you, we might want to purge your auto white list database.
