Opened 5 years ago

Closed 3 years ago

#3898 closed Bug/Something is broken (fixed) (drupal?) woes

Reported by: Owned by:
Priority: High Component: Tech
Keywords: drupal spam Cc:
Sensitive: no


Hosted on daza. Running Drupal 6.20.

It's been taken down a few times in the past couple of days.

dkg reported on irc:

dkg: so the site was chewing up RAM and causing swap
dkg: so i doubled the RAM allocated to that server
dkg: and then once it came back up, things were fine for a while
dkg: and then eventually, casinofreephilly started hogging all the CPU instead

This is a ticket to track as we figure out what's causing this.

Change History (35)

comment:1 Changed 5 years ago by

Also see ticket #3824 for possibly related issues on the daza server.

comment:2 Changed 5 years ago by

Processes on the Casino Free website are not shutting down properly and are using a lot of resources in the process. When the other ticket referred to i/o issues, I am pretty sure that input/output situation was a result of lingering processes from your site.

Taking a site off-line is, for us, a really drastic step and we seldom do it. Problem is we can't quickly determine what is causing the resource commitment. That will require looking at your site's backend and modules and other functions it is using.

I want to ask a couple of questions:

1 -- Is the site fully optimized? Do you have all the optimization functions, like page caching, turned on?

2 -- Have any modules been added over the last month? Unless this is a pretty new site -- sorry, I don't remember right now. Something is causing those process problems and I think it's fairly new but that's just a hunch.

Could you make me a new user named "alfredo" with full admin privileges and then let me know you did that by email-- my email is alfredo@…. Don't bother setting password; the system does all that. Just make sure it's configured to notify me when my username is set up. And make me a full admin so I can look at this stuff and then pass along the authentication info to whoever takes over working on it if that happens.

We're going to need to assign a techie to look at this. Of course, with four of our techies in Africa, we're a tad short-handed and will be until they return early next week. So we're going to be limping along.

If Daza (the server) goes down again and we see lingering processes from Casino Free, we may have to take the site down again. We try to be as careful with that as possible and we'll try and get some analysis going asap.



comment:3 Changed 5 years ago by

Hi Alfredo,

Totally understand and I definitely want to help you figure out what's overloading things. Frankly the site and the organization is relatively dormant at the moment (mostly just new blog entries), so if it's a module that's overloading things, we can probably figure out a way to live without it.

  1. Page caching and block caching are on, css minimizing is on, javascript minimizing is off (I think it conflicted with some of the modules)
  2. No new modules have been added in some time. Google Analytics was updated on Jan 20, Panels and Webform on Jan. 12. Six others were updated earlier in January, but none of those were new. I think the last time something new was added was on Dec. 8, when we updated Emfield.module and had to install the new media modules that go with it.

I'll create your user account now.

comment:4 Changed 5 years ago by

Thanks. Couple of things off the bat:

1 -- Site is being hit like crazy by spammers -- you have enough captcha failures to fill a library! Not that this would bring the site down but it would represent artificial usage -- i.e. not many people going to the site but lots of activity. It isn't bad enough, I don't think, to represent a denial of service. Although that *is* always possible.

2 -- Theme is returning a duplicate key error:

Duplicate entry 'sites/all/themes/rubik/cube/; for key 1 query: INSERT INTO
system (name, owner, info, type, filename, status, throttle, bootstrap) VALUES
 ('cube', 'themes/engines/phptemplate/phptemplate.engine', 'a:13:{s:4:\"name\";s:4:\"Cube\";s:11:\"description\";s:44:\"Spaces-aware front-end theme based on Rubik.\";s:10:\"base theme\";s:5:\"rubik\";s:4:\"core\";s:3:\"6.x\";s:6:\"engine\";s:11:\"phptemplate\";s:11:\"stylesheets\";a:1:{s:6:\"screen\";a:1:{s:9:\"style.css\";s:37:\"sites/all/themes/rubik/cube/style.css\";}}s:7:\"regions\";a:4:{s:6:\"header\";s:6:\"Header\";s:7:\"content\";s:7:\"Content\";s:4:\"left\";s:4:\"Left\";s:5:\"right\";s:5:\"Right\";}s:9:\"designkit\";a:2:{s:5:\"color\";a:1:{s:10:\"background\";s:7:\"#0088cc\";}s:4:\"logo\";a:2:{s:4:\"logo\";s:23:\"imagecache_scale:200x50\";s:5:\"print\";s:24:\"imagecache_scale:600x150\";}}s:7:\"layouts\";a:5:{s:7:\"default\";a:3:{s:4:\"name\";s:7:\"Default\";s:11:\"description\";s:23:\"Simple one column page.\";s:8:\"template\";s:4:\"page\";}s:7:\"sidebar\";a:5:{s:4:\"name\";s:7:\"Sidebar\";s:11:\"description\";s:26:\"Main content with sidebar.\";s:10:\"stylesheet\";s:18:\"layout-sidebar.css\";s:8:\"template\";s:14:\"layout-sidebar\";s:7:\"regions\";a:2:{i:0;s:7:\"content\";i:1;s:5:\"right\";}}s:5:\"split\";a:5:{s:4:\"name\";s:5:\"Split\";s:11:\"description\";s:12:\"50/50 split.\";s:10:\"stylesheet\";s:16:\"layout-split.css\";s:8:\"template\";s:14:\"layout-sidebar\";s:7:\"regions\";a:2:{i:0;s:7:\"content\";i:1;s:5:\"right\";}}s:7:\"columns\";a:5:{s:4:\"name\";s:7:\"Columns\";s:11:\"description\";s:20:\"Three column layout.\";s:10:\"stylesheet\";s:18:\"layout-columns.css\";s:8:\"template\";s:14:\"layout-columns\";s:7:\"regions\";a:4:{i:0;s:6:\"header\";i:1;s:7:\"content\";i:2;s:4:\"left\";i:3;s:5:\"right\";}}s:6:\"offset\";a:5:{s:4:\"name\";s:15:\"Offset sidebars\";s:11:\"description\";s:38:\"Main content with two offset sidebars.\";s:10:\"stylesheet\";s:17:\"layout-offset.css\";s:8:\"template\";s:13:\"layout-offset\";s:7:\"regions\";a:4:{i:0;s:6:\"header\";i:1;s:7:\"content\";i:2;s:4:\"left\";i:3;s:5:\"right\";}}}s:8:\"features\";a:10:{i:0;s:20:\"comment_user_picture\";i:1;s:7:\"favicon\";i:2;s:7:\"mission\";i:3;s:4:\"logo\";i:4;s:4:\"name\";i:5;s:17:\"node_user_picture\";i:6;s:6:\"search\";i:7;s:6:\"slogan\";i:8;s:13:\"primary_links\";i:9;s:15:\"secondary_links\";}s:7:\"scripts\";a:1:{s:9:\"script.js\";s:37:\"sites/all/themes/rubik/cube/script.js\";}s:10:\"screenshot\";s:42:\"sites/all/themes/rubik/cube/screenshot.png\";s:3:\"php\";s:5:\"4.3.5\";}', 'theme', 'sites/all/themes/rubik/cube/;, 0, 0, 0) in /home/members/casinofreephila/sites/ on line 822.

But that's today. We'll have to look further. Strange because I didn't think this particular theme was activated so Drupal shouldn't be looking at it as a theme.

Anyway. We'll keep looking.


comment:5 Changed 5 years ago by

1: Interesting. Certainly we have enemies, and while there's no history of anyone launching a DoS attack against us, it's possible. I know we've been getting spam attempts for awhile, and Jamie actually had me switch the forms from text analysis (via Mollom) to CAPTCHA (via Mollom) because the former was generating a lot of usage even though the submissions were being blocked on Mollom's end. Do a lot of failed CAPTCHAs use up server resources too? Maybe there's some way we could block certain IPs from accessing the site entirely?

2: That is really odd about the theme. I'll look into the issue queue for Rubik and see if there are other reports of that.

comment:6 Changed 5 years ago by

The bottom of the Mollom report at /admin/reports/mollom says:

All servers unreachable or returning errors. The server list was emptied.

So maybe Mollom is down and it's causing more chaos than it normally would?

comment:7 Changed 5 years ago by

OK, one update Mollom-wise. Awhile back we moved from to The former redirects to the latter. Mollom seemed to be processing things fine, but I just noticed that it still had the old URL, so maybe this redirection confused/overwhelmed things? In any case, I changed it to the correct URL.

I'm not sure if problems with Mollom would create server load, but I'm looking for solutions...

comment:8 Changed 5 years ago by

Do you have timeline data or IP source addresses for the captcha failures? It'd be interesting to try to establish a pattern there.

comment:9 Changed 5 years ago by

They're coming pretty much constantly. Also spambots filling in the search fields. Seems like more than one IP address, just looking at a few I see:

Some of those are on more than one, but they're definitely not all the same.

comment:10 Changed 5 years ago by

About every 20 seconds all day long -- about three to a minute -- and much of it is spam because the error log shows the intended message (which has links to pics of Miley Cyrus or store ads, etc.). This site has been "identified" by the bots! But it doesn't feel like three rejections a minute would cause the kind of i/o overdrive that we're seeing. I mean, the site and server are running fine now, right?


comment:11 Changed 5 years ago by

Hey Ivan – did troubleshooting your views yield anything useful? Or did they all seem fine? The two times I've seen this kind of CPU load on a Drupal site, it's been due to a view that's loading a node that includes broken code that causes the view to hang.

comment:12 Changed 5 years ago by

Web connections to daza just started timing out for me (I noticed it because I was trying to access phpMyAdmin) so I've disabled the site again. Doing some more troubleshooting.

comment:13 Changed 5 years ago by

@jackaponte very interesting! I remember a couple of years ago we had been getting some strange issues when adding new taxonomy tags that suggested some phantom/incomplete nodes. The problem seemed to go away, but maybe they're still hanging out and causing mischief. I have some ways to check for broken nodes now, so I'll take a look.

comment:14 Changed 5 years ago by

Thanks for the tip Jack -- I think that was it.

Using this tip from d.o I identified about 15 nodes that had no uid. I changed the uid on all of them to a valid one. When I flushed the cache and ran cron, it ran quickly and without any errors.

So I can't guarantee that was the *only* problem, but it definitely fixed some bad things.
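
For anyone repeating this check, here is a minimal sketch of one way to look for such nodes. The tip linked above is not reproduced here, so this is only one plausible reading of "nodes that had no uid": nodes whose uid has no matching row in the users table. It assumes Drupal 6 core tables `node` and `users` with no table prefix, and that drush is run from the site's directory; the function name is made up for illustration.

```shell
# Sketch: list nodes whose author uid has no matching row in the users table.
# Assumptions: Drupal 6 core schema, no table prefix, drush run inside the site.
find_orphan_nodes() {
  drush sql-query "SELECT n.nid, n.uid, n.title FROM node n LEFT JOIN users u ON n.uid = u.uid WHERE u.uid IS NULL"
}
```

Running `find_orphan_nodes` before and after the fix should show the list shrinking to nothing if these were the affected nodes.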

comment:15 Changed 5 years ago by

  • Priority changed from High to Urgent

Per #3902 (comment #1), I see the site has been taken down again, for about 24 hours now. What's the plan with this? If we need a more robust server then so be it, but we need to figure out a way to have our site back up.

comment:16 Changed 5 years ago by

And also I just want to be clear that if we need to pay more in order to accommodate the load our site is putting on the server, we're happy to do that. We're not trying to overwhelm things for other members, certainly, and I completely understand why the site needed to be taken down. Just want to know what to do next.

comment:17 Changed 5 years ago by

I think we're at the point where we need to start turning off some modules, as one of them may be the cause of the load spike. Going over your list of enabled modules ("drush sm | grep Enabled | grep -v Core | less"; "drush sm" is "drush pml" in newer versions), I've made a short list of the contrib modules that are outside of the wide user base, and thus probably haven't had as many eyes on them and are most likely to cause problems. In short, modules I was unfamiliar with for the most part. My theory is that if it is a module, it's likely not one of the big ones like CCK, Views, Date, Location, Panels etc.

So here's the list:

admin slideshare pathologic spamspan backup_migrate better_formats disqus disqus_migrate equalheights feedburner image_fupload int_meta path_redirect sharethis site_verify stringoverrides submitted_by update_advanced vertical_tabs

The quick way to turn these off from the command line with drush would be:

drush dis -y admin slideshare pathologic spamspan backup_migrate better_formats disqus disqus_migrate equalheights feedburner image_fupload int_meta path_redirect sharethis site_verify stringoverrides submitted_by update_advanced vertical_tabs

(the -y flag answers all questions as y)

We can turn the site on again and turn these modules off. If the load stays down for at least a day, we can probably assume it's the fault of one of those modules, and perhaps turn each back on one by one, and figure out exactly which is which.
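
The one-by-one step could be sketched roughly as below. This is only an illustration: the function name and the WAIT_SECONDS variable are made up here, and the one-day default simply mirrors the "at least a day" above. It relies on `drush en -y`, the enabling counterpart of the `drush dis -y` command shown earlier.

```shell
# Sketch: re-enable modules one at a time, pausing between each so the load
# can be observed before the next module is switched on.
# WAIT_SECONDS is a hypothetical knob; it defaults to one day (86400 seconds).
reenable_one_by_one() {
  wait_seconds="${WAIT_SECONDS:-86400}"
  for module in "$@"; do
    drush en -y "$module" || return 1
    echo "enabled $module; watching load for ${wait_seconds}s"
    sleep "$wait_seconds"
    uptime   # prints the current load averages
  done
}
```

For example, `reenable_one_by_one admin pathologic spamspan` would enable the three modules a day apart. If the load spikes after a given module, that module is the first suspect.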

comment:18 Changed 5 years ago by

OK, I've taken it back up and disabled all of those modules (but: path_redirect? admin? backup_migrate? You're really not familiar with these? These are like core modules in almost every D6 site I've been a part of).

Let me know what the report is, and if things are good, we can start turning them on one by one. Other than disabling comments (disqus) and sharing (sharethis), this has very little effect on our front-facing site, so as long as we don't have to add much new content, it's not a big deal if this plays out over a week or more.

comment:19 Changed 5 years ago by

  • Priority changed from Urgent to High

Moving this back down to High since the site is back up.

comment:20 Changed 5 years ago by

Just in case this should occur to anyone else: Of the remaining enabled modules, there are four that have updates. None of these seem to involve bugfixes that would address anything we're seeing, so I have NOT installed these updates yet (don't want to do anything that's not necessary on that site at the moment) but if anyone wants to take a look:

Google Analytics:
Emfield Flickr:
Emfield YouTube:

comment:21 Changed 5 years ago by

I have heard of admin and backup_migrate, just never used them. I hadn't heard of path_redirect. Either way, best to cast a wider net for these things, in hopes that we don't lock up the CPU on daza.

I'm currently keeping an eye on daza, to see if it starts spiking again.


comment:22 Changed 5 years ago by

Hi folks!

On February 11 I did some request counting when the server was causing high CPU usage: There had been about 2000+ direct hits from a single Ukrainian IP address in a twenty-minute period; 90% of those were GET requests to a variety of URLs and 10% were POST requests. At first glance it does not look like someone's primarily trying to post spam, because that would show as a higher percentage of POST requests. About 600+ of those requests were targeting campaign pages. One would assume that those pages are served from Drupal's built-in page cache unless the attacker was using a valid session id of an authenticated user. Even when serving pages from cache Drupal has to partly run its bootstrap process, i.e. PHP code needs to be interpreted, and this might still be too much for that server.

I'll dig a little deeper today.
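
For reference, a GET/POST split like the one above can be computed with a short awk sketch. This is not necessarily how the numbers in this comment were produced. It assumes the web log is in Apache's common/combined format, where the first whitespace-separated field is the client IP and the sixth is the quoted request method; the function name is made up for illustration.

```shell
# Sketch: for one client IP, print each HTTP method with its request count
# and its share of that IP's requests.
# Usage: method_share IP LOGFILE
method_share() {
  awk -v ip="$1" '
    $1 == ip {
      method = $6
      sub(/^"/, "", method)   # strip the opening quote of the request field
      count[method]++
      total++
    }
    END {
      for (m in count)
        printf "%s %d %.0f%%\n", m, count[m], 100 * count[m] / total
    }
  ' "$2" | sort
}
```

A spam run aimed at forms would be expected to show a much higher POST share than the 10% seen here.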


comment:23 Changed 5 years ago by

Generally one should keep the number of features and modules to the necessary minimum and enable only modules that are actually used. This is primarily important for keeping a site manageable. It also has an effect on server-side performance, but on cached pages that's negligible. It may of course have an effect on user-perceived performance, because sites with many modules enabled tend to serve larger pages. Some modules only affect admin pages. This might not be a problem in this case, but admin pages, which are only served to authenticated users and are therefore not cached, can keep a single CPU 100% busy for some seconds.

Let's have a look at the modules mentioned above:

  • admin: Only affects admin pages.
  • slideshare: Is it used?
  • pathologic: Works with filters. Filters are cached internally so this can have an impact when there's no cache yet.
  • spamspan: See note on pathologic above.
  • backup_migrate: Not at all necessary because MF/PL infrastructure does backups. Aside from yet another module no performance detriment.
  • better_formats: This has little impact on performance.
  • disqus: Well. I am not sure yet. If I understand correctly it will enable the Disqus comment web and comments won't be stored in the Drupal database when it is used. This could even be beneficial for performance. I saw those Disqus widgets on some pages, disabling the modules removes the ability to comment on those pages.
  • disqus_migrate: I cannot find it on, it's probably part of disqus. Judging by the name its purpose is to migrate Drupal comments to Disqus comments. So I believe it can be safely disabled but that won't give us a performance boost.
  • equalheights: Has not much impact on server-side performance - only for adding some extra markup.
  • feedburner: Sounds like feeds are redirected to Google. As the actual feed is served by an external mechanism this should help with performance. Does someone want to verify this?
  • image_fupload: An enhancement (or if you don't like Flash a degradation) for image_field uploads. Only contributing to node edit forms.
  • int_meta: This will definitely have an impact on non-cached pages.
  • path_redirect: Redirects should be avoided. In addition to that, redirects can also be set in your Web configuration to avoid the overhead of bootstrapping Drupal just to produce a redirect. This is a somewhat complex topic though.
  • sharethis: This will have an effect on non-cached pages, too.
  • site_verify: No idea how big the performance impact is.
  • stringoverrides: If it's just a few strings that are replaced it's no problem. If it's not used at all it should go.
  • submitted_by: No idea how big the performance impact is.
  • update_advanced: Affects certain admin pages.
  • vertical_tabs: A UI enhancement only affecting admin pages.

I'll do some profiling of the most visited pages later today. Anyhow it seems the solution here is to improve on serving cached pages by using Varnish or Boost, because apparently the site cannot even handle an increased number of anonymous visitors.

-- stefan

comment:24 Changed 5 years ago by

Just adding my two cents: admin, path_redirect, and backup_migrate are indeed very commonly used modules. I use them on just about all my sites, many on MFPL, and haven't seen them cause any significant problems.

In response to Stefan's comments: backup_migrate allows for a customized backup schedule that can include more frequent backups of the Drupal database than MFPL's schedule, which is important for me in terms of protecting clients in the event of them accidentally deleting content, messing up their site, etc. It also provides backups that are far more easily accessible to MFPL members than MFPL's official backups.

path_redirect: Why should redirects be avoided in general? They're very handy in many cases: redirecting visits to pages on an old site to the appropriate aliases on a new Drupal site, dealing with changes to existing node aliases, etc. While redirects can be set in the web configuration, this is not user-friendly at all to non-techies; providing path_redirect allows organizations to maintain their redirects themselves without asking/paying an outside techie to do it every time.

Anyhow, from what Stefan's looked into it seems to me like the problem is more those massive hits coming into the site, though of course massive hits to a site with tons of modules enabled will cause more problems than massive hits to a site with few or no modules enabled.

comment:25 Changed 5 years ago by

Thanks Stefan for all the research - your findings about the role of a single IP address are really helpful. Reducing modules is always a good idea, although I tend to agree with Jack about dealing with the massive hits.

We should be building sites that can withstand such an onslaught of GET requests... but knowing that it's a single IP address at least provides us an opportunity to ban the IP address if it gets hit again in a similar fashion as a short term, emergency method that is less drastic than shutting down the entire site (see mf-ban-ip-address for an easy way to ban a single ip address).


comment:26 Changed 5 years ago by

On avoiding redirects: I have used the verb should to indicate that ignoring this advice can have detrimental consequences. There are valid reasons and circumstances to ignore this advice mentioned in jackaponte's comment.

Why should they be avoided? YSlow recommends not to use redirects, because user-perceived performance suffers from it: Redirecting with path_redirect module has an additional performance impact on the server, because Drupal has to fully bootstrap before the module can do its work, which is not more than redirecting to the target page. When serving the target page there will be another full or partial bootstrap depending on whether a page is delivered from cache or not.

I have noticed that in my copy of the site "minimum cache lifetime" is not set. This can open a door for an attacker depending on the overall setup. Whenever a comment is submitted successfully cache_clear_all is called, flushing all caches immediately if no minimum lifetime is set. There are some other modules that can flush all caches as well. Setting a minimum lifetime should be given a try.
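
A minimal sketch of setting this from the command line, as an alternative to the performance settings page. In Drupal 6 the "minimum cache lifetime" setting is stored in the `cache_lifetime` variable, in seconds. The function name and the 300-second default are illustrative assumptions, not values suggested anywhere in this ticket.

```shell
# Sketch: set Drupal 6's "minimum cache lifetime" (the cache_lifetime variable).
# The default of 300 seconds is only an illustrative value.
set_min_cache_lifetime() {
  drush vset --yes cache_lifetime "${1:-300}"
}
```

Calling `set_min_cache_lifetime 600` would keep cached pages for at least ten minutes even when a comment submission triggers a cache clear.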

Given the nature of this problem improvements concerning block level caching might not really help. But I have also noticed that views blocks are not cached at all and profiling reveals that a lot of time is spent on rendering views blocks when the whole page is not served from cache. This can be enabled in the Views UI.

All admin pages are slow. Convenience is more important than resource usage here I guess. Admin modules displaying large menu trees as drop downs can become a burden especially when there are many modules adding configuration pages.

-- stefan

comment:27 Changed 5 years ago by

Thanks for the additional follow up Stefan.

I tend to agree with Jack about the main problem in this case being the huge number of hits - which as you point out comes from a single IP address. Following on that... I found two IP addresses hitting the site significantly more than anyone else since last sunday morning:

0 daza:~# mf-analyze-web-log-hits-by-ip /home/members/casinofreephila/sites/ | tail -n 3
0 daza:~#

The third in the list is google.

The heaviest IP address, as Stefan points out, comes from the Ukraine, with number two coming from Costa Rica. Both are still hitting the site today (over 18,000 hits from today, with the most recent ones being posts to volunteer page).

I've just taken the liberty of banning both IP addresses via iptables. It's a temporary fix that will be undone the next time the server is re-booted (this sunday), however, I'm hoping that the bot hitting the site will realize that it's timing out and move on by then.
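
For the record, a ban of this kind can be sketched as below. This is not the mf-ban-ip-address script mentioned earlier, just an assumed minimal equivalent; the function name is made up, and it needs root to run.

```shell
# Sketch: drop all packets from one IPv4 source address.
# The rule lives only in the running kernel tables, so it disappears on reboot.
ban_ip() {
  case "$1" in
    ''|*[!0-9.]*) echo "usage: ban_ip IPV4_ADDRESS" >&2; return 1 ;;
  esac
  iptables -I INPUT -s "$1" -j DROP
}
```

The matching rule can be removed again with `iptables -D INPUT -s ADDRESS -j DROP`.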


comment:28 Changed 5 years ago by

Big thanks to everyone who's been working on this. I wanted to check and see if blocking those IPs had seemed to address the issue.

If so, or if maybe but we're not sure, I'd like to re-enable the following modules which either a) only affect admin users, or b) have no impact on performance:

  • admin
  • pathologic
  • spamspan
  • backup_migrate
  • better_formats
  • equalheights
  • stringoverrides
  • vertical_tabs
  • update_advanced

Soon, I'd like to enable the following third-party-integrating modules:

  • disqus
  • feedburner

If things seem OK, I could enable those with the others I listed above; if we want to do this in stages I could wait a couple of days before re-enabling these two. But we're going to be posting new content this week, and I'd like to have these two back on by then if possible.

For reference, this will leave the following modules un-enabled:

  • slideshare: I don't think this is being used anymore.
  • disqus_migrate: All the comments were migrated, so we don't have to re-enable this.
  • sharethis: We're replacing this with some code on the node templates themselves, so we won't need this module anymore.
  • image_fupload: I'd like to re-enable this eventually, but we can have staff work around it for now.
  • int_meta: Would really like this re-enabled eventually for SEO reasons, but again, we can live without it for awhile.
  • site_verify: This is for submitting sitemaps to search engines. I have to check on whether this is necessary once they've been verified, as they have.
  • submitted_by: This is a lazy way to override how the "posted on X by Y" text is displayed on nodes. We can live without it for awhile.

comment:29 Changed 5 years ago by

Hi Ivan,

Since we've restarted the server (yesterday afternoon) the IPs are no longer blocked. You can check your web log to see if they are back or not.

I think limiting the number of processes allowed to run to just 3 is a hard brake on the site going out of control again. So, I think re-enabling those modules is safe.

The real test will happen when the limit of 3 processes starts to interfere with legit traffic to the site.


comment:30 Changed 5 years ago by

It looks like limiting the number of processes to 3 for had the opposite effect. Due to a bug, that caused to have no limit.

I just removed the line and reduced the overall maxprocessperclass number on the server to 10.

comment:31 Changed 5 years ago by

The site was hit hard again yesterday between 5:00 pm and 6:00 pm America/New_York.

Here's an analysis of the IP addresses that have the highest number of hits during this period:

0 daza:~# /usr/local/sbin/mf-analyze-web-log-hits-by-ip /home/members/casinofreephila/sites/ "27/Feb/2011:17"| tail -n 15 | while read hits ip; do printf "%s hits from %s: " "$hits" "$ip"; whois "$ip" | egrep "OrgName|country"; done
14 hits from OrgName:        SoftLayer Technologies Inc.
17 hits from country:        RU
19 hits from country:      CN
country:      CN
21 hits from country:        NL
31 hits from OrgName:        Google Inc.
33 hits from country:        UA
36 hits from country:        LV
36 hits from country:        LV
36 hits from country:        LV
36 hits from country:        LV
36 hits from country:        NL
44 hits from OrgName:        Yahoo! Inc.
71 hits from country:        NL
102 hits from country:        NL
1073 hits from country:        UA
0 daza:~#

UA is the Ukraine. These findings are consistent with Stefan's earlier analysis.

Here's number of hits distributed by minute since Sunday morning:

0 daza:~# mf-analyze-web-log-hits-by-time /home/members/casinofreephila/sites/ | tail -n 10
     86 27/Feb/2011:23:22
     99 27/Feb/2011:11:20
    100 28/Feb/2011:00:36
    116 27/Feb/2011:16:55
    139 27/Feb/2011:17:20
    144 27/Feb/2011:17:22
    160 27/Feb/2011:17:24
    165 27/Feb/2011:17:21
    165 27/Feb/2011:17:23
    409 27/Feb/2011:17:19
0 daza:~#

This fairly clearly shows that starting around 17:20, it got slammed relative to its normal traffic.
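
For anyone without access to the mf-analyze-* scripts, a per-minute count like the one above can be approximated as follows. This is a sketch under assumptions, not the actual script: it assumes Apache's common/combined log format, where the fourth field looks like `[27/Feb/2011:17:19:03`, and the function name is made up.

```shell
# Sketch: count requests per minute, busiest minutes last.
# Characters 2-18 of the fourth field are the timestamp down to the minute,
# e.g. 27/Feb/2011:17:19.
hits_by_minute() {
  awk '{ print substr($4, 2, 17) }' "$1" | sort | uniq -c | sort -n
}
```

Piping the result through `tail -n 10` gives the ten busiest minutes.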

And lastly, this shows what resources were being requested between 5:00 and 6:00 pm:

0 daza:~# mf-analyze-web-log-requests /home/members/casinofreephila/sites/ "27/Feb/2011:17" | tail -n 10
     47 GET /campaigns/2009/declaration-independence-casinos
     47 GET /campaigns/2009/red-bankrupting-casinos-before-they-bankrupt-us
     48 GET /campaigns/2010/reclaim-riverfront
     49 GET /campaigns
     52 GET /blog
     52 GET /casino-facts
     55 GET /
     57 GET /campaigns/2007/phillys-ballot-box
     62 POST /contact
    101 POST /volunteer
0 daza:~#

The IP with the highest number of hits was posting like mad to the volunteer and contact pages. For anonymous users, those pages are captcha protected. Are they captcha protected for users with accounts? If not (and you have an open user account creation policy) I would suggest adding captcha for logged-in users. Ivan: can you check the Drupal logs to see if anyone has logged in from that IP address and if so revoke that account?
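
One way to do that log check from the shell is sketched below. It assumes the core Database logging (dblog) module is enabled, so that log entries are in the `watchdog` table with a `hostname` column, that tables have no prefix, and that the database is MySQL. The function name is made up for illustration.

```shell
# Sketch: show the 50 most recent Drupal log entries recorded for one client IP.
# Assumptions: dblog enabled (table `watchdog`), no table prefix, MySQL.
watchdog_by_ip() {
  case "$1" in
    ''|*[!0-9.]*) echo "usage: watchdog_by_ip IPV4_ADDRESS" >&2; return 1 ;;
  esac
  drush sql-query "SELECT w.uid, w.type, w.message, FROM_UNIXTIME(w.timestamp) FROM watchdog w WHERE w.hostname = '$1' ORDER BY w.timestamp DESC LIMIT 50"
}
```

Any row with a non-zero uid would point at an account that was used from that address.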

I think minimally, taking those forms offline will probably be the single greatest contribution that could be made to avoiding this problem in the future. Given the nature of the Internet, I suspect this is a blind spam attack, not a targeted attack. I also seriously wonder if the word casino in the URL and in the content of the pages has made this site a target. Not a political target. I still can't quite figure out why, but I wonder if these spam bots for some reason or another think that casino sites might be more vulnerable?


comment:32 Changed 3 years ago by

  • Keywords spam added
  • Resolution set to fixed
  • Status changed from new to closed

Well it's been a while since this issue, so I think it's safe to close.

comment:33 Changed 3 years ago by

  • Resolution fixed deleted
  • Status changed from closed to assigned

Hate to do this, but it looks like the volunteer page is still wide open and having POST requests made on a regular basis.

Here's a list of the top three IP addresses, and what should be a top three pages visited list:

0 daza:~# logfile='/home/members/casinofreephila/sites/'; for activeip in $(awk '{print $1}' $logfile |sort | uniq -c | sort -n | tail -3 | awk '{print $2}');do count=$(grep -c "$activeip" "$logfile"); printf "\nIP: $activeip vistited $count times."; for activepage in $(grep "$activeip" "$logfile" | awk '{print $7}' | sort |uniq -c |sort -n |tail -3 |awk '{print $2}');do printf "\nThis page: $activepage"; c=$(grep "$activeip" "$logfile"|grep "$activepage" |grep -c POST); printf " had $c POST requests, and $(grep "$activeip" "$logfile"|grep "$activepage" |grep -c GET) GET requests."; done; printf "\n"; done

IP: vistited 1112 times.
This page: /volunteer had 1005 POST requests, and 107 GET requests.

IP: vistited 1187 times.
This page: /volunteer had 1036 POST requests, and 151 GET requests.

IP: vistited 1653 times.
This page: /volunteer had 1484 POST requests, and 169 GET requests.
0 daza:~# 

It looks like the most common IPs to visit only go to the volunteer page. In fact, of the top five requests in the log, /volunteer is way overrepresented:

0 daza:~# logfile='/home/members/casinofreephila/sites/';cat $logfile | awk '{print $7}' | sort | uniq -c |sort -n |tail -5
    676 /
    780 /robots.txt
    842 /rss.xml
    995 /favicon.ico
   5915 /volunteer
0 daza:~# 

More popular than the homepage, the robots file, the rss feed, and the favicon sure looks like spam behavior to me. So while there isn't a huge issue right now, there might be something that needs further investigation, or at least a bit more.
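
In case someone wants to rerun the per-IP breakdown later, here is a more readable sketch of the same idea as the long one-liner above. It is a rewrite under assumptions rather than the command that produced the output shown: it assumes the same log layout (client IP in field 1, quoted method in field 6, path in field 7), and it matches the IP exactly on the first field instead of using grep, so one address cannot match as a substring of another. The function name is made up.

```shell
# Sketch: for the N busiest client IPs, print request counts per path and method.
# Usage: top_ip_paths LOGFILE [N]   (N defaults to 3)
top_ip_paths() {
  logfile="$1"
  top="${2:-3}"
  awk '{ print $1 }' "$logfile" | sort | uniq -c | sort -n | tail -n "$top" |
  while read -r hits ip; do
    echo "IP: $ip visited $hits times."
    awk -v ip="$ip" '
      $1 == ip {
        method = $6
        sub(/^"/, "", method)   # strip the opening quote of the request field
        n[$7 " " method]++
      }
      END { for (k in n) print n[k], k }
    ' "$logfile" | sort -rn
  done
}
```

Each IP is followed by lines of the form "count path method", busiest first.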

I'm going to reopen this for now. :\


comment:34 Changed 3 years ago by

Honestly, Casino-Free isn't active enough among volunteers to justify the headache of this form. I've disabled the form itself and redirected the URLs to another page.

I'm still curious why the site is attracting so much attention -- maybe we're just cursed with the attractive looking domain name...

comment:35 Changed 3 years ago by

  • Resolution set to fixed
  • Status changed from assigned to closed

Thank you! While this wasn't currently overwhelming the server, it is always good to effectively deal with stuff that spammers seem to be using.

I think part of it might be the allure of a good name, I wouldn't be surprised if "casino" is a spam keyword. For all I know, so is "philly."

Anyway, thanks for the speedy reaction to my follow up. I think we can close this ticket, hopefully for good.

