Opened 3 weeks ago

Closed 10 days ago

#15964 closed Bug/Something is broken (fixed)

colin.mayfirst.org is unreachable

Reported by: Joseph
Owned by: Joseph
Priority: Medium
Component: Tech
Keywords:
Cc: ben@…, Jamie McClelland, takethestreets
Sensitive: no

Description

Hey y'all,

Just reporting that HCN's server seems to be down. I can't access it via SSH, so not sure what's going on.

Joseph

Change History (27)

comment:1 Changed 3 weeks ago by JaimeV

Owner: set to JaimeV
Status: changed from new to assigned

Yeah, it looks like a ton of processes are being invoked under the healthcarenow user, like php /usr/local/bin/wp --user=civicrmcron" ..., and they are crashing the system.

comment:2 Changed 3 weeks ago by JaimeV

Cc: Jamie McClelland added

It looks like these have been set up as cron jobs. We'll need to figure out how to keep these from overwhelming the system, or what other conditions are pushing colin over the edge. Copying Jamie here as well.

comment:3 Changed 3 weeks ago by Joseph

Thanks, Jaime.

HCN runs a multi-site WP install, each with a Civi instance. These cronjobs are mostly just handling the maintenance Civi requires, like cache clean up, system update checking, sending mailings, etc. These cronjobs have been in place for a few years and never caused the system to get overwhelmed like this. My guess is that something else has changed.

I ran an upgrade yesterday (CiviCRM 5.28.4), but that was fairly minimal. Ben, have y'all installed anything or made any other changes in the last day? Maybe this morning?

comment:4 Changed 3 weeks ago by Joseph

I've turned all the cronjobs off for now to see if the server catches up.

comment:5 Changed 3 weeks ago by Joseph

I've found a ton of very suspicious files. I'm cleaning them up now.

comment:6 Changed 3 weeks ago by takethestreets

So I was able to get the site back up by switching from PHP 7.4 to 7.3. Jaime - was there a bulk switch of PHP versions? I also see that the CLI PHP version is 7.4; could that be changed back to 7.3?

comment:7 Changed 3 weeks ago by Joseph

Cc: takethestreets added

Indeed. It looks like the site was hacked. I've used git to remove as many of these files as possible. It looks like there were some recent plugin upgrades. I'm going to revert those and rerun them to make sure they don't contain any suspicious files.

comment:8 Changed 3 weeks ago by Joseph

Owner: changed from JaimeV to Joseph
Priority: changed from Urgent to Medium

I reran the plugin updates, so those should all be fine now.

I'm not sure of the ultimate source of the compromise, but there were lots of stray PHP files, mostly index.php or $PREFIX_index.php. git found most of these, but I searched across the whole webroot to find any in places git isn't tracking. I found a few more.

I'm reenabling cron. I'll just watch everything to make sure we're good here. Again though I'm not sure what the source was.

comment:9 Changed 3 weeks ago by Jamie McClelland

colin went down again around 4:45 am this morning - the console reported:

[5203458.615736] Killed process 2530 (php-fpm7.3) total-vm:422412kB, anon-rss:11280kB, file-rss:0kB, shmem-rss:13348kB
[5203602.339468] Out of memory: Kill process 2418 (php-fpm7.3) score 4 or sacrifice child
[5203602.341011] Killed process 2418 (php-fpm7.3) total-vm:422412kB, anon-rss:11308kB, file-rss:0kB, shmem-rss:13472kB
[5203796.958131] Out of memory: Kill process 2588 (php-fpm7.3) score 4 or sacrifice child
[5203796.960485] Killed process 2588 (php-fpm7.3) total-vm:422416kB, anon-rss:11312kB, file-rss:0kB, shmem-rss:13608kB
[5203902.277823] Out of memory: Kill process 21370 (php-fpm7.3) score 4 or sacrifice child
[5203902.279187] Killed process 21370 (php-fpm7.3) total-vm:421988kB, anon-rss:9712kB, file-rss:4kB, shmem-rss:14824kB
[5203902.609953] Out of memory: Kill process 28134 (php-fpm7.3) score 4 or sacrifice child
[5203902.611644] Killed process 28134 (php-fpm7.3) total-vm:421988kB, anon-rss:9712kB, file-rss:0kB, shmem-rss:14824kB
[5203902.815154] Out of memory: Kill process 29006 (php-fpm7.3) score 4 or sacrifice child
[5203902.816730] Killed process 29006 (php-fpm7.3) total-vm:421988kB, anon-rss:9712kB, file-rss:0kB, shmem-rss:14824kB
[5203905.915716] Out of memory: Kill process 30388 (php-fpm7.3) score 4 or sacrifice child
[5203905.917079] Killed process 30388 (php-fpm7.3) total-vm:421988kB, anon-rss:9712kB, file-rss:0kB, shmem-rss:14824kB
[5203906.458214] Out of memory: Kill process 22810 (php-fpm7.3) score 4 or sacrifice child
[5203906.459900] Killed process 22810 (php-fpm7.3) total-vm:421988kB, anon-rss:9708kB, file-rss:0kB, shmem-rss:14824kB

Just for good measure I rebooted with an extra 2GB of RAM, although it may be related to the hack.
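
For anyone reading console output like the excerpt above, here is a minimal Python sketch that tallies the "Killed process" lines per process name and sums their anon-rss. It assumes the line format shown above; the function name summarize_oom_kills is hypothetical and not a tool in use on colin.

```python
import re
from collections import Counter

# Matches kernel lines such as:
# [5203458.615736] Killed process 2530 (php-fpm7.3) total-vm:422412kB, anon-rss:11280kB, ...
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def summarize_oom_kills(log_text):
    """Return {process_name: (kill_count, total_anon_rss_kB)}."""
    kills = Counter()
    anon_rss = Counter()
    for match in KILLED_RE.finditer(log_text):
        name = match.group("name")
        kills[name] += 1
        anon_rss[name] += int(match.group("anon_rss"))
    return {name: (count, anon_rss[name]) for name, count in kills.items()}
```

If every kill is the same php-fpm worker, as in the excerpt, that points at the web workload rather than some other service on the host.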

comment:10 Changed 3 weeks ago by Joseph

It is. Files are back. I'll clean them up.

comment:11 Changed 3 weeks ago by Joseph

Most of the files are cleaned up. Jamie, did you rename any files in the webroot with the suffix .suspected?

comment:12 Changed 3 weeks ago by Joseph

Priority: changed from Medium to High

Everything is back to normal for now. I'll try to find the root of the compromise, so this doesn't happen again.

The web.log file is 1.4G. Can we rotate that out?

comment:13 Changed 3 weeks ago by Healthcare-NOW

I checked with Stephanie, and we haven't done anything or installed anything recently that we can think of that could be the culprit here. Stephanie's husband has been updating the moodle subdomain site (singlepayerschool.healthcare-now.org), and tinkering with the settings there to prevent mail from going out through mail.mayfirst.org. Last week I also installed a new WordPress plugin ("WP Mail SMTP") that would allow all WordPress emails to be routed through the bulk mail server as well. Stephanie says she was working on some pages/posts yesterday. But that's really it.

I'll let you know if we think of anything else, but very unlikely!

Ben

comment:14 Changed 3 weeks ago by Jamie McClelland

Thanks Joseph and Ben for digging into the root cause.

I haven't renamed any files, though Jaime may have. I just rotated the web log.

Keep us posted on what you find.

comment:15 Changed 3 weeks ago by Jamie McClelland

Hi all - not sure if this is related or not - but I just killed about 150 cron jobs on malcolm running for the dev.healthcare-now.org and also disabled the cron job that was launching them.

comment:16 Changed 3 weeks ago by Jamie McClelland

I rebooted colin again - out of memory, the kernel was killing off PHP processes. I also lowered the max allowed PHP processes from 24 to 12 so that if things go nuts again, it doesn't consume all the memory on the host.

If this is not compromise-related and your host really does need more RAM, we can allocate more RAM, but I'm assuming this is compromise-related and as soon as the compromise is under control we won't be running out of RAM.

Joseph - can you confirm whether the compromise is continuing? Also, let us know if you need help rooting it out. If you haven't already, I would suggest changing the healthcarenow unix password, database password, and all the user passwords on the site.

comment:17 Changed 3 weeks ago by Jamie McClelland

It is definitely compromised. I just disabled the web site to protect your data and also because the site is not easily accessible during this compromise.

comment:18 Changed 3 weeks ago by Healthcare-NOW

FYI I just saw this news piece on the WordPress plugin "File Manager," which we have been using, causing a global hacking spree: https://www.zdnet.com/article/millions-of-wordpress-sites-are-being-probed-attacked-with-recent-plugin-bug/

Could this be the source of our problems? I went to the site to uninstall the plugin, but we appear to be down.

Ben

comment:19 Changed 3 weeks ago by Jamie McClelland

Nice research, looks like a likely candidate. I just cleared out the obviously compromised PHP files and deactivated that plugin and then I re-enabled the site (I had disabled the site because it was obviously compromised and I wanted to lessen the damage that the compromise could be causing).

I also made a few changes that might cause some problems:

  • I changed the database user password
  • I changed the healthcarenow password - if you use that password to login to the support site or the control panel - you may need to reset it here: https://members.mayfirst.org/resetpass - be sure to use a new password for it.

I would suggest you reset all the WordPress login passwords that are active on the site as an additional precaution.

comment:20 Changed 2 weeks ago by takethestreets

This is some correlating evidence for "File Manager" being the culprit: https://www.zdnet.com/article/millions-of-wordpress-sites-are-being-probed-attacked-with-recent-plugin-bug/

comment:21 Changed 2 weeks ago by Joseph

Unfortunately the attackers still have access. Everything is back to being compromised. I can start to rebuild with the core and plugins to make sure the compromise isn't included in any of those places, but there is a lot hosted under this domain, including a bunch of non-WP material. I'll check the logs again to see if I missed anything last time too.

comment:22 Changed 2 weeks ago by Joseph

I've cleaned up git again. It looks like there were upgrades run, and maybe Wordfence is flagging some files. Right now git is clean. I cross-checked all the changes against the latest version of WP (5.5.1), and they seem legitimate. I did find a compromise in wp-includes/embed.php.

As a point of process, if anyone runs an upgrade or changes anything in the file system, can you post it here? It'll make distinguishing between what has legitimately changed from what hasn't a little easier.

I also searched for any *index.php files that might be in places that aren't tracked by git. That seems to be the main pattern for introducing new files. I found some in .git/, docs/, episodes/, videos/ and a few other places. I removed all those.
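
A search like this can be sketched in Python as follows. This is only an illustration under assumptions: the function name find_index_php and the known_good allow-list are hypothetical, and it simply walks the whole tree (including hidden directories such as .git/) looking for file names that end in index.php.

```python
import os


def find_index_php(webroot, known_good=()):
    """List files named *index.php under webroot, relative to webroot.

    known_good is a hypothetical allow-list of relative paths that are
    expected to exist (for example the site's real index.php).
    """
    allowed = {os.path.normpath(p) for p in known_good}
    hits = []
    # os.walk also descends into hidden directories such as .git/
    for dirpath, _dirnames, filenames in os.walk(webroot):
        for name in filenames:
            if name.lower().endswith("index.php"):
                full = os.path.join(dirpath, name)
                rel = os.path.normpath(os.path.relpath(full, webroot))
                if rel not in allowed:
                    hits.append(rel)
    return sorted(hits)
```

Everything it returns still needs a manual look, since plenty of legitimate directories ship their own index.php.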

I analyzed the web.log and checked all the files that had POSTs made to them. There were a couple of randomly named PHP files in some untracked folders, which I removed as well.
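
As a rough sketch of that kind of log pass, the Python below counts POST requests per path. It assumes web.log uses the common Apache "combined" format, which this ticket does not confirm, and the function name post_targets is hypothetical.

```python
import re
from collections import Counter

# Assumes the request is logged as "METHOD /path PROTOCOL" followed by a
# three-digit status code, as in the Apache combined log format.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def post_targets(lines):
    """Count POST requests per path, ignoring any query string."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("method") == "POST":
            path = match.group("path").split("?", 1)[0]
            counts[path] += 1
    return counts
```

Called as post_targets(open("web.log", errors="replace")).most_common(20), it would list the most frequently POSTed paths, which can then be checked against what git tracks.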

I've checked the code base for any eval(), base64_decode() and gzuncompress() functions, but there are hundreds of seemingly legitimate uses, so that's not too helpful.
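
One way to cut down the noise, sketched below in Python, is to flag only lines where eval() directly wraps one of the decoders, since legitimate code rarely nests them that way. This is a heuristic and an assumption, not something that was run on this site; find_nested_eval is a hypothetical name, and obfuscated code that splits the calls across lines or variables will slip past it.

```python
import os
import re

# eval( immediately wrapping base64_decode( or gzuncompress(
NESTED_RE = re.compile(
    r"\beval\s*\(\s*(?:base64_decode|gzuncompress)\s*\(", re.IGNORECASE
)


def find_nested_eval(webroot):
    """Return sorted (relative_path, line_number) pairs for suspicious lines."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(webroot):
        for name in filenames:
            if not name.lower().endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if NESTED_RE.search(line):
                        hits.append((os.path.relpath(path, webroot), lineno))
    return sorted(hits)
```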

I did notice that all the compromising POST requests were made to campaignforguaranteedhealthcare.org. Since that campaign has ended, I've removed it from the Virtual Host file, so it no longer resolves to the WP site.


comment:23 Changed 2 weeks ago by Joseph

Priority: changed from High to Medium

Everything seems in order this morning, so that's good news. I'll continue to monitor everything.

comment:24 Changed 2 weeks ago by Jamie McClelland

Great to hear - thanks Joseph.

comment:25 Changed 2 weeks ago by Joseph

Still looking good this morning. I'll keep this open for another week before closing it.

comment:26 Changed 2 weeks ago by Healthcare-NOW

Joseph, you wanted any file changes/updates posted here:

  • Uploaded episode18.mp3 file to our "episodes" folder for the latest podcast
  • About to run updates on 3 plugins: Charitable - Videos, Metaslider, and WooCommerce
  • Also updating the Beaver Builder Theme

Thanks, Ben

comment:27 Changed 10 days ago by Joseph

Resolution: fixed
Status: changed from assigned to closed

Everything still seems good, so I think we can close this. Thanks everyone for all the work to get this sorted out.
