Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#10106 closed Bug/Something is broken (fixed)

site down. databases unknown, can't create new database that is known

Reported by: Jill Hubley Owned by: Ross
Priority: Medium Component: Tech
Keywords: drupal, mysql, kahlo.mayfirst.org Cc: jill@…, anila@…, colette@…
Sensitive: no

Description

Hi,

I'm installing a new version of the nacla site (drupal 7) on the kahlo.mayfirst.org server. However I've run into major problems creating/editing databases and database users.

I created a database (nacladrupal7) and user (nacladrupal7) through the control panel. I uploaded my database to the server (it's in the /sites folder). I ssh'd in to mysql the db, but I'm getting the following error: ERROR 1049 (42000): Unknown database 'nacladrupal7'

It also appears that the previously existing Drupal 6 database is now inaccessible or corrupted.

Can you please help me set up a new database asap? I'm really not sure where the error lies.

Thank you very much, Jill

Change History (48)

comment:1 Changed 5 years ago by Jill Hubley

Cc: jill@… added

comment:2 Changed 5 years ago by Dana

Cc: jamie@… added
Keywords: kahlo.mayfirst.org added
Owner: set to Ross
Status: newassigned

Hi Jill,

Looping Ross & Jamie in to help.

Dana

comment:3 Changed 5 years ago by Ross

Cc: jamie@… removed

Hi Jill,

It appears that mysql was over-loaded with open files. I don't know what was going on to create this problem, but I restarted mysql and re-created the database nacladrupal7. You should be good to go with the database now. I will note that you do not have a .htaccess file in your web root, so the site will probably never load any pages other than the homepage.

All the best,

~/ross

comment:4 Changed 5 years ago by Jill Hubley

Resolution: fixed
Status: assignedclosed

Hi Ross,

Excellent, thanks so much for your help! I purposely left .htaccess out for the time being so that I could easily display an html file with an 'undergoing maintenance' message while the problem was resolved. Uploading htaccess now, and all set.

Thanks again! Jill

comment:5 Changed 5 years ago by Jill Hubley

Resolution: fixed
Status: closedassigned

Hey again Ross,

Well I spoke too soon.

I had the site up and running for about 10 minutes, but now I'm getting a new error:

PDOException: SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in lock_may_be_available() (line 167 of /home/members/nacla/sites/web.nacla.org/web/includes/lock.inc).

Can you take a look? FWIW the same database is running on malcolm.mayfirst.org (dev.nacla.org) and I haven't encountered these issues there.

Thanks! Jill

comment:6 Changed 5 years ago by Jill Hubley

Priority: UrgentMedium

Ah-- I'm very sorry to inundate you here. It is working again. I'll look into this, but if there are any adjustments you recommend (in php.ini maybe) it would be greatly appreciated.

comment:7 Changed 5 years ago by Ross

Resolution: fixed
Status: assignedfeedback

Hi Jill,

I don't think this is a php.ini issue. I've increased the open file limits on the server, which I suspect is the problem.

Let me know if this resolves the issue,

~/ross

comment:8 Changed 5 years ago by Jill Hubley

Resolution: fixed
Status: feedbackassigned

Hi Ross,

I'm still getting some errors, all of which look like they center around cache.inc Here they are:

PDOException: SQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query: TRUNCATE {cache_rules} ; Array ( ) in cache_clear_all() (line 165 of /home/members/nacla/sites/web.nacla.org/web/includes/cache.inc).

PDOException: SQLSTATE[42S02]: Base table or view not found: 1146 Table 'nacladrupal7.cache_views_data' doesn't exist: TRUNCATE {cache_views_data} ; Array ( ) in cache_clear_all() (line 165 of /home/members/nacla/sites/web.nacla.org/web/includes/cache.inc).

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: DELETE FROM {cache_bootstrap} WHERE (cid = :db_condition_placeholder_0) ; Array ( [:db_condition_placeholder_0] => variables ) in cache_clear_all() (line 165 of /home/members/nacla/sites/web.nacla.org/web/includes/cache.inc).

Any thoughts?

Thank you so much, Jill

comment:9 Changed 5 years ago by Ross

Hi Jill,

These are all different errors and disturbing. The second error is telling you that the cache_views_data table is missing from the database. This is obviously not a good thing and will require you manually rebuilding the table with the correct schema. However, according to the mysql command:

mysql> show tables like "%cache%";
+----------------------------------+
| Tables_in_nacladrupal7 (%cache%) |
+----------------------------------+
| cache                            |
| cache_admin_menu                 |
| cache_block                      |
| cache_bootstrap                  |
| cache_commerce_shipping_rates    |
| cache_field                      |
| cache_filter                     |
| cache_form                       |
| cache_image                      |
| cache_libraries                  |
| cache_menu                       |
| cache_page                       |
| cache_path                       |
| cache_rules                      |
| cache_token                      |
| cache_update                     |
| cache_views                      |
| cache_views_data                 |
| ctools_css_cache                 |
| ctools_object_cache              |
+----------------------------------+
20 rows in set (0.00 sec)

mysql> 

cache_views_data does exist, but it cannot be queried. It seems to me like this may have been a problem with the database import or exported sql. I think I'd try re-creating the database again from scratch and then dump it again and re-import the newest dump. Something is clearly wrong with the database itself.

I was able to delete the ephemeral table, but not able to recreate it, which I tried with:

drush -d -u 1 scr dev/re-create-cache_views_data.php

from web root. I think the best step now is to try to rebuild this corrupted database. At this point I see no other options.

~/ross

comment:10 Changed 5 years ago by Jill Hubley

Hi Ross,

Got it. I'm re-exported and am trying to re-import now, but am hitting some errors doing that.

Before importing, I tried to just drop the database entirely (was then going to create the database again), but got this error: mysql> drop database nacladrupal7; ERROR 1010 (HY000): Error dropping database (can't rmdir './nacladrupal7', errno: 39)

Since all of the tables were dropped, I went ahead with the import, but then it wasn't able to create a table:

0 naclaweb@kahlo:~/web.nacla.org/web/sites$ mysql -u nacladrupal7 -p nacladrupal7 < nacladevd627.sql Enter password: ERROR 1005 (HY000) at line 9066: Can't create table 'nacladrupal7.cache_views_data' (errno: -1) 1 naclaweb@kahlo:~/web.nacla.org/web/sites$

Is this a permissions issue?

Thanks for all your work, Jill

comment:11 Changed 5 years ago by Jill Hubley

Okay - I just created a new database (naclasited7) through the Mayfirst control panel, reimported to it, and so far I'm not hitting any errors. So hopefully that will continue to be the case!

Thanks a million, Jill

comment:12 Changed 5 years ago by Jill Hubley

Hi Ross,

Okay, well, I'm not hitting those same errors anymore, but some others are coming up. If you're able to take a look/advise me, I'd greatly appreciate it. I'm not entirely sure what sort of support contract (if any) NACLA has with you.

Here's what I'm getting:

PDOException: SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in lock_may_be_available() (line 167 of /home/members/nacla/sites/web.nacla.org/web/includes/lock.inc).

and these two, which show up on the same page:

PDOException: SQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query: TRUNCATE {cache_rules} ; Array ( ) in cache_clear_all() (line 165 of /home/members/nacla/sites/web.nacla.org/web/includes/cache.inc).

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: DELETE FROM {semaphore} WHERE (value = :db_condition_placeholder_0) ; Array ( [:db_condition_placeholder_0] =&gt; 1674513993543ee18d5fb8a7.67878449 ) in lock_release_all() (line 269 of /home/members/nacla/sites/web.nacla.org/web/includes/lock.inc).

Thank you, Jill

comment:13 Changed 5 years ago by Jill Hubley

Hi again Ross,

I did some further investigation, and found this issue on drupal.org, which seems to be the same problem the nacla site is having-- mysql crashing due to not enough resources: https://www.drupal.org/node/1665348

Would it be possible to create a swap file as suggested in that issue thread? Or provide more disk space/memory for mysql?

I'll wait to hear from you.

Thanks! Jill

comment:14 Changed 5 years ago by Ross

Hi Jill,

It does seem like this problem is related to too few resources and a very bloated site. NACLA is on it's own virtual server, which has been kindly gifted by May First. Normally members pay additional dues, usually $50/month. We could add resources, but I believe it would be fair that NACLA increase it's dues to fit with the resources it's already been using for over a year now.

Does this seem reasonable and/or possible?

~/ross

comment:15 Changed 5 years ago by Jill Hubley

Hi Ross,

Got it. That seems more than fair to me. Since I'm not an employee of NACLA (only a contractor) I don't have any say, but I've passed this along to Anila and stressed that I think it's crucial. I hope to get back to you with their answer soon.

Thanks, Jill

comment:16 Changed 5 years ago by Jill Hubley

Hi Ross,

Anila just emailed me and has agreed to the $50/month dues. Could you or someone on your team send them an invoice?

Since they have agreed, could you go in and look at increasing resources for the site?

What I find a little puzzling is that we weren't getting these errors on the malcolm server (where dev.nacla.org) is, but are now getting them on the kahlo server. I'm sure there is a reason, just a difference I wanted to point out in case you see a difference in those server environments.

Also, the nacla Drupal 7 database is now only ~200MB, whereas the Drupal 6 database was over 600MB. Again, I don't know if this matters, or if it's helpful to know.

Could you please let me know if you're able to work on this, and if you need anything from me?

Thank you so much! Jill

comment:17 Changed 5 years ago by Ross

Hi Jill,

I have to admit, that I'm surprised that the site on kahlo is not responding better. However, the old site required 2Gig of RAM in order to work on that server, so maybe it's not to surprising that a drupal 7 version needs more resources. Anyway, I have added one cpu and another gig of RAM.

I also set the mysql innodb_buffer_pool_size variable to 750M. This means that mysql will store up to 750M of queries in RAM. I believe this may help mysql work more efficiently.

Please let me know how this works out for you,

~/ross

comment:18 Changed 5 years ago by Jill Hubley

Hi Ross,

Thanks, it does seem to be running more smoothly now.

The only time I now seem to be getting an error is when I clear the site cache.

Here's the error (same one that was coming up before):

Additional uncaught exception thrown while handling exception.

Original

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: SELECT filename FROM {registry} WHERE name = :name AND type = :type; Array ( [:name] =&gt; views_handler_sort_date [:type] =&gt; interface ) in _registry_check_code() (line 3157 of /home/members/nacla/sites/web.nacla.org/web/includes/bootstrap.inc).

Additional

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: INSERT INTO {watchdog} (uid, type, message, variables, severity, link, location, referer, hostname, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9); Array ( [:db_insert_placeholder_0] =&gt; 1 [:db_insert_placeholder_1] =&gt; php [:db_insert_placeholder_2] =&gt; %type: !message in %function (line %line of %file). [:db_insert_placeholder_3] =&gt; a:6:{s:5:&quot;%type&quot;;s:12:&quot;PDOException&quot;;s:8:&quot;!message&quot;;s:214:&quot;SQLSTATE[HY000]: General error: 2006 MySQL server has gone away: SELECT filename FROM {registry} WHERE name = :name AND type = :type; Array ( [:name] =&amp;gt; views_handler_sort_date [:type] =&amp;gt; interface ) &quot;;s:9:&quot;%function&quot;;s:22:&quot;_registry_check_code()&quot;;s:5:&quot;%file&quot;;s:66:&quot;/home/members/nacla/sites/web.nacla.org/web/includes/bootstrap.inc&quot;;s:5:&quot;%line&quot;;i:3157;s:14:&quot;severity_level&quot;;i:3;} [:db_insert_placeholder_4] =&gt; 3 [:db_insert_placeholder_5] =&gt; [:db_insert_placeholder_6] =&gt; https://nacla.org/home [:db_insert_placeholder_7] =&gt; https://nacla.org/home [:db_insert_placeholder_8] =&gt; 24.193.119.150 [:db_insert_placeholder_9] =&gt; 1413558609 ) in dblog_watchdog() (line 160 of /home/members/nacla/sites/web.nacla.org/web/modules/dblog/dblog.module).

Thank you! Jill

comment:20 Changed 5 years ago by Jill Hubley

Priority: MediumHigh

Hi Ross- Are you doing anything right now, because now this error is the only thing I get when I load the site, from any page.

PDOException: SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in lock_may_be_available() (line 167 of /home/members/nacla/sites/web.nacla.org/web/includes/lock.inc).

comment:21 Changed 5 years ago by Jill Hubley

I also logged into the mayfirst control panel, tried to delete one of the databases that's not in use (nacla_drupal7) and received a hard-error.

Thanks, Jill

comment:22 Changed 5 years ago by Jill Hubley

I've put a maintenance page up until you have time to look at this. Thanks very much, Ross.

comment:23 Changed 5 years ago by Jill Hubley

Priority: HighUrgent

Hi Ross,

Just upping the priority of this ticket.

I'll wait to hear from you now.

Thanks as always, Jill

comment:24 Changed 5 years ago by Ross

Hi Jill,

It's a similar problem. Sorry it took so long for me to respond. I'm on a retreat for the weekend. I've made more modifications to the mysql server. Hopefully this change will help. I'm not sure what else I can do at this point. There's no reason I can tell for the site to be eating this many resources unless something in the code is corrupted.

Let's see if this newer configuration helps. If not were going to need to dig into the code and try to figure out what's causing the memory leak.

~/ross

P.S. I did go ahead and complete the database deletion.

comment:25 Changed 5 years ago by Jill Hubley

Priority: UrgentMedium

Hi Ross,

Thank you so, so much for taking care of this despite being on a retreat. The site is back up now. I'll keep an eye on it and hopefully this latest config will work.

Best, Jill

comment:26 Changed 4 years ago by Jill Hubley

Resolution: fixed
Status: assignedclosed

Hey again Ross,

I haven't run into that error again and the site seems stable, so I'm marking this ticket fixed. Thanks very much for your help with this!

Jill

comment:27 Changed 4 years ago by Jill Hubley

Priority: MediumHigh
Resolution: fixed
Status: closedassigned

Hi Ross,

I spoke too soon. The site's down again.

I got the error again right after I used drush to flush all caches.

Last week, I was also getting the error right after cache flush. However, I did clear the cache from within the Drupal admin earlier tonight without a problem, which is why I thought things were ok.

Thanks, Jill

comment:28 Changed 4 years ago by Jill Hubley

Priority: HighMedium

Hi again Ross,

Well the site's back up, a minute later, so that's great.

I'm going to leave the ticket open, though, in case you have any thoughts on why I'm getting that error after clearing the cache (first I get the one in comment #18, then after refreshing the page I get the error in comment #20).

Thank you! Jill

comment:29 Changed 4 years ago by Ross

Hi Jill,

I attempted a drush -d cc all and got a max_user_connections error. I have increased the max_user_connections to 149 which is just below the max_connections which is 150. I then ran drush -d cc all without receiving the error. So possibly this has resolved the cache clear problem.

I do think we're still on the edge of using all of the server resources, which is a problem. I'm wondering if you can go into your views and try adding caching to the views. I think even like a 5 minute views cache would help with the mysql queries.

Can you try that and see how it goes?

~/ross

comment:30 Changed 4 years ago by Jill Hubley

Hi Ross,

I implemented Views caching the day after you suggested it. I've been waiting to see how the site performs before getting back to you.

Overall, it's much better. However, I'm still occasionally hitting the error that states the site has exceeded the 'max_user_connections' resource. I'm not sure if there's anything further that can be done about this.

Thanks for all of your work on this! Jill

comment:31 Changed 4 years ago by Ross

Hi Jill,

So it appears that I updated the max connections for mysql globally, but not for the specific database user. I have gone ahead and increased the max connections for the database user to 100. Hopefully this will be enough.

Also, while I was doing this, I changed how your caching directory is mounted. I ran across this idea recently and it should dramatically increase page load performance for cached pages. What I did was mounted a tmpfs at /home/members/nacla/sites/web.nacla.org/web/cache. So now you'll be storing all of the cached pages directly in RAM, which should serve that content considerably faster than reading from disk.

Let me know if the database connection error crop up again,

~/ross

comment:32 Changed 4 years ago by Jill Hubley

Hi Ross,

Great! That all sounds awesome. I'll monitor the site over the next few days and let you know how it looks.

Best, Jill

comment:33 Changed 4 years ago by Jill Hubley

Hi Ross,

So, today I'm running into this error left and right:

PDOException: SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in lock_may_be_available() (line 167 of /home/members/nacla/sites/web.nacla.org/web/includes/lock.inc).

Anila let me know she's getting them too. I think the site was actually running much more smoothly yesterday.

Thank you! Jill

comment:34 Changed 4 years ago by Ross

Hi Jill,

Yep, it looks like the increased number of mysql connections has caused mysql to run out of memory again. I've dropped the max connections down to 50 to see if that helps. I'm wondering if we might want to use memcached and see if that helps. It should reduce the load on mysql.

~/ross

comment:35 Changed 4 years ago by Jill Hubley

Hi Ross,

Great, thanks for that. I'll keep you posted. Memcached could be good, though I don't have too much experience with it, so I defer to your expertise.

Thanks! Jill

comment:36 Changed 4 years ago by Jill Hubley

Hi again,

Just an update-- I'm getting the PDOException: SQLSTATE[HY000]: General error: 2006 now constantly, even when I just try to navigate to a new node or add a product to the shopping cart.

Thank you! Jill

comment:37 Changed 4 years ago by Jill Hubley

I may have jumped the gun. My apologies for that. It's running more smoothly now. I'll give you another update later today. Thanks!

comment:38 Changed 4 years ago by Jill Hubley

Cc: anila@… colette@… added

Hi Ross,

Message from Colette below. The site is down again. Can you look into this?

Thanks! Jill

Hi all

I'm not sure if folks are already on this, but if not, letting you know that our entire site is down, and has been down since last night. I have already gotten a few emails about this.

Best, Colette

comment:39 Changed 4 years ago by Jamie McClelland

Sorry for the downtime, we should be back up and running now.

The problem is that the database crashed because the server ran out of memory, which seems to be happening a lot, probably as a result of increasing the number of database connections allowed.

I've just tried limiting things on the apache side (to just 40 clients) in an effort to stop apache for causing the crash (for example, in the event that google tries to index the site with too many requests).

I think Ross' idea to try memcached is a good one.

comment:40 Changed 4 years ago by Jill Hubley

Thank you so much for fixing this so promptly upon reporting of the error.

Is there anything we need to do on our end regarding getting memcached implemented?

Thanks! Jill

comment:41 Changed 4 years ago by Jill Hubley

Hi again,

It looks like the same thing happened. Getting the same error (PDOException: SQLSTATE[HY000] [2002] ).

When this happens, are you restarting mysql? Do I have permissions to do that when ssh'd into the server (as user naclaweb)? I'm guessing not, but just want to get the site up asap when this happens, so if there's anything I can do, let me know.

Thank you so much, Jill

comment:42 Changed 4 years ago by Jamie McClelland

Ug. So frustrating. Yes I just started mysql.

I'd like to take some time, probably on thursday, to more closely analyze why your site is running out of memory so we can figure out the best strategy. I think providing you with access to restart mysql is a good idea. To get started, you will need to generate a ssh private/public key pair. We have directions here: https://support.mayfirst.org/wiki/faq/public-private-key-generation

As for memcached - I just installed the packages: memcached and php5-memcache on our server.

We will need some assistance from your Drupal developer to install and configure the right pieces for Drupal to use it.

comment:43 Changed 4 years ago by Jill Hubley

Great, it would be excellent if you could do that analysis.

I added my public key to the server.

I'm actually the one who built the Drupal 7 site, so just let me know what I need to do.

Thank you, Jill

comment:44 Changed 4 years ago by Jamie McClelland

Hi Jill,

Excellent! You will need to install the memcache module. I've already installed the server parts - so you will only need to follow the directions starting with the "Put your site into offline mode." part under the Drupal 7.

kahlo currently has a running instance of memcached that is only listening on 127.0.0.1 and we have version 3.0.6 of php's pecl memcache module.

I just gave you root access on kahlo (you can now login with your key as the root user instead of the naclaweb user). That gives you total access to do anything (if you do take action as root - please open a new ticket or comment on an existing ticket).

I also installed sudo, and configured the naclaweb user to have permission to restart mysql.

So, if the problem is with mysql, it would be easier and safer to ssh in as the naclaweb user and type:

sudo /etc/init.d/mysql restart

To restart the database.

Let us know how it goes with memcache. I hope that will really help things.

comment:45 Changed 4 years ago by Jill Hubley

Hi Jamie,

Awesome, this is all great.

I'll get the memcache module installed later today, and will let you know how it goes then.

Thanks! Jill

comment:46 Changed 4 years ago by Jill Hubley

Hi,

I added memcache module. When I went to enable it, I got this error, "You must enable the PECL memcached or memcache extension to use memcache.inc." So I logged into the server as root and added extension=memcached.so to php.ini, then I restarted apache2. Then Drupal allowed me to enable the module.

So now it's set up, and we'll see how it goes! I'm going to leave this ticket open for now, and will give you an update on site performance in a few days.

Thanks very much, Jill

comment:47 Changed 4 years ago by Jill Hubley

Resolution: fixed
Status: assignedclosed

Hi again,

Since memcached was installed, I haven't run into the memory errors, and mysql has been running smoothly. So this has helped a great deal. I'm going to close this ticket.

Thanks so much for your help! Jill

comment:48 Changed 4 years ago by Jamie McClelland

Fantastic!! Thanks for the report back.

Please login to add comments to this ticket.

Note: See TracTickets for help on using tickets.