Opened 7 years ago

Closed 7 years ago

#5472 closed Bug/Something is broken (fixed)

Site slowness for leftforum

Reported by: Mark Libkuman Owned by: Jamie McClelland
Priority: High Component: Tech
Keywords: web-optimization Cc:
Sensitive: no

Description

Hi all,

So I was checking the leftforum site today and was getting page timeouts. The confusing thing to me is that when I go to the server via ssh lfconf@… and run top, the server seems like it's barely being hit at all. Any ideas on what could be causing this? The Drupal cache settings are as high as they can be (except for block caching, which is disabled because the content access functionality cannot work with it).

cheers, Mark

Change History (15)

comment:1 Changed 7 years ago by Daniel Kahn Gillmor

Keywords: web-optimization added

Browsing http://www.leftforum.org/, I see a number of things that could be optimized to reduce the amount of time the client spends fetching data from the server:

  • a number of embedded images have a src that is absolute, in the form of http://leftforum.org/blah/blah/foo.png (note the lack of 'www'), but the site is configured to properly canonicalize all URLs to use www.leftforum.org as the hostname. Using a server-relative src URL (e.g. just /blah/blah/foo.png) would reduce the number of HTTP 302 redirects needed to fetch all the elements in the homepage.
  • your slideshow div appears to load all 8(?) of the images in one bundle. It does this at the top of the page, too, above most of the other content. So most of the time spent communicating with the server is going to be spent loading those images. If the rest of the page didn't use any other images, this wouldn't affect the rendering of the page, but since you're using images for section headers and buttons, i suspect that they're getting queued up behind the slideshow. Moving the slideshow to the end of the HTML (but placing it at the top of the page using CSS), or including only one image, and using javascript to load the other images once the rest of the page has rendered might improve the user experience.
  • one of the slideshow images in particular (http://www.leftforum.org/sites/all/themes/danland/images/mailings/wisconsin-panel-2011-pictures-fuzzy.JPG) is enormous, and consumes 1.4MiB, even though it will be constrained to being rendered in a 432x288px space. resizing this image on the server side would probably reduce a lot of unneeded traffic.
  • The use of images as section headers consumes bandwidth and ties up active connections to the server that could be used for serving other clients; the lack of at least alt text for these section headers also means that the experience is suboptimal for visually-impaired visitors. Switching to styled, textual section headers would eliminate both of these problems.
  • quite a bit of the HTML content in the page itself seems superfluous, and could be trimmed out (e.g. unnecessary html comments, nested spans that provide no additional distinguishing characteristics to the structure of the text). Trimming/cleaning the HTML would save you a bit of bandwidth, but it's probably in the noise compared to the other improvements recommended above.

The above is from a view of the page with javascript disabled; i haven't investigated other additional delays that might come from using a javascript-enabled client.
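To make the first point concrete, here is a minimal sketch of how one might scan a saved copy of the homepage for image URLs that will trigger a redirect. It assumes the only non-canonical alias is the bare leftforum.org hostname; the function and class names are made up for illustration and are not part of Drupal or the site's theme.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: the bare hostname is the only alias that gets redirected to www.
REDIRECTING_HOSTS = ("leftforum.org",)


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def redirecting_sources(html, redirecting_hosts=REDIRECTING_HOSTS):
    """Return (src, server_relative_src) pairs for images whose absolute URL
    points at a hostname that the server redirects to the canonical one."""
    collector = ImgSrcCollector()
    collector.feed(html)
    flagged = []
    for src in collector.sources:
        parts = urlsplit(src)
        if parts.netloc in redirecting_hosts:
            relative = parts.path
            if parts.query:
                relative += "?" + parts.query
            flagged.append((src, relative))
    return flagged
```

Feeding it the page source returns each offending src next to the server-relative form that could replace it in the theme or content.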

Last edited 7 years ago by Daniel Kahn Gillmor

comment:2 Changed 7 years ago by Daniel Kahn Gillmor

fwiw, most of the above sleuthing can be uncovered with any reasonable version of iceweasel or firefox.

Browsing to the web page you're looking to investigate, you can right-click the background of the page, and choose "View Page Info". This will bring up a dialog box that has a "Media" tab on it, which you can scroll through to see the sizes of each of the embedded images and scripts.

For versions of iceweasel/firefox ≥ 10.0, you should also have an "inspect element" option in your context menu, which you can use to inspect the HTML of the page at any point.

You can also install the firebug extension, which has a "net" tab that shows you the full list of network requests (and their timings) made for any given pageload.

Visit the page you're interested in, open the firebug panel, switch to (and enable) the "net" tab, and reload the page with ctrl-shift-R (to bypass the local cache and see what a new visitor to the site will see over the network)

Using these tools should make troubleshooting this kind of issue easier in the future. Try them out now on a page that *isn't* in trouble, so you can see what to expect!

comment:3 Changed 7 years ago by Jamie McClelland

I've taken the additional step of turning swap off with:

swapoff -a

And commented out the swap line in /etc/fstab

If we run out of memory we want the kernel to try to kill processes using a lot of memory (which will ensure the machine is still reachable by us so we can do maintenance) rather than swapping memory to disk, which can cause massive disk I/O contention, which can make the guest (and all other guests on the shared host) unresponsive for long periods of time.
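A quick way to confirm that swap really is off is to look at the SwapTotal line in /proc/meminfo. The sketch below parses text in that format; the helper names are hypothetical, and it assumes the usual "SwapTotal: N kB" layout.

```python
def swap_total_kib(meminfo_text):
    """Return the SwapTotal value (in KiB) from /proc/meminfo-style text,
    or None if the line is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       0 kB"
            return int(line.split()[1])
    return None


def swap_is_off(meminfo_text):
    """True only when SwapTotal is present and reports zero."""
    return swap_total_kib(meminfo_text) == 0
```

On the guest itself this could be called with the contents of /proc/meminfo, e.g. `swap_is_off(open("/proc/meminfo").read())`.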

comment:4 Changed 7 years ago by Jamie McClelland

See #4875.

comment:5 Changed 7 years ago by Ross

Owner: set to Jamie McClelland
Status: new → assigned

comment:6 Changed 7 years ago by Jamie McClelland

gdl and I noticed occasional errors on the site indicating max mysql connection limit was being reached. I just boosted it from 25 to 99. The main purpose of the limit is to keep one site from hogging all connections, but since this is a dedicated mysql server, that should not be a problem.

comment:7 Changed 7 years ago by Mark Libkuman

Server load has now spiked to 35

top - 18:16:38 up 13 days, 7:41, 3 users, load average: 34.16, 8.93, 3.69
Tasks: 245 total, 63 running, 182 sleeping, 0 stopped, 0 zombie
Cpu(s): 51.9%us, 40.1%sy, 0.0%ni, 0.0%id, 7.6%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 4060724k total, 4034724k used, 26000k free, 416k buffers
Swap: 0k total, 0k used, 0k free, 74648k cached
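For anyone wanting to watch for this condition automatically, here is a rough sketch that pulls the load averages and memory figures out of a top header like the one above. The function names are made up, and the regular expressions assume this older procps layout (values suffixed with "k"); newer versions of top print the header differently.

```python
import re


def parse_load_average(top_header):
    """Return the (1 min, 5 min, 15 min) load averages as floats."""
    m = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header)
    if m is None:
        raise ValueError("no load average found in top output")
    return tuple(float(value) for value in m.groups())


def parse_mem_kib(top_header):
    """Return (total, used, free) memory in KiB from the 'Mem:' line."""
    m = re.search(r"Mem:\s*(\d+)k total,\s*(\d+)k used,\s*(\d+)k free", top_header)
    if m is None:
        raise ValueError("no Mem: line found in top output")
    return tuple(int(value) for value in m.groups())


def free_fraction(top_header):
    """Fraction of total memory reported as free (ignores buffers and cache)."""
    total, _used, free = parse_mem_kib(top_header)
    return free / total
```

Applied to the output above, the free fraction comes out well under 1% of the 4 GB total, and with only about 75 MB cached there is very little left to reclaim.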

comment:8 Changed 7 years ago by Jamie McClelland

[1151040.796268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.798835] INFO: task apache2:4559 blocked for more than 120 seconds.
[1151040.800836] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.803282] INFO: task apache2:4598 blocked for more than 120 seconds.
[1151040.805280] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.807739] INFO: task apache2:4599 blocked for more than 120 seconds.
[1151040.809740] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.812179] INFO: task apache2:4617 blocked for more than 120 seconds.
[1151040.814174] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.816715] INFO: task apache2:4618 blocked for more than 120 seconds.
[1151040.818779] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.821226] INFO: task apache2:4619 blocked for more than 120 seconds.
[1151040.823230] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1151040.825685] INFO: task apache2:4620 blocked for more than 120 seconds.
[1151040.827649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

comment:9 Changed 7 years ago by Jamie McClelland

Rebooted via root@… with sv hup marx.

comment:10 Changed 7 years ago by Jamie McClelland

Looks like it ran out of memory:

0 marx:~# grep -i killer /var/log/syslog
Mar 16 18:16:36 marx kernel: [1150883.544702] apache2 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 16 18:16:38 marx kernel: [1150885.168443] apache2 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 16 18:16:39 marx kernel: [1150886.004765] wc invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 16 18:16:39 marx kernel: [1150886.878716] apache2 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Mar 16 18:16:40 marx kernel: [1150887.440677] apache2 invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0
Mar 16 18:16:41 marx kernel: [1150888.681495] apache2 invoked oom-killer: gfp_mask=0x280da, order=0, oom_Mar 16 18:33:33 marx kernel: imklog 4.6.4, log source = /proc/kmsg started.
0 marx:~#
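To summarize output like this, a small sketch that counts which processes invoked the oom-killer. The regular expression assumes the syslog line format shown above, and the names are hypothetical. Note that the process named on each line is the one whose allocation triggered the oom-killer, which is not necessarily the one that was killed.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: [1150883.544702] apache2 invoked oom-killer: ..."
OOM_RE = re.compile(r"kernel: \[\s*[\d.]+\] (\S+) invoked oom-killer")


def oom_invokers(lines):
    """Count oom-killer invocations per triggering process name."""
    counts = Counter()
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Passing it the lines of /var/log/syslog gives a per-process tally, which makes it easier to see at a glance whether apache2 dominates.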

comment:11 Changed 7 years ago by Jamie McClelland

I've shut down all the mail-related daemons which aren't in use (courier, spamassassin, clamsmtp, postfix).

I also propose that late tonight we lower RAM on boggs from 6 GB to 4 GB and increase marx from 4 GB to 6 GB. Since vilma is close to maxed out on RAM (28.5 GB out of 32 GB allocated), I'm hesitant to use up any of the spare RAM.
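The bookkeeping behind this proposal can be sketched as follows. This is only an illustration of the arithmetic, using the figures quoted in this comment; the helper name is made up and nothing here talks to the actual virtualization tools.

```python
def apply_moves(allocations, moves):
    """Return a new {guest: GiB} mapping after applying (source, target, gib)
    moves. Moving RAM between guests leaves the host's allocated total unchanged."""
    result = dict(allocations)
    for source, target, gib in moves:
        if result[source] < gib:
            raise ValueError(f"{source} does not have {gib} GiB to give up")
        result[source] -= gib
        result[target] += gib
    return result


# Figures from this comment; other guests on vilma are omitted.
guests = {"boggs": 6.0, "marx": 4.0}
proposed = apply_moves(guests, [("boggs", "marx", 2.0)])

# Spare RAM on the host stays untouched by a guest-to-guest move.
host_total_gib = 32.0
host_allocated_gib = 28.5
spare_gib = host_total_gib - host_allocated_gib
```

The design choice here is to shuffle RAM between guests rather than dip into the host's unallocated remainder.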

comment:12 Changed 7 years ago by Daniel Kahn Gillmor

Why aren't we allocating the spare ram from the host? We're talking about several GiB here. The host's job is just to run the guests; the guests take up the RAM. Why are we leaving this critical resource unallocated if we have member services which need it?

comment:13 Changed 7 years ago by Daniel Kahn Gillmor

#5473 seems like it's really about the same issue as here.

On that ticket, ross wrote:

As per jamie and my discussion, I added 2G of RAM to marx and removed 2G of ram from tresca. We should be able to revert this RAM allocation after the Left Forum finishes.

I also turned marx's swap back on, due to a theory that the latest crash at 6:15pm 3/15/2012 was due to a lack of swap.

Last edited 7 years ago by Daniel Kahn Gillmor

comment:14 Changed 7 years ago by Jamie McClelland

The only reason not to use spare RAM on vilma was conservatism and uncertainty about how much RAM a host needs. It seems like it should be minimal, but the consequences of a mistake are high.

In any event, tresca (who we borrowed RAM from) is currently having out of memory errors. I'm giving tresca an extra GB and restarting now.

jamie

comment:15 Changed 7 years ago by Jamie McClelland

Resolution: fixed
Status: assigned → closed

I've transferred leftforum.org back to albizu, and reduced the RAM on marx to 2 GB.
