apache stuck and child processes fail to start

Bug #1988224 reported by Daniel
This bug affects 1 person
Affects               Status        Importance  Assigned to       Milestone
apache2 (Ubuntu)      Fix Released  Undecided   Unassigned
  Jammy               Fix Released  Medium      Bryce Harrington

Bug Description

[Impact]
Starting with Apache 2.4.51, Apache's child processes fail to start after a period of time or after a certain number of requests. The issue became much more frequent in 2.4.52 due to fixes in the accounting of active_daemons.

Apache stops accepting new connections until the parent Apache process is eventually restarted. Requests for new web pages just hang. In netstat there are many CLOSE_WAIT and ESTABLISHED entries.
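
One way to quantify those connection states is to tally them per port. The following is a minimal sketch, not part of the original report: count_tcp_states is a hypothetical helper that reads ss -tan output on stdin, and it assumes the ss(8) layout where the state is the first column and the local address:port is the fourth. Note that ss prints the states as ESTAB and CLOSE-WAIT, whereas netstat prints ESTABLISHED and CLOSE_WAIT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally TCP connection states for one local port.
# Reads `ss -tan` output on stdin. Assumes the ss(8) layout where the
# state is column 1 and the local address:port is column 4.
count_tcp_states() {
  local port="$1"
  awk -v port="$port" '
    NR > 1 {
      # The port is whatever follows the last ":" of the local address,
      # which also covers forms such as [::1]:80 and *:80.
      n = split($4, parts, ":")
      if (parts[n] == port) count[$1]++
    }
    END { for (s in count) printf "%s %d\n", s, count[s] }
  ' | sort
}
```

Running ss -tan | count_tcp_states 80 a few times while the server is hung makes it easy to see whether the CLOSE-WAIT and ESTAB counts keep growing, as described above.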

This seems to mostly affect servers using the event MPM.
Reverting to focal's Apache 2.4.41 or moving to kinetic's 2.4.53 resolves it.

[Test Case]
$ lxc launch ubuntu:jammy apache2-sru-lp1988224-jammy --vm
$ lxc shell apache2-sru-lp1988224-jammy
# apt update && apt dist-upgrade -y
# apt install apache2 lynx -y
# cat > /etc/apache2/mods-enabled/mpm_event.conf << __EOF__
<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>
__EOF__
# systemctl restart apache2
# while lynx -dump -read_timeout=10 localhost/server-status; do continue; done
... On a fixed package this loop should never exit; with the bug, a request eventually hangs, lynx times out, and the loop ends ...
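
To record how many requests succeed before the hang, the loop above can be wrapped in a small bash function. This is a hypothetical helper (run_until_failure is not part of the test case): it stops at the first failing run or after a cap, and prints the number of consecutive successes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it fails or MAX runs pass.
# run_until_failure MAX CMD [ARGS...]
# Prints the number of consecutive successful runs. Returns 1 if the
# command failed before MAX runs, 0 if all MAX runs succeeded.
run_until_failure() {
  local max="$1"; shift
  local ok=0
  while (( ok < max )); do
    if ! "$@" > /dev/null 2>&1; then
      echo "$ok"
      return 1
    fi
    ok=$(( ok + 1 ))
  done
  echo "$ok"
}
```

For example: run_until_failure 100000 lynx -dump -read_timeout=10 localhost/server-status. A non-zero return status means a request failed before the cap was reached, which with the configuration above is most likely the hang.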

Example failed output:

root@apache2-sru-lp1988224-jammy:~# time (while lynx -dump -read_timeout=10 localhost/server-status; do continue; done)

Looking up localhost
Making HTTP connection to localhost
Sending HTTP request.
HTTP request sent; waiting for response.
Alert!: Socket read failed (too many tries).
Connection interrupted.

lynx: Can't access startfile http://localhost/server-status

real 0m10.082s
user 0m0.044s
sys 0m0.011s

Example of successful output:

   Srv PID Acc M CPU SS Req Dur Conn Child Slot Client Protocol VHost
   Request
   0-0 5891 1/0/36 W 0.00 0 0 25 0.0 0.00 0.14 127.0.0.1 http/1.1
   apache2-sru-lp1988224-jammy.lxd GET /server-status HTTP/1.0
   1-0 - 0/0/36 . 0.00 0 0 26 0.0 0.00 0.14 127.0.0.1 http/1.1
   apache2-sru-lp1988224-jammy.lxd GET /server-status HTTP/1.0
... (continuous output) ...

[Where Problems Could Occur]
The included patches change connection-handling behavior, so it would be worth watching for reports of misbehavior in client processes, such as pages failing to load from the web server.

The patches change C code, so the usual sorts of regression risks apply. They don't change memory management or pointer behavior, so regressions are less likely to involve memory leaks or invalid pointers and more likely to be loop misbehaviors (getting stuck, skipping an operation, doing it too many times, etc.).

The upstream codebase has no further refinements to these particular patches, but there are subsequent patches to the same files for unrelated issues. It doesn't look like we require any of those for this problem, but it's conceivable something was overlooked. If so, look for problem reports that match the description of one of those upstream commits.

[Original Report]
Since updating the LTS server version from 20.04 to 22.04, I've had problems with apache 2.4.52 (2.4.52-1ubuntu4.1) in mpm_event mode. The child processes for apache fail to start after a period of time. The webserver is unreachable. In netstat there are many CLOSE_WAIT and ESTABLISHED entries.

The error_log says:

[Tue Aug 30 12:59:38.451188 2022] [http2:warn] [pid 687247:tid 139925644072832] AH10291: h2_workers: cleanup, 1 idle workers did not exit after 5 seconds.

# ps xau |grep apache
root 899 0.0 0.4 86712 40116 ? Ss Aug25 0:33 /usr/sbin/apache2 -k start
www-data 901 0.0 0.0 3736 156 ? Ss Aug25 0:15 /usr/bin/htcacheclean -d 120 -p /var/cache/apache2/mod_cache_disk -l 300M -n
www-data 687242 0.0 0.3 87020 30104 ? S 02:00 0:00 /usr/sbin/apache2 -k start

Other processes are gone.
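
One way to check for this state without reading the full ps listing is to count apache2 processes per owner. The sketch below is illustrative (count_apache_procs is a made-up name) and reads "user pid command" lines, such as those produced by ps -eo user=,pid=,comm=. Fed the three processes listed above, it would report one root parent and a single www-data apache2 process. What a healthy count looks like depends on the MPM settings, so compare against the same server while it is working.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count apache2 processes per owning user.
# Reads "USER PID COMMAND" lines on stdin, e.g. from:
#   ps -eo user=,pid=,comm= | count_apache_procs
count_apache_procs() {
  awk '$3 == "apache2" { count[$1]++ }
       END { for (u in count) printf "%s %d\n", u, count[u] }' | sort
}
```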

The problem is known and already fixed in 2.4.53, see: https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

I haven't found any indication that this problem has already been fixed in Ubuntu's version of Apache, which is why I'm filing this bug report.

Many Thanks.


Sergio Durigan Junior (sergiodj) wrote :

Thanks for the bug report.

I tried reproducing it here using the configuration file outlined in https://bz.apache.org/bugzilla/show_bug.cgi?id=65769#c1, but as far as I have checked things are still working. Would you have a reproducer that I can use to make sure that we're dealing with aforementioned upstream issue?

Thanks.

Changed in apache2 (Ubuntu):
status: New → Incomplete
Daniel (danuntu) wrote :

I think I can reproduce this on my server. I changed the Apache config as follows and called lynx a few times. After a few successful lynx requests, the server no longer responds. What else can I do to debug this?

apache config:

<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>

# systemctl restart apache2.service
# tail -f /var/log/apache2/error.log

[Wed Aug 31 22:45:54.390886 2022] [mpm_event:notice] [pid 998699:tid 139894640904064] AH00492: caught SIGWINCH, shutting down gracefully
[Wed Aug 31 22:45:54.649967 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity for Apache/2.9.5 (http://www.modsecurity.org/) configured.
[Wed Aug 31 22:45:54.650003 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: APR compiled version="1.7.0"; loaded version="1.7.0"
[Wed Aug 31 22:45:54.650016 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: PCRE compiled version="8.39 "; loaded version="8.39 2016-06-14"
[Wed Aug 31 22:45:54.650023 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: LUA compiled version="Lua 5.1"
[Wed Aug 31 22:45:54.650029 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: YAJL compiled version="2.1.0"
[Wed Aug 31 22:45:54.650035 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: LIBXML compiled version="2.9.12"
[Wed Aug 31 22:45:54.650094 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: StatusEngine call: "2.9.5,Apache/2.4.52 (Ubuntu),1.7.0/1.7.0,8.39/8.39 2016-06-14,Lua 5.1,2.9.12,e0c86efba80afb51d5e1caae78c38635f2f8b5df"
[Wed Aug 31 22:46:02.727769 2022] [suexec:notice] [pid 1000585:tid 140120599648128] AH01232: suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)
[Wed Aug 31 22:46:02.982447 2022] [ssl:warn] [pid 1000591:tid 140120599648128] AH01909: broeltal.de:443:0 server certificate does NOT include an ID which matches the server name
[Wed Aug 31 22:46:03.025496 2022] [mpm_event:notice] [pid 1000591:tid 140120599648128] AH00489: Apache/2.4.52 (Ubuntu) OpenSSL/3.0.2 mod_fcgid/2.3.9 configured -- resuming normal operations
[Wed Aug 31 22:46:03.025617 2022] [core:notice] [pid 1000591:tid 140120599648128] AH00094: Command line: '/usr/sbin/apache2'

After 3 successful lynx commands:
[Wed Aug 31 22:46:18.046440 2022] [mpm_event:error] [pid 1000591:tid 140120599648128] AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
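
The two error_log messages quoted in this report (AH00484 and AH10291) can be counted with grep. The helper below is a hypothetical sketch, not an Apache or Ubuntu tool. Seeing these codes is consistent with this bug but does not prove it, since AH00484 is also logged by a server that is simply overloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the two error_log messages quoted in this
# bug report. Prints one "CODE COUNT" line per message code.
#   AH00484: server reached MaxRequestWorkers setting
#   AH10291: h2_workers: cleanup, idle workers did not exit
scan_apache_log() {
  local log="$1" code
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  for code in AH00484 AH10291; do
    # grep -c prints 0 (and exits 1) when there is no match.
    printf '%s %s\n' "$code" "$(grep -c -- "$code" "$log")"
  done
}
```

For example: scan_apache_log /var/log/apache2/error.log.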

# ps xaufe
root 1000591 0.5 0.2 69388 18392 ? Ss 22:46 0:00 /usr/sbin/apache2 -k start
root 1000593 0.0 0.0 3088 1164 ? S 22:46 0:00 \_ /usr/bin/rotatelogs -l /www/foo1_log 3600
root 1000594 0.0 0.0 3088 1136 ? S 22:46 0:00 \_ /usr/bin/rotatelogs -l /www/foo2_log 1800
www-data 1000597 0.0 0.1 69332 13424 ? S 22:46 0:00 \_ /usr/sbin/apache2 -k start
www-data 1000644 0.0 0.1 233620 14836 ? Sl 22:46 0:00 \_ /usr/sbin/apache2 -k start

$ lynx -dump -read_timeout=10 localhost/server-status
               Apache Server Status for localhost (...

Sergio Durigan Junior (sergiodj) wrote :

Hmm, OK, for some reason the bug doesn't reproduce for me if I use a Jammy LXD container. I've just tried reproducing it using a VM and now I can trigger the error.

Thanks for the feedback. I'm marking this bug as Triaged and adding the server-todo tag; someone from the team should work on it soon (unless you would like to drive the SRU yourself, of course!).

Thanks.

tags: added: server-todo
Sergio Durigan Junior (sergiodj) wrote :

Instructions on how to reproduce:

$ lxc launch ubuntu:jammy test-apache2 --vm
$ lxc shell test-apache2
# apt update && apt dist-upgrade -y
# apt install apache2 lynx -y
# cat > /etc/apache2/mods-enabled/mpm_event.conf << __EOF__
<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>
__EOF__
# systemctl restart apache2
# while lynx -dump -read_timeout=10 localhost/server-status; do continue; done
... On a fixed package this loop should never exit; with the bug, a request eventually hangs, lynx times out, and the loop ends ...

Changed in apache2 (Ubuntu Jammy):
status: New → Triaged
importance: Undecided → Medium
Changed in apache2 (Ubuntu):
status: Incomplete → Fix Released
Changed in apache2 (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Bryce Harrington (bryce)
Changed in apache2 (Ubuntu Jammy):
assignee: Sergio Durigan Junior (sergiodj) → Bryce Harrington (bryce)
Bryce Harrington (bryce) wrote :

From the upstream bug report, do I understand correctly that the fix for this consists of these two changesets?

    https://svn.apache.org/viewvc?view=revision&revision=1897149
    https://svn.apache.org/viewvc?view=revision&revision=1901234

There are also some intervening merge changesets (r1901199 and r1898467) that each backport a bunch more revisions, but it looks like those are not required for this?

Also, it sounds like this doesn't affect focal or earlier, and is already fixed in kinetic's apache2, so we only need to SRU jammy, correct?

Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce)
description: updated
Bryce Harrington (bryce)
description: updated
Daniel (danuntu) wrote :

Yes, this only affects jammy. Kinetic's apache2 is version 2.4.54. The Apache bug only affects version 2.4.52. Apache gets stuck almost every day on my shared web hosting machines.

Bryce Harrington (bryce) wrote :

I've posted a PPA with the two patches here for testing:

    https://launchpad.net/~bryce/+archive/ubuntu/apache2-sru-lp1988224/

One of the perl-framework tests failed, but I've also pulled in a fix for that from Debian.

description: updated
Bryce Harrington (bryce)
Changed in apache2 (Ubuntu Jammy):
status: Triaged → In Progress
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Daniel, or anyone else affected,

Accepted apache2 into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/apache2/2.4.52-1ubuntu4.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in apache2 (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (apache2/2.4.52-1ubuntu4.2)

All autopkgtests for the newly accepted apache2 (2.4.52-1ubuntu4.2) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.48.2-0ubuntu1 (arm64)
libsoup3/3.0.7-0ubuntu1 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#apache2

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Bryce Harrington (bryce) wrote :

The referenced update_excuses.html page for jammy lists apache2 as "BLOCKED: Needs an approval" since 10-17. There's no indication about the two test issues mentioned in comment #9.

Fwiw, apache2 has always suffered from flaky autopkgtests, and it's almost certain that's what the gvfs and libsoup3 failures are. If anyone can, just retrigger the tests once or twice and they should go through. IIRC the flakiness can be caused by network/proxy issues particular to the hardware config of the testing machinery.

Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 1988224] Re: apache stuck and child processes fail to start

I retriggered both failures earlier this week. Everything should be
ready for the package to migrate once the grace period is over.

Cheers,

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Bryce Harrington (bryce) wrote :

Ah thanks Sergio!

Bryce Harrington (bryce) wrote :

Verified the fix works:

$ lxc launch ubuntu:jammy apache2-sru-lp1988224-jammy-2 --vm
Creating apache2-sru-lp1988224-jammy-2
Starting apache2-sru-lp1988224-jammy-2

$ lxc shell apache2-sru-lp1988224-jammy-2

# cat >> /etc/apt/sources.list << __EOF__
deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe
__EOF__

root@apache2-sru-lp1988224-jammy-2:~# apt update && apt dist-upgrade -y
...
...

# apt install apache2 lynx -y
...

# cat > /etc/apache2/mods-enabled/mpm_event.conf << __EOF__

<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>
__EOF__

# systemctl restart apache2

# systemctl status apache2.service

# while lynx -dump -read_timeout=10 localhost/server-status; do continue; done

I left this running for about half an hour with no problems.
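
The manual "leave it running for half an hour" check can also be expressed as a bounded soak test. This is a sketch that assumes bash is available (it relies on bash's SECONDS variable); soak_test is a hypothetical name. It repeats the command for the given number of seconds, stops at the first failure, and prints how many runs succeeded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly for DURATION seconds.
# soak_test DURATION CMD [ARGS...]
# Prints the number of successful runs and returns 0, or reports the
# first failure on stderr and returns 1. Relies on bash's SECONDS.
soak_test() {
  local duration="$1"; shift
  local runs=0
  local end=$(( SECONDS + duration ))
  while (( SECONDS < end )); do
    if ! "$@" > /dev/null 2>&1; then
      echo "failed after $runs successful runs" >&2
      return 1
    fi
    runs=$(( runs + 1 ))
  done
  echo "$runs"
}
```

For example, soak_test 1800 lynx -dump -read_timeout=10 localhost/server-status approximates the 30-minute run above.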

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apache2 - 2.4.52-1ubuntu4.2

---------------
apache2 (2.4.52-1ubuntu4.2) jammy; urgency=medium

  * d/p/fix-a-possible-listener-deadlock.patch,
    d/p/handle-children-killed-pathologically.patch: Fix situation
    where Apache fails to start its child processes after a certain
    number of requests, causing requests for new pages to hang.
    (LP: #1988224)
  * d/perl-framework/t/ssl/ocsp.t: Update test framework
    - Cherry pick from Debian 2.4.53-1

 -- Bryce Harrington <email address hidden> Thu, 29 Sep 2022 21:09:50 -0700

Changed in apache2 (Ubuntu Jammy):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for apache2 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
