status_of_proc is returning incorrect error code

Bug #683640 reported by Psi-Jack
62
This bug affects 7 people
Affects Status Importance Assigned to Milestone
lsb (Ubuntu)
Fix Released
Medium
Unassigned
Lucid
Fix Released
Medium
Unassigned
Natty
Won't Fix
Undecided
Unassigned
Oneiric
Won't Fix
Undecided
Unassigned
Precise
Fix Released
High
Adam Stokes
Quantal
Fix Released
Medium
Unassigned

Bug Description

Binary package hint: lsb

[Impact]
When status_of_proc is called with -p $PIDFILE and the service is not presently started, error code 4 is always returned because the process does not have the PID file existant until it is started, and it's deleted after stopping.

This fails LSB compliance because it should report 0 if the service is stopped as expected.

[Test Case]
status_of_proc should /optionally/ check if a pid file exists or not, not make it mandatory to exist before throwing an error code. If pid file does exist, check it, verify it's associated with $DAEMON, and if not try to determine if the process is actually running under a different pid, if not, then return error code 0 because it is determined that it is /not/ running.

Error code 4 in LSB is reserved for total permanent failure that requires manual intervention to correct. Such as, the process daemon isn't executable due to permissions, or it tried to start but fails.. error code 4 is likely never used on checking status is very basic. Either it's running, or it's not. That's all status_of_proc should report, nothing more, nothing less.

[Regression]
Minimal, changes the behavior of checking against a pid file existance.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: lsb-base 4.0-0ubuntu8
ProcVersionSignature: Ubuntu 2.6.32-26.48-server 2.6.32.24+drm33.11
Uname: Linux 2.6.32-26-server x86_64
Architecture: amd64
Date: Wed Dec 1 08:27:21 2010
InstallationMedia: Ubuntu-Server 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.2)
PackageArchitecture: all
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: lsb

Revision history for this message
Psi-Jack (erenfro) wrote :
Revision history for this message
Psi-Jack (erenfro) wrote :

As per LSB Init compliancy:

If the status action is requested, the init script will return the following exit status codes.

0 program is running or service is OK
1 program is dead and /var/run pid file exists
2 program is dead and /var/lock lock file exists
3 program is not running
4 program or service status is unknown
5-99 reserved for future LSB use
100-149 reserved for distribution use
150-199 reserved for application use
200-254 reserved

Retrieved from: http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

Revision history for this message
Psi-Jack (erenfro) wrote :
Revision history for this message
Psi-Jack (erenfro) wrote :

This patch, applied to /lib/lsb/init-functions resolves the problem by:
1> Allowing pid detection to check directly by pid if the process is in fact running.
2> Allowing pid detection to check for stale pid files.
3> Failing over to pidof detection to find the parent pid (ppid) of the process if it is running.

--- init-functions.orig 2010-12-01 09:18:54.219049251 -0500
+++ init-functions 2010-12-01 09:19:29.779059526 -0500
@@ -67,7 +67,7 @@
     OPTIND=1
     while getopts p: opt ; do
         case "$opt" in
- p) pidfile="$OPTARG"; specified=1;;
+ p) pidfile="$OPTARG";;
         esac
     done
     shift $(($OPTIND - 1))
@@ -91,7 +91,7 @@
             fi
         fi
     fi
- if [ -x /bin/pidof -a ! "$specified" ]; then
+ if [ -x /bin/pidof ]; then
         status="0"
         /bin/pidof -o %PPID -x $1 || status="$?"
         if [ "$status" = 1 ]; then

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug and helping to make Ubuntu better. It seems to me that
/init/lsb/init-functions belongs to the lsb-base package, so if this is a bug, it should
be filed against that package.

Changed in bind9 (Ubuntu):
status: New → Invalid
Changed in dovecot (Ubuntu):
status: New → Invalid
Changed in openbsd-inetd (Ubuntu):
status: New → Invalid
Changed in openvpn (Ubuntu):
status: New → Invalid
Changed in rsync (Ubuntu):
status: New → Invalid
Changed in spamassassin (Ubuntu):
status: New → Invalid
Revision history for this message
Psi-Jack (erenfro) wrote :

Well, according to ubuntu-bug -p lsb-base, it filed it on lsb because lsb is the /source/ package. And why did you mark all the other packages I had associated it to as invalid? They are /ALL/ effected by this exact same thing, which is how I found it in the first place.

Revision history for this message
C de-Avillez (hggdh2) wrote :

@Psi-Jack: they are affected by the error on status_of_proc, but there is nothing to fix for them.

Revision history for this message
C de-Avillez (hggdh2) wrote :

Raising to Medium importance.

@Psi-Jack: can you please add your patch as an attachment?

Changed in lsb (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Chad Netzer (chad-netzer) wrote :

In order to get my Pacemaker/Corosync lsb resources working correctly, I applied a modified version of Psi-Jack's patch. I'm not sure why he removed the setting of the "specified" local var, but I left that in because I have resources that specify their own path to a pidfile. However, I unconditionally used the pidof check, which has the intended effect of returning 3 (instead of 4) when a resource/daemon is not running.

Here is the patch I'm using (I'll attach it as well):

--- /lib/lsb/init-functions.orig 2009-09-10 04:27:42.000000000 -0700
+++ /lib/lsb/init-functions 2011-04-11 17:59:22.865842039 -0700
@@ -91,7 +91,7 @@
             fi
         fi
     fi
- if [ -x /bin/pidof -a ! "$specified" ]; then
+ if [ -x /bin/pidof ]; then
         status="0"
         /bin/pidof -o %PPID -x $1 || status="$?"
         if [ "$status" = 1 ]; then

tags: added: patch
Revision history for this message
ServerAlex (serveralex) wrote :

also affects lucid. please fix!

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lsb (Ubuntu):
status: New → Confirmed
Revision history for this message
Nicola (nicola) wrote :

same in oneiric, corosync is in main and cannot use init script as lsb resource, please fix!

Changed in lsb (Ubuntu):
milestone: none → ubuntu-12.04.1
Chris J Arges (arges)
Changed in bind9 (Ubuntu Lucid):
status: New → Invalid
no longer affects: bind9 (Ubuntu Lucid)
Chris J Arges (arges)
Changed in bind9 (Ubuntu Natty):
status: New → Invalid
no longer affects: bind9 (Ubuntu Natty)
no longer affects: bind9 (Ubuntu Oneiric)
no longer affects: bind9 (Ubuntu Precise)
no longer affects: bind9 (Ubuntu Quantal)
Chris J Arges (arges)
Changed in lsb (Ubuntu Quantal):
milestone: ubuntu-12.04.1 → none
no longer affects: dovecot (Ubuntu Lucid)
Chris J Arges (arges)
Changed in lsb (Ubuntu Precise):
milestone: none → ubuntu-12.04.1
status: New → Confirmed
importance: Undecided → Medium
no longer affects: dovecot (Ubuntu Natty)
no longer affects: dovecot (Ubuntu Oneiric)
no longer affects: dovecot (Ubuntu Precise)
no longer affects: dovecot (Ubuntu Quantal)
Chris J Arges (arges)
no longer affects: spamassassin (Ubuntu)
no longer affects: openbsd-inetd (Ubuntu Lucid)
no longer affects: openbsd-inetd (Ubuntu Natty)
Chris J Arges (arges)
no longer affects: spamassassin (Ubuntu Lucid)
no longer affects: openbsd-inetd (Ubuntu Oneiric)
Chris J Arges (arges)
no longer affects: spamassassin (Ubuntu Natty)
no longer affects: openbsd-inetd (Ubuntu Precise)
Chris J Arges (arges)
no longer affects: spamassassin (Ubuntu Oneiric)
no longer affects: openbsd-inetd (Ubuntu Quantal)
no longer affects: openvpn (Ubuntu Lucid)
no longer affects: openvpn (Ubuntu Natty)
Chris J Arges (arges)
Changed in lsb (Ubuntu Lucid):
importance: Undecided → Medium
status: New → Confirmed
no longer affects: spamassassin (Ubuntu Quantal)
no longer affects: spamassassin (Ubuntu Precise)
no longer affects: openvpn (Ubuntu Oneiric)
no longer affects: openvpn (Ubuntu Precise)
no longer affects: openvpn (Ubuntu Quantal)
no longer affects: rsync (Ubuntu Lucid)
no longer affects: rsync (Ubuntu Natty)
no longer affects: rsync (Ubuntu Oneiric)
no longer affects: rsync (Ubuntu Precise)
no longer affects: rsync (Ubuntu Quantal)
Chris J Arges (arges)
no longer affects: rsync (Ubuntu)
no longer affects: openvpn (Ubuntu)
no longer affects: bind9 (Ubuntu)
no longer affects: dovecot (Ubuntu)
no longer affects: openbsd-inetd (Ubuntu)
Steve Langasek (vorlon)
tags: added: rls-q-notfixing
Revision history for this message
Stéphane Graber (stgraber) wrote :

Adam: are you working on this one? will it make it to the point release?

Revision history for this message
Adam Stokes (adam-stokes) wrote :

This won't make .1 and I'll flag it for .2

Changed in lsb (Ubuntu Precise):
milestone: ubuntu-12.04.1 → ubuntu-12.04.2
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lsb - 4.0-0ubuntu23

---------------
lsb (4.0-0ubuntu23) quantal; urgency=low

  * Fix status_of_proc returning incorrect error code (LP: #683640)
 -- Adam Stokes <email address hidden> Tue, 24 Jul 2012 13:26:54 -0700

Changed in lsb (Ubuntu Quantal):
status: Confirmed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lsb (Ubuntu Natty):
status: New → Confirmed
Changed in lsb (Ubuntu Oneiric):
status: New → Confirmed
Changed in lsb (Ubuntu Precise):
assignee: nobody → Adam Stokes (adam-stokes)
status: Confirmed → In Progress
Changed in lsb (Ubuntu Precise):
importance: Medium → High
description: updated
Revision history for this message
Martin Pitt (pitti) wrote :

Sponsored to precise-proposed, unsubscribing sponsors.

Changed in lsb (Ubuntu Lucid):
status: Confirmed → Won't Fix
Changed in lsb (Ubuntu Natty):
status: Confirmed → Won't Fix
Changed in lsb (Ubuntu Oneiric):
status: Confirmed → Won't Fix
Changed in lsb (Ubuntu Lucid):
status: Won't Fix → Confirmed
Revision history for this message
Adam Stokes (adam-stokes) wrote :

Here is some test output running with latest patched lsb from comment #20

running processes: 0
pidfiles: 0
/var/run/simplebin.pid /usr/sbin/simplebin simplebin
returncode pidofproc: 3
 * simplebin is not running
returncode status_of_proc: 3

This latest code *should* do what is expected.

Thanks
Adam

Revision history for this message
Adam Stokes (adam-stokes) wrote :
Revision history for this message
Adam Conrad (adconrad) wrote :

Upload sponsored with a slightly tweaked changelog.

Revision history for this message
Adam Conrad (adconrad) wrote : Please test proposed package

Hello Psi-Jack, or anyone else affected,

Accepted lsb into lucid-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/lsb/4.0-0ubuntu8.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in lsb (Ubuntu Lucid):
status: Confirmed → Fix Committed
tags: added: verification-needed
Changed in lsb (Ubuntu Precise):
status: In Progress → Fix Committed
Revision history for this message
Adam Conrad (adconrad) wrote :

Hello Psi-Jack, or anyone else affected,

Accepted lsb into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/lsb/4.0-0ubuntu20.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Psi-Jack (erenfro) wrote :

Adam,

I looked at the patch #22, and though logically it's "accurate", but it's not at the same time. If the pid file exists but doesn't have the correct information, or any information, pidof should be failed back to check if it literally is running or not directly, not just up and return error code 3 on a whim. This method is prone to failure as-is and doesn't fully fix the problem.

The return of error code 4 for it the pid file exists, but isn't readable due to permissions, that is definitely acceptable because that would more than likely cause a real problem, especially if the process maintains it's own pid file.

Basically this portion should not be there:

+ else
+ return 3 # pidfile specified, but contains no PID to test
+ fi

So it would fall back to normal pidof checking.

Revision history for this message
Psi-Jack (erenfro) wrote :

Also noticed:

+ if [ -e "$pidfile" -a ! -r "$pidfile" ]; then

Should be -f, then -r, not -e. It should exist AND be a regular file (not a symlink, device node, pipe, etc)

Revision history for this message
Adam Conrad (adconrad) wrote :

Psi-Jack, you're misreading LSB, and also not considering daemons that can change their names on exec, etc. If we specify a pidfile, that's ALL we should be checking. 0, 1, 3, and 4 are clearly defined returns for this:

0: we have a file, it has a PID, the daemon is running.
1: the file exists, has a PID, but the daemon is dead.
3: the daemon isn't running.
4: we can't determine how we can tell you what's going on (and the case of an unreadable file fits here)

I see no reason why it would need to be a regular file, and this is the exact check we perform earlier, but inverted.

At any rate, I suspect the place we disagree is on your "If the pid file exists but doesn't have the correct information, or any information, pidof should be failed back to check if it literally is running or not directly", and we'll just have to agree to disagree there. If you specify a pidfile, that's what should be checked, period. Falling back to pidof means we're looking for OTHER processes that have nothing to do with the pidfile referenced (chroots, processes started by other users, there are tons of reasons for /bin/foo to be running on a system but not be YOUR /bin/foo).

Revision history for this message
Psi-Jack (erenfro) wrote :

No, actually, I'm following the LSB standards to the T.

You are correct in the fact, a process COULD be run under a different name, and for those cases, status_of_proc should be able to be alerted to that name, such as -n "processname", but pid files being weighted more authority than an actual process list check is very much inaccurate and incorrect to check the actual status of a process to whether it's running or not.

The very point of status is to check whether a process is running or not running. If you go strictly by a pid file if specified, and that program dies and doesn't clean up it's mess of metadata files (which a pid file is technically a metadata file), and you only check that, then you are simply wrong. pid files barely even part of /the/ standards as shown:

If the status action is requested, the init script will return the following exit status codes.

0 program is running or service is OK
1 program is dead and /var/run pid file exists
2 program is dead and /var/lock lock file exists
3 program is not running
4 program or service status is unknown
5-99 reserved for future LSB use
100-149 reserved for distribution use
150-199 reserved for application use
200-254 reserved

Reference URL: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

This is basically stating: If a pid file exists, check it, if it's running and the pid file exists and is accurate as the program expected is running, return 0.

HOWEVER,
If the pid file is missing or doesn't exist, check if the process is running by name or by binary name, if so, it's still running. Return 0
If the pid file exists and is NOT actually running, return 1

That's the root of the problem, misunderstanding of LSB init's standards, all of which you can easily and quickly get proper clarification from their website and from Freenode's #lsb channel which I have already done all the research and legwork into and why I continue to say you are wrong in this band-aid incorrect fix.

Revision history for this message
Jeff Licquia (jeff-licquia) wrote :

For the record, #lsb is not as unanimous as has been implied.

Revision history for this message
Psi-Jack (erenfro) wrote :

Here's a run-down of the overall problem I'm seeing about how this is being done, implicitly relying on pidfiles:

Say you're running apache by init.d scripts (which Ubuntu does), and it uses status_of_proc to get the status:

status_of_proc -p /var/run/apache.pid $DAEMON $NAME

status_of_proc passes on it's arguments to pidofproc to verify it, reading -p's contents into $pid and attempts kill -0 $pid, if that succeeds, return code 0; failing that, ps "$pid", if success, return code 0, else return code 1.

Problem 1: It's checking the most unreliable piece of information possible. The PID a metadata file reports. Say Apache doesn't maintain it's own pid file, this could be reminents of an data that was never garbage collected.

Problem 2: PID file exists, is readable, kill -0 $pid succeeds, but the actual process is bash. Someone had logged in and obtained the same pid as the pid file mentioned. Oh, it's running just fine! Wrong.

status_of_proc takes two arguments, mandatory, which ultimately ends up being $DAEMON and $NAME, $NAME strictly used for log/visual representation, and $DAEMON being the process name to check for, yet $DAEMON is very seldomly used in any kind of verification of a process actually running. Especially when a pid file is provided in your current situation.

It's an implementation detail that is being ignored.

The 1 and only 1 piece of information that is most critical in checking if a process is running is one of the two possibilities:
1) The executable binary full path.
2) The process's name, in case it's forked into and parent closed, or the process sets it's processname, etc.

In the case of either of those 2, you can 100% guarantee a process is or is not running through standard common sense when writing the init script, to know how the daemon starts and what it's doing when it starts.

This is why I suggested another getopts option, -n "processname", or it could simply be taken from the first argument's $DAEMON value either way. But, verification that the pid matches the process should be part of the process, and if not, the pidfile is stale and fallback to checking if the process itself is actually running or not by it's name, and if not, THEN return a code representing the facts.

Revision history for this message
Psi-Jack (erenfro) wrote :

Adam, dug more into the lsb spec, and found this: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html

So, in technicality, the script does as specified under that determination, making a pid file always outweight validity of any kind, so this is a debate I will need to take with the LSB group otherwise. As-is with this correction done, it fixes the main problem while staying completely in spec.

Revision history for this message
Adam Conrad (adconrad) wrote :

Verified that this does what it's meant to on precise.

tags: added: verification-done
removed: verification-needed
tags: added: verification-done-precise verification-needed
removed: verification-done
Revision history for this message
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lsb - 4.0-0ubuntu8.1

---------------
lsb (4.0-0ubuntu8.1) lucid-proposed; urgency=low

  * If a pidfile is specified, but doesn't provide a PID
    to test, return 'not running', and return 'unknown'
    if the pidfile exists but is unreadable (LP: #683640)
 -- Adam Stokes <email address hidden> Wed, 10 Oct 2012 14:26:06 -0400

Changed in lsb (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Adam Conrad (adconrad) wrote :

Fixed in precise in version 4.0-0ubuntu20.2

Changed in lsb (Ubuntu Precise):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.