[SRU] WARNING: crmadmin -S <HOST> unexpected output
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack HA Cluster Charm | Invalid | Undecided | Unassigned |
crmsh (Ubuntu) | Fix Released | Undecided | Athos Ribeiro |
Jammy | Fix Released | Undecided | Athos Ribeiro |
Kinetic | Fix Released | Undecided | Athos Ribeiro |
Lunar | Fix Released | Undecided | Athos Ribeiro |
Bug Description
[Impact]
When running a Pacemaker cluster it is necessary to put nodes into maintenance mode to allow stopping services and other operational tasks. With the version of crmsh currently shipped in the Ubuntu archive this is not possible.
[Test Plan]
1) Deploy a cluster
Get the bundle available at https:/ and deploy it:
juju deploy ./jammy-yoga.yaml
2) Once the setup is complete put a node into maintenance with the following command:
juju ssh keystone/leader
sudo crm -w -F node maintenance $(hostname)
Expected result:
The node is put into maintenance mode.
Actual result:
$ sudo crm -w -F node maintenance $(hostname)
ERROR: running cibadmin -Ql: Could not connect to the CIB: Transport endpoint is not connected
Init failed, could not perform requested operations
WARNING: crmadmin -S juju-49cb70-jammy-4 unexpected output: Controller on juju-49cb70-jammy-4 in state S_IDLE: ok (exit code: 0)
[Where problems could occur]
The proposed patch relaxes the check on the output of "crmadmin -S", which is used by the wait4dc() method to block until Pacemaker returns to the idle state after a modification. A regression introduced by this patch would therefore show up as crmsh attempting to perform operations while the Pacemaker engine is not idle.
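The idea behind the relaxed check can be illustrated with a small sketch (hypothetical helper name, not the actual patch): instead of matching the whole "crmadmin -S" line against a fixed template, only the DC state token is extracted, so both the pre-2.1.0 wording (assumed here to be of the form "Status of crmd@node1: S_IDLE (ok)") and the 2.1.0+ wording quoted in this report ("Controller on juju-49cb70-jammy-4 in state S_IDLE: ok") are accepted.

```python
import re

# Hypothetical sketch of a relaxed "is the DC idle?" check. The real
# wait4dc() in crmsh polls "crmadmin -S <DC>" and parses its output;
# this only demonstrates the parsing strategy.
STATE_RE = re.compile(r"\bS_[A-Z_]+\b")

def dc_is_idle(crmadmin_output: str) -> bool:
    """Return True if crmadmin -S output reports the S_IDLE state.

    Works for both output wordings because it looks only for the
    state token instead of matching the whole line.
    """
    match = STATE_RE.search(crmadmin_output)
    return match is not None and match.group(0) == "S_IDLE"

# Pacemaker < 2.1.0 (assumed wording) and >= 2.1.0 (from this report):
print(dc_is_idle("Status of crmd@node1: S_IDLE (ok)"))                    # → True
print(dc_is_idle("Controller on juju-49cb70-jammy-4 in state S_IDLE: ok"))  # → True
```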
[Other info]
The proposed patch is already merged in the crmsh-4.3 branch:
https:/
[Original Description]
Pacemaker changed the output string of "crmadmin -S <HOST>" in 2.1.0 with the commit https:/
Example output of `crm -w -F node maintenance <HOST>` on a cluster running Ubuntu 22.04:
```
root@juju-
WARNING: crmadmin -S juju-0c8f53-
root@juju-
1
root@juju-
Controller on juju-0c8f53-
root@juju-
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===
ii crmsh 4.3.1-1ubuntu2 all CRM shell for the pacemaker cluster manager
ii pacemaker 2.1.2-1ubuntu3 amd64 cluster resource manager
```
Upstream bug filed: https:/
Related branches
- git-ubuntu bot: Approve
- Bryce Harrington (community): Approve
- Canonical Server Reporter: Pending requested
  Diff: 65 lines (+43/-0), 3 files modified
    debian/changelog (+7/-0)
    debian/patches/lp1972730.patch (+35/-0)
    debian/patches/series (+1/-0)
- Canonical Server Reporter: Pending requested
  Diff: 108 lines (+57/-0) (has conflicts), 4 files modified
    debian/changelog (+9/-0)
    debian/patches/lp1972730.patch (+35/-0)
    debian/patches/series (+4/-0)
    debian/tests/pacemaker-node-status.sh (+9/-0)
- Athos Ribeiro (community): Approve
- Canonical Server Reporter: Pending requested
  Diff: 67 lines (+45/-0), 3 files modified
    debian/changelog (+7/-0)
    debian/patches/lp1972730.patch (+37/-0)
    debian/patches/series (+1/-0)
Changed in crmsh (Ubuntu):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Changed in crmsh (Ubuntu Jammy):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Changed in crmsh (Ubuntu):
status: Triaged → In Progress
Changed in crmsh (Ubuntu Kinetic):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Changed in crmsh (Ubuntu Kinetic):
status: New → In Progress
The way this error surfaces in the hacluster charm is when running the `stop` hook:
```
unit-hacluster-1: 16:53:41 DEBUG juju.worker.uniter.runner starting jujuc server {unix @/var/lib/juju/agents/unit-hacluster-1/agent.socket <nil>}
unit-hacluster-1: juju-log Setting node juju-0c8f53-zaza-723eab24403d-4 to maintenance
unit-hacluster-1: stop Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-hacluster-1/charm/hooks/stop", line 767, in <module>
    hooks.execute(sys.argv)
  File "/var/lib/juju/agents/unit-hacluster-1/charm/charmhelpers/core/hookenv.py", line 962, in execute
    self._hooks[hook_name]()
  File "/var/lib/juju/agents/unit-hacluster-1/charm/hooks/stop", line 617, in stop
    pcmk.set_node_status_to_maintenance(node)
  File "/var/lib/juju/agents/unit-hacluster-1/charm/hooks/pcmk.py", line 201, in set_node_status_to_maintenance
    commit('crm -w -F node maintenance {}'.format(node_name),
  File "/var/lib/juju/agents/unit-hacluster-1/charm/hooks/pcmk.py", line 90, in commit
    return subprocess.check_output(cmd.split(), stderr=subprocess.STDOUT)
  File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['crm', '-w', '-F', 'node', 'maintenance', 'juju-0c8f53-zaza-723eab24403d-4']' returned non-zero exit status 1.
uniter.operation hook "stop" (via explicit, bespoke hook script) failed: exit status 1
```
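The failure mode in the traceback can be reproduced in isolation: `subprocess.check_output` raises `CalledProcessError` as soon as the wrapped command exits non-zero, which is exactly how the crmsh failure bubbles up into the charm's `stop` hook. A minimal, cluster-free sketch (`commit` mirrors the charm's helper of the same name; a Python one-liner stands in for the failing `crm` call):

```python
import subprocess
import sys

def commit(cmd):
    """Minimal stand-in for the charm's pcmk.commit() helper:
    run a command, raising CalledProcessError on non-zero exit."""
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)

# Stand-in for a failing "crm -w -F node maintenance <HOST>" call:
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]

try:
    commit(failing_cmd)
except subprocess.CalledProcessError as exc:
    # The hook dies here; Juju then reports the hook as failed.
    print("command failed with exit status", exc.returncode)  # → exit status 1
```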
unit-hacluster-1: 16:53:41 INFO unit.hacluster/
unit-hacluster-1: 16:53:43 WARNING unit.hacluster/
unit-hacluster-1: 16:53:43 ERROR juju.worker.