Spurious Pacemaker errors: couldn't create file for mmap
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libqb (Ubuntu) | Fix Released | Undecided | Unassigned |
Jammy | Fix Released | Undecided | Lena Voytek |
Bug Description
[Impact]
While running Pacemaker, a system will occasionally run into the error:
error: couldn't create file for mmap
which can result in a crash.
This has been fixed upstream, in Kinetic, and in other distributions by retrying the posix_fallocate call up to five times when it is interrupted (EINTR).
Adding this fix will bring the same retry code to Jammy, significantly lowering the rate of crashes on a system running Pacemaker.
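The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration only, not the actual libqb patch: the helper name fallocate_retry and the constant MAX_EINTR_RETRIES are hypothetical, and reading "up to five times" as five retries after the first attempt is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Assumption: "up to five times" is read here as five retries after the
 * first attempt; the real patch may count differently. */
#define MAX_EINTR_RETRIES 5

/* Hypothetical helper name for illustration; not part of libqb's API. */
static int fallocate_retry(int fd, off_t offset, off_t len)
{
    int rc;
    int retries_left = MAX_EINTR_RETRIES;

    do {
        /* posix_fallocate returns 0 on success or an error number
         * directly; it does not report failure through errno. */
        rc = posix_fallocate(fd, offset, len);
    } while (rc == EINTR && retries_left-- > 0);

    /* Any error other than EINTR is returned immediately, unretried. */
    return rc;
}
```

Only EINTR triggers a retry in this sketch, so genuine failures such as ENOSPC or EBADF still surface to the caller on the first attempt.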
[Test Plan]
Although the error is inconsistent, testing can be done by running Pacemaker over long periods of time.
Without the fix, the error is likely to show up multiple times a day. To test the fix, use a system or cluster that experiences the error at around this rate, update the package to the fixed version, and run Pacemaker for at least a day, confirming that no mmap errors or crashes occur.
[Where problems could occur]
Since posix_fallocate is retried up to five times, this fix may mask underlying errors caused by race conditions, which could make future issues harder to diagnose.
[Other Info]
Upstream fix located here: https:/
[Original Description]
We recently built a new cluster based on Pacemaker 2.1 / Corosync 3 on Ubuntu 22.04 LTS.
It mostly works fine, but we frequently (multiple times per day) experience spurious restarts or failures of a service on a single node.
Syslog on the affected node reports something like this leading up to the failure:
Jun 16 14:35:06 pgdb5 pacemaker-
Jun 16 14:35:06 pgdb5 pacemaker-
Jun 16 14:35:06 pgdb5 pacemaker-
Jun 16 14:35:06 pgdb5 pacemaker-
Jun 16 14:35:06 pgdb5 pacemaker-
Our symptoms are very similar to this SUSE bug, which was fixed upstream recently: https:/
Related branches
- git-ubuntu bot: Approve
- Andreas Hasenack: Approve
- Canonical Server Reporter: Pending requested
- Canonical Server: Pending requested
Diff: 82 lines (+60/-0), 3 files modified:
debian/changelog (+7/-0)
debian/patches/retry-if-posix-fallocate-interrupted-eintr.patch (+52/-0)
debian/patches/series (+1/-0)
Hello René, thank you for submitting this report. Based on the information you provided, it seems the upstream fix you linked would resolve the issue. We've added the fix to our upcoming release, but it is not currently in 22.04. As such, I created a PPA with the fix for Jammy, located at:
https://launchpad.net/~lvoytek/+archive/ubuntu/libqb-retry-posix-fallocate-jammy
The code for the fix is here:
https://git.launchpad.net/~lvoytek/ubuntu/+source/libqb/commit/?id=cc8029c9ad0865bed61df1741ad8ad156fc7afac
If you would like to try it on your system to make sure it works, you can run the following commands:
sudo add-apt-repository ppa:lvoytek/libqb-retry-posix-fallocate-jammy
sudo apt update
sudo apt upgrade