faulty paths are not removed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
multipath-tools (Ubuntu) | Expired | High | Unassigned |
Bug Description
The setup is: FC-connected dual-controller storage attached via a single HBA on a Lenovo Flex System x240 M5 Compute Node.
When a volume is unmapped on the storage side, the paths in the multipath map are not removed. dev_loss_tmo is set and the correct value is shown in the rport sysfs. However, the paths stay:
360050763808081
size=6.0G features='0' hwhandler='0' wp=rw
`-+- policy=
|- 1:0:0:5 sdn 8:208 failed faulty running
`- 1:0:1:5 sdo 8:224 failed faulty running
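For reference, a quick way to confirm which dev_loss_tmo value is actually applied to the FC remote ports (the rport sysfs entries mentioned above) is to read it straight from sysfs. This is only a generic sketch; the rport names on the affected system will differ:
$ for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do echo "$(basename "$(dirname "$f")") : $(cat "$f")"; done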
Even when the map is flushed, the paths are not removed by udev.
The serious issue is that if another volume is mapped to the host with the same LUN (by default the storage chooses the lowest unused LUN), the paths are not updated by udev and are presented with the wrong WWID (the old one). This can lead to serious data corruption, as both volumes may be presented as one multipath device.
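As a hedged illustration (the path names sdn/sdo are taken from the output above, and the multipathd format wildcards may vary between versions), the mismatch can be spotted by comparing the WWID the SCSI layer currently reports for a path device against the WWID multipathd still has cached for it:
# WWID currently reported by the device itself
$ sudo /lib/udev/scsi_id --whitelisted --device=/dev/sdn
# device and WWID as multipathd sees them (plain 'show paths' also lists them)
$ sudo multipathd show paths format "%d %w"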
In the man page multipath.conf(5) it says:
So this is not expected behaviour. The paths are not checked at all if the WWID has changed, and the udev info shows the old device properties (they are not updated when the path is reinstated).
Flushing the map does not remove the path devices from the system either. They are left orphaned and, upon reload of the maps, are re-added, even if both paths are failing.
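As a sketch of the manual cleanup that currently seems necessary before the LUN is reused (assuming the orphaned paths are sdn and sdo as shown above), the map can be flushed and the stale SCSI path devices deleted through sysfs:
# flush the stale map (use the WWID shown above; fails if the map is still open)
$ sudo multipath -f <WWID>
# drop the orphaned path devices so they cannot be matched to the new volume
$ echo 1 | sudo tee /sys/block/sdn/device/delete
$ echo 1 | sudo tee /sys/block/sdo/device/delete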
1) The release of Ubuntu you are using, via 'lsb_release -rd' or System -> About Ubuntu
Description: Ubuntu 20.04.1 LTS
Release: 20.04
2) The version of the package you are using, via 'apt-cache policy pkgname' or by checking in Software Center
apt-cache policy multipath-tools
multipath-tools:
Installed: 0.8.3-1ubuntu2
Candidate: 0.8.3-1ubuntu2
Version table:
*** 0.8.3-1ubuntu2 500
500 http://
100 /var/lib/
sg3-utils:
Installed: 1.44-1ubuntu2
Candidate: 1.44-1ubuntu2
Version table:
*** 1.44-1ubuntu2 500
500 http://
100 /var/lib/
sg3-utils-udev:
Installed: 1.44-1ubuntu2
Candidate: 1.44-1ubuntu2
Version table:
*** 1.44-1ubuntu2 500
500 http://
500 http://
100 /var/lib/
3) What you expected to happen
The configured dev_loss_tmo value to be respected and failed paths to be removed once it expires.
4) What happened instead
Paths and the map stay in the "running" state, and the paths are reused without a WWID check.
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: multipath-tools 0.8.3-1ubuntu2
ProcVersionSign
Uname: Linux 5.4.0-62-generic x86_64
ApportVersion: 2.20.11-
Architecture: amd64
CasperMD5CheckR
Date: Fri Jan 15 20:25:33 2021
InstallationDate: Installed on 2020-05-27 (233 days ago)
InstallationMedia: Ubuntu-Server 18.04.4 LTS "Bionic Beaver" - Release amd64 (20200203.1)
ProcEnviron:
SHELL=/bin/bash
LANGUAGE=en_US:en
LANG=en_US.UTF-8
TERM=xterm-
PATH=(custom, no user)
SourcePackage: multipath-tools
UpgradeStatus: Upgraded to focal on 2021-01-14 (0 days ago)
mtime.conffile.
Changed in multipath-tools (Ubuntu):
importance: Undecided → High
Trying to recreate on 20.04
# enable my FC adapters
$ sudo chccwdev -e 0.0.e000
$ sudo chccwdev -e 0.0.e100
# Ensure and check I have a one minute dev_loss_tmo set (default would be infinite)
$ for f in /sys/devices/css0/0.0.*/0.0.*/host*/rport-*/fc_remote_ports/rport-*/*loss_tmo; do b=$(basename $f); echo "$b : $(cat $f)"; done
dev_loss_tmo : 60
dev_loss_tmo : 60
dev_loss_tmo : 60
dev_loss_tmo : 60
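For completeness, a sketch of how a one-minute dev_loss_tmo can be set. The multipath.conf stanza below is an assumption (the report does not show the actual configuration used), and the rport name in the direct sysfs write is only an example:
# /etc/multipath.conf excerpt - multipathd propagates this to the rport sysfs attribute
defaults {
    dev_loss_tmo 60
}
# or write it directly for a single remote port (example rport name)
$ echo 60 | sudo tee /sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo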
An individual device right now looks like this:
mpathb (36005076306ffd6b6000000000000240a) dm-3 IBM,2107900
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 0:0:0:1074413604 sdc 8:32 active ready running
|- 0:0:1:1074413604 sdh 8:112 active ready running
|- 1:0:1:1074413604 sdr 65:16 active ready running
`- 1:0:0:1074413604 sdm 8:192 active ready running
Then I unmapped that volume on the storage server, which results in the following.
Even without actively "using" the disks, I immediately see errors on them in dmesg.
[ 4438.196385] device-mapper: multipath: Failing path 8:32.
[ 4438.205404] sd 0:0:1:1074413604: [sdh] tag#2379 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4438.205407] sd 0:0:1:1074413604: [sdh] tag#2379 Sense Key : Aborted Command [current]
[ 4438.205410] sd 0:0:1:1074413604: [sdh] tag#2379 Add. Sense: Logical unit not supported
[ 4438.205413] sd 0:0:1:1074413604: [sdh] tag#2379 CDB: Read(10) 28 00 01 3f ff 80 00 00 08 00
[ 4438.205416] blk_update_request: I/O error, dev sdh, sector 20971392 op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0
[ 4438.205428] device-mapper: multipath: Failing path 8:112.
[ 4438.205595] sd 1:0:1:1074413604: [sdr] tag#2933 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4438.205598] sd 1:0:1:1074413604: [sdr] tag#2933 Sense Key : Aborted Command [current]
[ 4438.205605] sd 1:0:1:1074413604: [sdr] tag#2933 Add. Sense: Logical unit not supported
[ 4438.205609] sd 1:0:1:1074413604: [sdr] tag#2933 CDB: Read(10) 28 00 01 3f ff 80 00 00 08 00
[ 4438.205611] blk_update_request: I/O error, dev sdr, sector 20971392 op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0
[ 4438.205617] device-mapper: multipath: Failing path 65:16.
[ 4438.205772] sd 1:0:0:1074413604: [sdm] tag#2934 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 4438.205775] sd 1:0:0:1074413604: [sdm] tag#2934 Sense Key : Aborted Command [current]
[ 4438.205777] sd 1:0:0:1074413604: [sdm] tag#2934 Add. Sense: Logical unit not supported
[ 4438.205779] sd 1:0:0:1074413604: [sdm] tag#2934 CDB: Read(10) 28 00 01 3f ff 80 00 00 08 00
[ 4438.205781] blk_update_request: I/O error, dev sdm, sector 20971392 op 0x0:(READ) flags 0x84700 phys_seg 1 prio class 0
[ 4438.205788] device-mapper: multipath: Failing path 8:192.
And multipath immediately switched them to the faulty state:
mpathb (36005076306ffd6b6000000000000240a) dm-3 IBM,2107900
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=enabled
|- 0:0:0:1074413604 sdc 8:32 failed faulty running
|- 0:0:1:1074413604 sdh 8:112 failed faulty running
|- 1:0:1:1074413604 sdr 65:16 failed faulty running
`- 1:0:0:1074413604 sdm 8:192 failed faulty running