hit issue with script rescan-scsi-bus.sh in VM with image ubuntu 22.04 live server running in ESXi

Bug #1966860 reported by vmware-gos-Yuhua
This bug affects 3 people
Affects: sg3-utils (Ubuntu)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

Problem:

After executing "rescan-scsi-bus.sh -a -r" on an Ubuntu 22.04 VM installed from the image OS/Linux/Ubuntu/22.04/20220326/jammy-live-server-amd64.iso in ESXi, all non-loop devices were removed and the system crashed.

sg3-utils version is 1.46-1.

root@ubuntu:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 43.6M 1 loop /snap/snapd/14978
loop1 7:1 0 78.1M 1 loop /snap/lxd/22525
loop2 7:2 0 61.9M 1 loop /snap/core20/1361
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1.5G 0 part /boot
└─sda3 8:3 0 38.5G 0 part
  └─ubuntu--vg-ubuntu--lv 253:0 0 19.2G 0 lvm /
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 1024M 0 rom

root@ubuntu:~# dpkg -l | grep sg3-utils
ii sg3-utils 1.46-1 amd64 utilities for devices using the SCSI command set
ii sg3-utils-udev 1.46-1 all utilities for devices using the SCSI command set (udev rules)

root@ubuntu:~# rescan-scsi-bus.sh -a -r
Syncing file systems
Scanning SCSI subsystem for new devices and remove devices that have disappeared
Scanning host 0 for SCSI target IDs 0
1
2
3
4
5
6
.....

two device(s) removed.
...

root@ubuntu:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 43.6M 1 loop /snap/snapd/14978
loop1 7:1 0 78.1M 1 loop /snap/lxd/22525
loop2 7:2 0 61.9M 1 loop /snap/core20/1361

Steps:
 1) create VM with LSI Logic Parallel Disk and install guest OS jammy-desktop-amd64.iso daily build 26/Mar/2022 in ESXi
 2) finish installation and reboot VM
 3) exec command "lsblk" in VM:
    root@ubuntu:~# lsblk
      NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
      loop0 7:0 0 43.6M 1 loop /snap/snapd/14978
      loop1 7:1 0 78.1M 1 loop /snap/lxd/22525
      loop2 7:2 0 61.9M 1 loop /snap/core20/1361
      sda 8:0 0 40G 0 disk
      ├─sda1 8:1 0 1M 0 part
      ├─sda2 8:2 0 1.5G 0 part /boot
      └─sda3 8:3 0 38.5G 0 part
       └─ubuntu--vg-ubuntu--lv 253:0 0 19.2G 0 lvm /
      sr0 11:0 1 1024M 0 rom
      sr1 11:1 1 1024M 0 rom
 4) execute command "rescan-scsi-bus.sh -a -r" in VM and then "lsblk":

     root@ubuntu:~# lsblk
        NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
        loop0 7:0 0 43.6M 1 loop /snap/snapd/14978
        loop1 7:1 0 78.1M 1 loop /snap/lxd/22525
        loop2 7:2 0 61.9M 1 loop /snap/core20/1361

Issue: In step 4, all non-loop disks are removed, including the boot disk sda.
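
The before/after comparison in the steps above can be automated. The following is a minimal sketch, not part of the original report: the helper name `vanished_devices` is hypothetical, and the device names are copied from the lsblk listings above instead of being read from a live system.

```shell
# Hypothetical helper: print device names present in the "before" snapshot
# but missing from the "after" snapshot (one name per line in each file).
vanished_devices() {
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  sort "$1" >"$before_sorted"
  sort "$2" >"$after_sorted"
  # comm -23 keeps only the lines that are unique to the first file.
  comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}

# On an affected VM the snapshots could be taken roughly like this
# (assumption: /dev/shm is a tmpfs, so the files outlive a vanished disk):
#   lsblk -dn -o NAME >/dev/shm/before.txt
#   rescan-scsi-bus.sh -a -r
#   lsblk -dn -o NAME >/dev/shm/after.txt
# Here the names are copied from the listings in this report.
before=$(mktemp)
after=$(mktemp)
printf '%s\n' loop0 loop1 loop2 sda sr0 sr1 >"$before"
printf '%s\n' loop0 loop1 loop2 >"$after"

vanished=$(vanished_devices "$before" "$after")
echo "$vanished"
```

With the sample data this lists sda, sr0 and sr1, which matches the devices missing from the second lsblk output.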

vmware-gos-Yuhua (yhzou)
summary: - hit issue with script rescan-scsi-bus.sh in ubuntu 22.04 live server
+ hit issue with script rescan-scsi-bus.sh in VM with image ubuntu 22.04
+ live server running in ESXi
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1966860/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
vmware-gos-Yuhua (yhzou)
affects: ubuntu → sg3-utils (Ubuntu)
Lucas Kanashiro (lucaskanashiro) wrote :

Thanks for the detailed steps on how to reproduce the issue. I know you hit this issue using VMware technologies, but since I do not have such a setup at hand, I tried to reproduce it in my local setup with qemu/libvirt and was not able to get the same results:

$ dpkg -l sg3-utils
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-============-============-================================================
ii sg3-utils 1.46-1build1 amd64 utilities for devices using the SCSI command set
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 76.2M 1 loop /snap/lxd/22340
loop1 7:1 0 61.9M 1 loop /snap/core20/1328
loop2 7:2 0 43.4M 1 loop /snap/snapd/14549
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 9.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 366K 0 disk
vdc 252:32 0 1G 0 disk
$ sudo rescan-scsi-bus.sh -a -r
Syncing file systems
Scanning SCSI subsystem for new devices and remove devices that have disappeared
Scanning host 0 for SCSI target IDs 0
1
2
3
4
5
6
7, all LUNs
Scanning host 1 for SCSI target IDs 0
1
2
3
4
5
6
7, all LUNs
Scanning host 2 for SCSI target IDs 0
1
2
3
4
5
6
7, all LUNs
Scanning host 3 for SCSI target IDs 0
1
2
3
4
5
6
7, all LUNs
Scanning host 4 for SCSI target IDs 0
1
2
3
4
5
6
7, all LUNs
Scanning host 5 for SCSI target IDs 0
1
2
3
4
5
6
7, all LUNs
0 new or changed device(s) found.
0 remapped or resized device(s) found.
0 device(s) removed.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 76.2M 1 loop /snap/lxd/22340
loop1 7:1 0 61.9M 1 loop /snap/core20/1328
loop2 7:2 0 43.4M 1 loop /snap/snapd/14549
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 9.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 366K 0 disk
vdc 252:32 0 1G 0 disk

I'll check if I can have access to a setup like yours to properly test it.

vmware-gos-Yuhua (yhzou) wrote :

Hi lucaskanashiro,
Thanks very much for your effort.

Sergio Durigan Junior (sergiodj) wrote :

While Lucas works on obtaining access to a similar setup, do you think you can provide us with any useful log? For example, you mentioned that the system crashed; is there any interesting log related to the crash? Thanks in advance.

vmware-gos-Yuhua (yhzou) wrote :

I ran the command "rescan-scsi-bus.sh -a -r > command_output_1.txt 2>&1" in a terminal and attached the following files:

1) command_output.png: the error messages shown in the terminal when the command ran
2) dmesg.0: a copy of /var/log/dmesg.0
3) command_output_1.txt: the redirected command output

I hope these are helpful.

Note:
After executing the command "rescan-scsi-bus.sh -a -r", the boot disk is removed.

Sergio Durigan Junior (sergiodj) wrote :

Thank you for the extra information.

Looking at the PNG file you've attached, I see a bunch of I/O error messages being displayed while trying to access the device. I'm not an expert here, but it looks to me like there may be a hardware problem involved? Why are you seeing so many I/O errors? Are they eventually causing the device to be removed (probably because sg3-utils cannot access it)?

vmware-gos-Yuhua (yhzou) wrote :

> it looks to me like there may be a hardware problem involved?
I suspect the issue is in the rescan-scsi-bus.sh script itself.
VMware has developed its own rescan-scsi-bus.sh, and that script works well with these steps.

> Why are you seeing so many I/O errors?
I think it's because the boot disk /dev/sda is unexpectedly removed by the command "rescan-scsi-bus.sh -a -r", so we see many I/O errors whenever files on /dev/sda are accessed.

Paride Legovini (paride) wrote :

Hi, to debug this I think we need to manually follow what rescan-scsi-bus.sh does and find out why the script thinks /dev/sda disappeared after a rescan. This is not easy to do without access to an affected system. You could try running the script in xtrace mode:

  bash -x /usr/bin/rescan-scsi-bus.sh ...

but by looking at the script I expect the output to be very verbose and difficult to decipher. Still it may be worth trying, please attach the output to this bug report if you try. Thanks.

vmware-gos-Yuhua (yhzou) wrote :

Hi,
I attached the terminal output for the following commands:
1) lsblk
2) bash -x /usr/bin/rescan-scsi-bus.sh -a -r 2>&1
3) lsblk

Please check attached file rescan_output. Thanks

Paride Legovini (paride) wrote :

There's something off: stderr is missing. Is that the actual terminal output (copy/pasted), or did you use redirects to log the command outputs and then manually reconstruct the interactive session log by adding the lines with the commands? If that's the case, and

  root@ubuntu~#bash -x /usr/bin/rescan-scsi-bus.sh -a -r 2>&1

was actually:

  root@ubuntu:~# bash -x /usr/bin/rescan-scsi-bus.sh -a -r 2>&1 >logfile

note that 2>&1 should follow the stdout redirect (>logfile) if you want to log both stdout and stderr, e.g.

  bash -x /usr/bin/rescan-scsi-bus.sh -a -r >logfile 2>&1
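
The effect of the redirect order can be checked without touching any SCSI device. This is a minimal sketch: `emit` is a hypothetical stand-in for any command that writes to both stdout and stderr, and the log file names are made up for the demonstration.

```shell
# Hypothetical stand-in for a command that writes to both streams.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

logdir=$(mktemp -d)

# Redirections are applied left to right. Here stderr is first duplicated
# onto wherever stdout points *now* (captured by $(...) in this demo, the
# terminal in an interactive session), and only then is stdout moved to
# the file. The file ends up with stdout only.
leaked=$(emit 2>&1 >"$logdir/wrong.log")

# Here stdout is moved to the file first, so stderr follows it there.
emit >"$logdir/right.log" 2>&1

echo "escaped the log: $leaked"
```

In the first form the stderr line never reaches the file. With `bash -x`, the xtrace output is written to stderr by default, so that form would lose the trace from the log.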

Changed in sg3-utils (Ubuntu):
status: Confirmed → Incomplete
vmware-gos-Yuhua (yhzou) wrote :

To capture the terminal output, I added a virtual "Serial Port" device to the VM in ESXi and redirected the terminal output to /dev/ttyS0.
So in #13 I actually ran the command:
  bash -x /usr/bin/rescan-scsi-bus.sh -a -r 2>&1 >/dev/ttyS0

Why can't I use "bash -x /usr/bin/rescan-scsi-bus.sh -a -r 2>&1 >logfile"?
After running "/usr/bin/rescan-scsi-bus.sh", the boot disk /dev/sda is deleted and the VM no longer works normally, so I can't use a command such as "scp .." to copy the logfile out of the VM.

You can easily reproduce the issue if you can set up the environment.

Thanks
Yuhua Zou

Launchpad Janitor (janitor) wrote :

[Expired for sg3-utils (Ubuntu) because there has been no activity for 60 days.]

Changed in sg3-utils (Ubuntu):
status: Incomplete → Expired
Bryce Harrington (bryce) wrote :

[Unexpiring and marking for server team discussion]

Changed in sg3-utils (Ubuntu):
status: Expired → New
tags: added: server-triage-discuss
removed: bot-comment
Lucas Kanashiro (lucaskanashiro) wrote :

FWIW I did not manage to get a VM in ESXi to try to reproduce this bug.

Robie Basak (racb) wrote :

We discussed this, but since we can't reproduce it we've added it to our backlog. If someone could work out a root cause, we'd be happy to accept a contribution and help get a fix landed!

tags: removed: server-triage-discuss
Changed in sg3-utils (Ubuntu):
status: New → Triaged
vmware-gos-Yuhua (yhzou) wrote :

I can reproduce this issue in Fusion. You could create a VM in Fusion to reproduce the issue.

Reproduce Steps
 1) create VM with EFI, SCSI disk controller and other default settings.
 2) mount ubuntu-22.10-live-server-amd64.iso to VM and follow the wizard to install guest OS
 3) finish installation and reboot VM
 4) exec command "lsblk" in VM:
    root@ubuntu:~# lsblk
      NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
      loop0 7:0 0 63.2M 1 loop /snap/core20/1623
      loop1 7:1 0 135.M 1 loop /snap/lxd/23680
      loop2 7:2 0 61.9M 1 loop /snap/snapd/17336
      sda 8:0 0 20G 0 disk
      ├─sda1 8:1 0 935M 0 part /boot/efi
      ├─sda2 8:2 0 1.8G 0 part /boot
      └─sda3 8:3 0 17.53 0 part
       └─ubuntu--vg-ubuntu--lv 253:0 0 10G 0 lvm /
      sr0 11:0 1 1024M 0 rom

 5) execute command "rescan-scsi-bus.sh -a -r" in VM and then "lsblk":

     root@ubuntu:~# lsblk
        NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
        loop0 7:0 0 63.2M 1 loop /snap/core20/1623
        loop1 7:1 0 135.M 1 loop /snap/lxd/23680
        loop2 7:2 0 61.9M 1 loop /snap/snapd/17336

For step 5, I hit the issue "the boot disk sda is removed."

Env:
sg3-utils version is 1.46-1build1
Fusion version is Fusion Player 12.1.1 (this issue should not be related to the Fusion version)

Best regards
Yuhua Zou

vmware-gos-Yuhua (yhzou) wrote :

Revised #20 to add more detailed info.

I reproduced this issue in Fusion.
Env:
Fusion version is Fusion Player 12.1.1
Guest OS image: ubuntu-22.10-live-server-amd64.iso
sg3-utils version in VM: 1.46-1build1

Fusion and Workstation are VMware's desktop hypervisors. You can install Fusion on a Mac, or Workstation on a Windows/Linux machine.

This issue should not be related to the Fusion/Workstation version, and it should also be reproducible in Workstation.

Download Fusion from
https://www.vmware.com/products/fusion/fusion-evaluation.html

Download Workstation from
https://www.vmware.com/cn/products/workstation-pro/workstation-pro-evaluation.html

Reproduce Steps
 1) create VM with EFI, SCSI disk controller and other default settings.
 2) mount ubuntu-22.04-live-server-amd64.iso / ubuntu-22.10-live-server-amd64.iso to VM and follow the wizard to install guest OS
 3) finish installation and reboot VM
 4) exec command "lsblk" in VM:
     root@ubuntu:~# lsblk
      NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
      loop0 7:0 0 63.2M 1 loop /snap/core20/1623
      loop1 7:1 0 135.M 1 loop /snap/lxd/23680
      loop2 7:2 0 61.9M 1 loop /snap/snapd/17336
      sda 8:0 0 20G 0 disk
      ├─sda1 8:1 0 935M 0 part /boot/efi
      ├─sda2 8:2 0 1.8G 0 part /boot
      └─sda3 8:3 0 17.53 0 part
       └─ubuntu--vg-ubuntu--lv 253:0 0 10G 0 lvm /
      sr0 11:0 1 1024M 0 rom
 5) execute command "rescan-scsi-bus.sh -a -r" in VM and then "lsblk":
      root@ubuntu:~# lsblk
        NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
        loop0 7:0 0 63.2M 1 loop /snap/core20/1623
        loop1 7:1 0 135.M 1 loop /snap/lxd/23680
        loop2 7:2 0 61.9M 1 loop /snap/snapd/17336

Issue:
For step 5, the boot disk sda is removed by the command "rescan-scsi-bus.sh -a -r" in the VM.
The boot disk sda is expected to stay, because it is still present in the VM settings.

The command should only remove a disk that is no longer present in the VM settings (for example, after editing the VM settings to hot-remove the disk).

André Burkhardt (pseudoab80) wrote :

I can confirm the problem.
1) Ubuntu 22.04.1 LTS VM on VMware ESX vSphere
2) lsblk
   NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
   sda 8:0 0 15G 0 disk
   ├─sda1 8:1 0 1M 0 part
   ├─sda2 8:2 0 512M 0 part /boot
   └─sda3 8:3 0 14.5G 0 part
     ├─vg_system-lv_root 253:0 0 6G 0 lvm /
     ├─vg_system-lv_var 253:1 0 5G 0 lvm /var
     ├─vg_system-lv_swap 253:2 0 2G 0 lvm [SWAP]
     └─vg_system-lv_pgdata 253:3 0 1.5G 0 lvm /var/lib/postgresql
   sr0 11:0 1 1024M 0 rom
3) rescan-scsi-bus.sh -r
   Syncing file systems
   Scanning SCSI subsystem for new devices and remove devices that have disappeared
   Scanning host 0 for SCSI target IDs 0
   1
   2
   3
   4
   5
   6
   7, all LUNs
   Scanning host 1 for SCSI target IDs 0
   1
   2
   3
   4
   5
   6
   7, all LUNs
   [...]
    Scanning for device 2 0 0 0 ..sg0 changed: LU not available (PQual 3)
   REM: Host: scsi2 Channel: 00 Id: 00 Lun: 00
   DEL: Vendor: NECVMWar Model: VMware SATA CD00 Rev: 1.00
         Type: CD-ROM ANSI SCSI revision: 05
   [...]
   Scanning host 32 for SCSI target IDs 0
   1
   2
   3
   4
   5
   6
   7, all LUNs
   sg1 changed: LU not available (PQual 3)
   REM: Host: scsi32 Channel: 00 Id: 00 Lun: 00
   DEL: Vendor: VMware Model: Virtual disk Rev: 2.0
         Type: Direct-Access ANSI SCSI revision: 06
   [...]
   0 new or changed device(s) found.
   0 remapped or resized device(s) found.
   2 device(s) removed.
   \t[2:0:0:0]\n\t[32:0:0:0]\n
4) lsblk
   [no output] => All Disks are removed, System is Read-Only
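
The "REM:" lines in the output above name the devices the script removed. A minimal sketch for pulling them out of a saved log as host:channel:id:lun tuples follows. The helper name `removed_devices` is hypothetical, and the parsing assumes the "REM:" line layout quoted in this comment; other sg3-utils versions may print something different.

```shell
# Hypothetical helper: list devices that a saved rescan-scsi-bus.sh log
# reports as removed, as host:channel:id:lun tuples.
# Assumes "REM:" lines shaped like the ones quoted in this comment.
removed_devices() {
  awk '$1 == "REM:" {
         host = $3; sub(/^scsi/, "", host)   # "scsi32" -> "32"
         # Adding 0 strips the zero padding from fields such as "00".
         printf "%d:%d:%d:%d\n", host, $5 + 0, $7 + 0, $9 + 0
       }' "$1"
}

# Sample lines taken from the output above.
sample=$(mktemp)
cat >"$sample" <<'EOF'
   sg1 changed: LU not available (PQual 3)
   REM: Host: scsi32 Channel: 00 Id: 00 Lun: 00
   DEL: Vendor: VMware Model: Virtual disk Rev: 2.0
        Type: Direct-Access ANSI SCSI revision: 06
   Scanning for device 2 0 0 0 ..sg0 changed: LU not available (PQual 3)
   REM: Host: scsi2 Channel: 00 Id: 00 Lun: 00
EOF

removed=$(removed_devices "$sample")
echo "$removed"
```

For the sample this yields 32:0:0:0 and 2:0:0:0, the same two addresses the script lists at the end of its summary.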

The bug sounds similar to this bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1976228

Lucas Kanashiro (lucaskanashiro) wrote :

There is this PR which might be related to this issue:

https://github.com/doug-gilbert/sg3_utils/pull/10

Lucas Kanashiro (lucaskanashiro) wrote :

I prepared a PPA here applying the patch in comment #23, could any of you give it a try and see if it fixes the issue?

https://launchpad.net/~lucaskanashiro/+archive/ubuntu/rescan-scsi-bus-fix/
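
For anyone wanting to try the PPA, a rough sketch of the steps follows. The helper `ppa_from_url` is hypothetical and only rewrites the URL above into the `ppa:owner/name` form that add-apt-repository accepts. The install commands are left as comments because they change the system, and they assume the PPA still publishes an sg3-utils build for your release.

```shell
# Hypothetical helper: turn a Launchpad PPA URL into the "ppa:owner/name"
# form that add-apt-repository accepts.
ppa_from_url() {
  printf '%s\n' "$1" |
    sed -E 's#^https://launchpad\.net/~([^/]+)/\+archive/ubuntu/([^/]+)/?$#ppa:\1/\2#'
}

ppa=$(ppa_from_url 'https://launchpad.net/~lucaskanashiro/+archive/ubuntu/rescan-scsi-bus-fix/')
echo "$ppa"

# On a disposable test VM (not run here), the test would then be roughly:
#   sudo add-apt-repository "$ppa"
#   sudo apt update
#   sudo apt install sg3-utils
#   lsblk; sudo rescan-scsi-bus.sh -a -r; lsblk
```

Running this on a throwaway VM is advisable, since an unfixed script removes the boot disk.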

vmware-gos-Yuhua (yhzou) wrote :

>I prepared a PPA here applying the patch in comment #23, could any of you give it a try and see if it fixes the issue?
> https://launchpad.net/~lucaskanashiro/+archive/ubuntu/rescan-scsi-bus-fix/

This PPA fixes the issue. Thanks very much.

Best regards
Yuhua Zou

Lucas Kanashiro (lucaskanashiro) wrote :

This seems a dup of this other bug:

https://bugs.launchpad.net/ubuntu/+source/sg3-utils/+bug/1976228

Can you confirm that @yhzou? Is there anything not covered in the other bug? If not, I'll mark this as a duplicate.

FWIW the package you tested here is the same that I provided in the other bug.

vmware-gos-Yuhua (yhzou) wrote :

> This seems a dup of this other bug:

> https://bugs.launchpad.net/ubuntu/+source/sg3-utils/+bug/1976228

> Can you confirm that @yhzou? Is there anything not covered in the other bug? If no I'll mark this as a duplicate.

No. Please mark this as a duplicate. Thanks.

Best regards
Yuhua Zou
