volume attach fails when re-attaching a volume to a VM that was paused and is active again

Bug #1242366 reported by Guangya Liu (Jay Lau)
This bug affects 3 people
Affects                     Status         Importance   Assigned to         Milestone
OpenStack Compute (nova)    Fix Released   Low          Thang Pham          -
Icehouse                    Fix Released   Low          ChangBo Guo (gcb)   -

Bug Description

Steps to reproduce (a scripted version follows the list):
1) Create a VM
2) Attach a volume to the VM
3) Pause the VM
4) Detach the volume
5) Unpause the VM
6) Re-attach the volume to the same device; nova-compute throws an exception
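
For reference, the sequence can be scripted roughly as follows. This is a minimal sketch against the python-novaclient of that era; the credentials, auth URL, and the image, flavor, and volume IDs are placeholders, not values from this report:

    from novaclient.v1_1 import client

    nova = client.Client('admin', 'password', 'admin',
                         'http://127.0.0.1:5000/v2.0')

    # 1) Create one VM and wait for it to become ACTIVE.
    server = nova.servers.create('test-vm', image='<image-id>',
                                 flavor='<flavor-id>')

    # 2) Attach the volume as /dev/vdb.
    nova.volumes.create_server_volume(server.id, '<volume-id>', '/dev/vdb')

    # 3) Pause the VM, 4) detach the volume while paused, 5) unpause.
    nova.servers.pause(server)
    nova.volumes.delete_server_volume(server.id, '<volume-id>')
    nova.servers.unpause(server)

    # 6) Re-attach to the same device name.
    nova.volumes.create_server_volume(server.id, '<volume-id>', '/dev/vdb')

The final create_server_volume call fails on the compute node with the following traceback: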

2013-10-20 23:21:22.520 DEBUG amqp [-] Channel open from (pid=19728) _open_ok /usr/local/lib/python2.7/dist-packages/amqp-1.0.12-py2.7.egg/amqp/channel.py:420
2013-10-20 23:21:22.520 ERROR nova.openstack.common.rpc.amqp [req-5f0d786e-1273-4611-b0a5-a787754c6bc8 admin admin] Exception during message handling
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp **args)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 90, in wrapped
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp payload)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 73, in wrapped
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 244, in decorated_function
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp pass
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 230, in decorated_function
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 272, in decorated_function
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 259, in decorated_function
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 3649, in attach_volume
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp context, instance, mountpoint)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 3644, in attach_volume
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp mountpoint, instance)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 3690, in _attach_volume
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp connector)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 3680, in _attach_volume
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp encryption=encryption)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1107, in attach_volume
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp raise exception.DeviceIsBusy(device=disk_dev)
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp DeviceIsBusy: The supplied device (vdb) is busy.
2013-10-20 23:21:22.520 TRACE nova.openstack.common.rpc.amqp

summary: - volume attach failed if attach again to an pause to active VM whic
+ volume attach fails when re-attaching a volume to a VM that was paused and is active again
Changed in nova:
assignee: nobody → Jay Lau (jay-lau-513)
melanie witt (melwitt)
tags: added: volumes
Changed in nova:
importance: Undecided → Low
Ding Xiao (xiaoding1)
Changed in nova:
assignee: Jay Lau (jay-lau-513) → Ding Xiao (xiaoding1)
assignee: Ding Xiao (xiaoding1) → nobody
Revision history for this message
Thang Pham (thang-pham) wrote :

I do not believe this bug can be easily fixed.

The reason you are getting the DeviceIsBusy exception is that the guest's sysfs did not receive the hotplug event (uevent) when the volume/device was detached. Since the instance was paused, no uevents could be delivered, so the guest kernel still believes the device is present. When the volume is attached again, the device name nova chooses is found by a database lookup of which devices are attached to the instance, selecting the next available device name. The device nova picks is /dev/vdb, the same one the guest's sysfs believes is still attached, and so the device hotplug fails.
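
To illustrate the selection step, here is a hypothetical, much-simplified sketch of that lookup; the real code in nova is more involved, but the key point is that the guest's sysfs state is never consulted:

    import string

    def next_device_name(attached_devices, prefix='vd'):
        """Return the first /dev/<prefix><letter> not recorded as attached."""
        used = set(dev.rsplit('/', 1)[-1] for dev in attached_devices)
        for letter in string.ascii_lowercase[1:]:  # skip 'a', the root disk
            candidate = prefix + letter
            if candidate not in used:
                return '/dev/' + candidate

    # After the detach-while-paused, nova's database no longer lists vdb,
    # so the next attach picks /dev/vdb again -- even though the paused
    # guest never processed the unplug and its sysfs still holds vdb.
    print(next_device_name(['/dev/vda']))  # -> /dev/vdb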

A possible workaround is to reboot the guest in order to force a sysfs refresh and free up the device name.

A possible solution is to prevent users from trying to hotplug/unplug volumes on a paused/suspended instance, since doing so would leave the guest's sysfs inconsistent.

Any thoughts?

Changed in nova:
assignee: nobody → Thang Pham (thang-pham)
Revision history for this message
Yogev Rabl (yrabl) wrote :

I have the same problem in a different scenario. It is described here: https://bugs.launchpad.net/nova/+bug/1313760

Changed in nova:
status: New → Confirmed
tags: added: libvirt
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Looking at this a bit more, I agree that we should disable it. What the libvirt driver currently does is issue attachDeviceFlags without the VIR_DOMAIN_AFFECT_LIVE flag, meaning that only the persistent domain configuration will be affected.

I still need to try this, but attaching the volume and then rebooting the instance will likely make it visible to the guest; however, that completely defeats the purpose of pausing, so the operation should likely be disabled.
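
For context, the call under discussion looks roughly like this in libvirt-python. A sketch only, with a placeholder domain name and hand-written disk XML rather than nova's generated config:

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')

    disk_xml = """<disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sdb'/>
      <target dev='vdb' bus='virtio'/>
    </disk>"""

    # With only VIR_DOMAIN_AFFECT_CONFIG, just the persistent definition
    # changes; the active definition of a paused domain never sees the disk:
    #   dom.attachDeviceFlags(disk_xml, libvirt.VIR_DOMAIN_AFFECT_CONFIG)

    # Adding VIR_DOMAIN_AFFECT_LIVE updates the running (or paused) domain too:
    dom.attachDeviceFlags(disk_xml,
                          libvirt.VIR_DOMAIN_AFFECT_CONFIG |
                          libvirt.VIR_DOMAIN_AFFECT_LIVE)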

Changed in nova:
status: Confirmed → Triaged
Revision history for this message
Thang Pham (thang-pham) wrote :

I experimented with using the VIR_DOMAIN_AFFECT_LIVE flag when attaching and detaching a volume on a paused instance. It turns out that it works!
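
One way to check this (a sketch; the domain name is a placeholder) is to compare the domain's active XML with its persistent XML after attaching a volume to a paused instance. With VIR_DOMAIN_AFFECT_LIVE set, the new vdb target shows up in both:

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')

    live_xml = dom.XMLDesc(0)                                  # active settings
    config_xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_INACTIVE)  # persistent settings

    print("vdb in live config:      ", "'vdb'" in live_xml)
    print("vdb in persistent config:", "'vdb'" in config_xml)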

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/93190

Changed in nova:
status: Triaged → In Progress
Revision history for this message
QiangGuan (hzguanqiang) wrote :

Please also handle the situation described in bug #1330800, thanks.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/93190
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ead9cfca93c9b5ca55e3ba269213c98d5e6e1d38
Submitter: Jenkins
Branch: master

commit ead9cfca93c9b5ca55e3ba269213c98d5e6e1d38
Author: Thang Pham <email address hidden>
Date: Sat May 10 01:41:31 2014 -0400

    libvirt: Use VIR_DOMAIN_AFFECT_LIVE for paused instances

    When a volume is attached to a paused instance, it does not
    appear in the instance's block devices list (i.e. lsblk)
    after the instance has resumed. A similar situation
    happens when detaching a volume from a paused instance; the
    block device continues to exist in the block devices list,
    where it should not. It was found that the volume is only
    persisted in the domain's config settings and not its active
    settings, since only the VIR_DOMAIN_AFFECT_CONFIG flag was used
    on attach and detach volume. In order to affect the active
    settings, the VIR_DOMAIN_AFFECT_LIVE flag has to be added when
    attaching and detaching volumes.

    Change-Id: I9c90a410a7ecb91f5a4de28acee21fe7da49242c
    Closes-Bug: #1242366
    Related-Bug: #1299331
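
Paraphrased, the idea of the change is to choose the flags based on the domain's state. This is a sketch of the approach, not the exact diff that merged:

    import libvirt

    def device_flags(virt_dom):
        """Persist the change, and also apply it to the live domain
        when the domain is running or paused."""
        state = virt_dom.info()[0]
        flags = libvirt.VIR_DOMAIN_AFFECT_CONFIG
        if state in (libvirt.VIR_DOMAIN_RUNNING, libvirt.VIR_DOMAIN_PAUSED):
            flags |= libvirt.VIR_DOMAIN_AFFECT_LIVE
        return flags

    # e.g. virt_dom.attachDeviceFlags(conf.to_xml(), device_flags(virt_dom))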

Changed in nova:
status: In Progress → Fix Committed
tags: added: icehouse-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/102456

Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/102456
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=520aa4c0829da9ba4b2beb8f90995f4bdec9b1f2
Submitter: Jenkins
Branch: stable/icehouse

commit 520aa4c0829da9ba4b2beb8f90995f4bdec9b1f2
Author: Thang Pham <email address hidden>
Date: Sat May 10 01:41:31 2014 -0400

    libvirt: Use VIR_DOMAIN_AFFECT_LIVE for paused instances

    When a volume is attached to a paused instance, it does not
    appear in the instance's block devices list (i.e. lsblk)
    after the instance has resumed. A similar situation
    happens when detaching a volume from a paused instance; the
    block device continues to exist in the block devices list,
    where it should not. It was found that the volume is only
    persisted in the domain's config settings and not its active
    settings, since only the VIR_DOMAIN_AFFECT_CONFIG flag was used
    on attach and detach volume. In order to affect the active
    settings, the VIR_DOMAIN_AFFECT_LIVE flag has to be added when
    attaching and detaching volumes.

    This change also makes a small adjustment to the unit tests:
    two imports are added.

    Change-Id: I9c90a410a7ecb91f5a4de28acee21fe7da49242c
    Closes-Bug: #1242366
    Related-Bug: #1299331
    (cherry picked from commit ead9cfca93c9b5ca55e3ba269213c98d5e6e1d38)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2