Live migration fails. XML error: CPU feature `wdt' specified more than once

Bug #1303536 reported by Maximiliano Osorio
This bug affects 27 people
Affects                    Status         Importance   Assigned to         Milestone
OpenStack Compute (nova)   Fix Released   Medium       Vladik Romanovsky
Icehouse                   Fix Released   Medium       Vladik Romanovsky

Bug Description

Description of problem
---------------------------

Live migration fails.
libvirt says "XML error: CPU feature `wdt' specified more than once"

Version
---------

ii libvirt-bin 1.2.2-0ubuntu2 amd64 programs for the libvirt library
ii python-libvirt 1.2.2-0ubuntu1 amd64 libvirt Python bindings
ii nova-compute 1:2014.1~b3-0ubuntu2 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2014.1~b3-0ubuntu2 all OpenStack Compute - compute node (KVM)
ii nova-cert 1:2014.1~b3-0ubuntu2 all OpenStack Compute - certificate management

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu Trusty Tahr (development branch)"
NAME="Ubuntu"
VERSION="14.04, Trusty Tahr"

Test env
----------

A two-node OpenStack Havana deployment on Ubuntu 14.04. Migrating an instance to the other node.

Steps to Reproduce
------------------
 - Migrate the instance

And observe /var/log/nova/compute.log and /var/log/libvirt.log

Actual results
--------------

/var/log/nova-conductor.log

2014-04-04 13:42:17.128 3294 ERROR oslo.messaging._drivers.common [-] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply\n incoming.message))\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/server.py", line 139, in inner\n return func(*args, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 668, in migrate_server\n block_migration, disk_over_commit)\n', ' File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 769, in _live_migrate\n raise exception.MigrationError(reason=ex)\n', 'MigrationError: Migration error: Remote error: libvirtError XML error: CPU feature `wdt\' specified more than once\n[u\'Traceback (most recent call last):\\n\', u\' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply\\n incoming.message))\\n\', u\' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch\\n return self._do_dispatch(endpoint, method, ctxt, args)\\n\', u\' File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch\\n result = getattr(endpoint, method)(ctxt, **new_args)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 88, in wrapped\\n payload)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__\\n six.reraise(self.type_, self.value, self.tb)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 71, in wrapped\\n return f(self, context, *args, **kw)\\n\', u\' File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 272, in decorated_function\\n e, sys.exc_info())\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__\\n six.reraise(self.type_, self.value, self.tb)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 259, in decorated_function\\n return function(self, context, *args, **kwargs)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4159, in check_can_live_migrate_destination\\n block_migration, disk_over_commit)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4094, in check_can_live_migrate_destination\\n self._compare_cpu(source_cpu_info)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4236, in _compare_cpu\\n LOG.error(m, {\\\'ret\\\': ret, \\\'u\\\': u})\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__\\n six.reraise(self.type_, self.value, self.tb)\\n\', u\' File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4232, in _compare_cpu\\n ret = self._conn.compareCPU(cpu.to_xml(), 0)\\n\', u\' File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit\\n result = proxy_call(self._autowrap, f, *args, **kwargs)\\n\', u\' File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call\\n rv = execute(f,*args,**kwargs)\\n\', u\' File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker\\n rv = meth(*args,**kwargs)\\n\', u\' File "/usr/lib/python2.7/dist-packages/libvirt.py", line 3191, in compareCPU\\n if ret == -1: raise libvirtError (\\\'virConnectCompareCPU() failed\\\', conn=self)\\n\', u"libvirtError: XML error: CPU feature `wdt\' specified more than once\\n"].\n']
2014-04-04 13:52:18.161 3295 ERROR nova.conductor.manager [req-471d2933-354a-4417-af50-c48399e19663 42fab7a8b7434bfc8473767c01e8378d b1cf6337c229491c96ad6e0a96e82979] Migration of instance 47d1fe7d-b812-4588-85eb-aa813267fc82 to host c2 unexpectedly failed.

/var/log/libvirtd.log

2014-03-27 18:23:17.141+0000: 2659: info : libvirt version: 1.2.2
2014-03-27 18:23:17.141+0000: 2659: error : virCPUDefParseXML:413 : XML error: CPU feature `wdt' specified more than once

Expected results
----------------
Successful migration

Additional info
----------------

Related with: https://bugs.launchpad.net/nova/+bug/1267191

In the file /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py, the list info['features'] contains the duplicate feature.

Maximiliano Osorio (mosorio-0) wrote :
Solly Ross (sross-7)
Changed in nova:
importance: Undecided → Medium
Solly Ross (sross-7) wrote :

I've hit a similar issue before. We should coordinate with the libvirt team to try to get this solved on their end, but we should also have a check on our end for older libvirt versions.

Changed in nova:
status: New → Confirmed
tags: added: low-hanging-fruit
Vladik Romanovsky (vladik-romanovsky) wrote :

I think this problem would be resolved if we backported the following patch to Havana:
https://review.openstack.org/#/c/65360/

Sachi King (nakato) wrote :

I'm not sure that this is a bug with libvirt.

This bug appears to come from `get_host_capabilities` in nova/virt/libvirt/driver.py

We `getCapabilities` from the libvirt connection, then parse this into our object.
Next, we call libvirt's `baselineCPU` with an XML representation of our CPU.
This returns an XML string that includes all the CPU features, including the ones we passed to it. This, to me, sounds right according to libvirt documentation.

From: http://libvirt.org/html/libvirt-libvirt.html#virConnectBaselineCPU
"Computes the most feature-rich CPU which is compatible with all given host CPUs.
If @flags includes VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES then libvirt will explicitly list all CPU features that are part of the host CPU, without this flag features that are part of the CPU model will not be listed."

We then parse all the returned features back into our object, creating an XML document with duplicates.

Then, we later use this XML in the `_compare_cpu` function in driver.py, in which libvirt returns an error because it has received the XML from Nova with a duplicate.
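
The sequence described above can be sketched without libvirt. This is only an illustration, not nova's actual code: `host_features` is a hypothetical stand-in for the features already parsed from `getCapabilities`, and `baseline_xml` is a hypothetical stand-in for an expanded `baselineCPU` result. It shows how appending to a plain list yields repeated `<feature>` elements in the generated XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in: features already parsed from getCapabilities.
host_features = ["wdt", "lahf_lm"]

# Hypothetical stand-in for an expanded baselineCPU result, which lists
# every host feature again, including the ones that were passed in.
baseline_xml = """
<cpu>
  <arch>x86_64</arch>
  <feature name='wdt'/>
  <feature name='lahf_lm'/>
  <feature name='nx'/>
</cpu>
"""


def merge_features(existing, xml_text):
    """Append every feature found in xml_text to a copy of existing."""
    merged = list(existing)
    for node in ET.fromstring(xml_text).findall("feature"):
        merged.append(node.get("name"))  # no uniqueness check
    return merged


def to_cpu_xml(features):
    """Serialise a feature list into a <cpu> element, one child per entry."""
    cpu = ET.Element("cpu")
    for name in features:
        ET.SubElement(cpu, "feature", name=name)
    return ET.tostring(cpu, encoding="unicode")


merged = merge_features(host_features, baseline_xml)
duplicates = sorted(n for n, c in Counter(merged).items() if c > 1)
cpu_xml = to_cpu_xml(merged)
```

In this sketch `duplicates` names every feature that would appear twice in `cpu_xml`, which is the kind of document libvirt rejects with "specified more than once".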

In contrast, the related bug appears to have been caused by an error in libvirt, where the baselineCPU function would fail before returning the XML to us.

If I'm not misinterpreting this issue, I think we should ensure we don't place duplicate CPU features into our array.

Something like the attached patch would prevent duplicates from entering our list.
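
The attached patch is not reproduced in this report. As a rough sketch of the idea only, assuming a hypothetical `CPUConfig` class that stores feature names in a list, a uniqueness guard on insertion might look like this:

```python
class CPUConfig:
    """Hypothetical stand-in for a CPU config object holding feature names."""

    def __init__(self):
        self.features = []

    def add_feature(self, name):
        # Skip names that are already present so the list stays unique
        # while preserving insertion order.
        if name not in self.features:
            self.features.append(name)


cpu = CPUConfig()
for name in ["wdt", "lahf_lm", "wdt", "nx", "lahf_lm"]:
    cpu.add_feature(name)
```

Checking membership on a list is linear in its length, which is acceptable for the few dozen features a CPU definition carries.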

Jon XML (d0m0reg00dthing) wrote :

I applied all the patches above, but the problem is still there. This is my nova-compute log:
2014-04-19 01:41:22.842 4397 INFO nova.virt.libvirt.driver [req-1ee12afe-c0ec-4d82-a6a5-3280407ca8e1 67ac350f0a164433ac342b4960300341 a3ace7f03db4449db2b57f95fadc94ea] Instance launched has CPU info:
{"vendor": "Intel", "model": "core2duo", "arch": "x86_64", "features": ["lahf_lm", "rdtscp", "lahf_lm", "rdtscp", "lm", "nx", "syscall", "ssse3", "monitor", "pni", "sse2", "sse", "fxsr", "mmx", "clflush", "pse36", "pat", "cmov", "mca", "pge", "mtrr", "sep", "apic", "cx8", "mce", "pae", "msr", "tsc", "pse", "de", "vme", "fpu"], "topology": {"cores": 1, "threads": 1, "sockets": 1}}
2014-04-19 01:41:22.846 4397 ERROR nova.virt.libvirt.driver [req-1ee12afe-c0ec-4d82-a6a5-3280407ca8e1 67ac350f0a164433ac342b4960300341 a3ace7f03db4449db2b57f95fadc94ea] CPU doesn't have compatibility.

XML error: CPU feature `lahf_lm' specified more than once

Refer to http://libvirt.org/html/libvirt-libvirt.html#virCPUCompareResult
2014-04-19 01:41:22.868 4397 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: XML error: CPU feature `lahf_lm' specified more than once
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 88, in wrapped
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher payload)
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 68, in __exit__
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 71, in wrapped
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher return f(self, context, *args, **kw)
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 309, in decorated_function
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher e, sys.exc_info())
2014-04-19 01:41:22.868 4397 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/dis...


Sachi King (nakato) wrote :

Hi Jon,

I can't reproduce this issue with my patch applied to the systems I have. It successfully filters out the duplicates on all 6 of my test systems.

I've attached a small bit of code to test the point where I applied the patch. This replicates a small section of the code from nova where it generates the list and causes the duplication I saw.

If this returns 1 and tells you it found a duplicate, you'll need to verify that the patch applied properly, then do some debugging. The patch was generated against 2014.1.1. For use in the cluster it will require all systems with nova-compute running to be patched.
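
The attached test script is not shown in this report. A minimal sketch of a comparable check, assuming the CPU info is available as the JSON string that nova logs (the function names here are hypothetical), could be:

```python
import json
from collections import Counter


def find_duplicates(cpu_info_json):
    """Return the feature names that occur more than once, sorted."""
    info = json.loads(cpu_info_json)
    counts = Counter(info.get("features", []))
    return sorted(name for name, n in counts.items() if n > 1)


def exit_code(cpu_info_json):
    """Return 1 if any duplicate feature is found, otherwise 0."""
    return 1 if find_duplicates(cpu_info_json) else 0


# Shortened version of the CPU info logged earlier in this report.
sample = ('{"vendor": "Intel", "model": "core2duo", "arch": "x86_64", '
          '"features": ["lahf_lm", "rdtscp", "lahf_lm", "rdtscp", "lm", "nx"]}')

dupes = find_duplicates(sample)
```

For the shortened log sample, `dupes` contains `lahf_lm` and `rdtscp`, matching the feature named in the error message.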

If that returns clean but you still can't perform a live migration, you'll need to dig a bit to figure out what's going on. The first patch by Maximiliano should ensure that it filters the duplicate XML features no matter where they are generated.

Vladik Romanovsky (vladik-romanovsky) wrote :

Have you tried using https://review.openstack.org/#/c/65360/ ?
This should resolve the problem.

Simon Elmir (nerd65536) wrote :

https://review.openstack.org/#/c/65360/ does not resolve this issue. The problem persists in Nova version 2014.1-0ubuntu1, which already includes this change.

I can confirm that "ensure_added_feature_is_unique.patch" does fix the problem.

Changed in nova:
assignee: nobody → Vladik Romanovsky (vladik-romanovsky)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/97782

Changed in nova:
status: Confirmed → In Progress
Matt Riedemann (mriedem)
tags: added: icehouse-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/97782
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=60c899b9da8d64b4a5979b69c43c77ed9d5bf248
Submitter: Jenkins
Branch: master

commit 60c899b9da8d64b4a5979b69c43c77ed9d5bf248
Author: Vladik Romanovsky <email address hidden>
Date: Tue Jun 3 14:58:06 2014 -0400

    libvirt: convert cpu features attribute from list to a set

    Currently, the cpu features list which is being sent to libvirt,
    when creating a domain or calling compareCPU, must contain only
    unique entries. Multiple issues arise when we are updating the
    features attribute in LibvirtConfigCPU class (for example during
    migration).

    This change will change the features attribute from being a list
    to a set. This make the LibvirtConfigCPU class keep only unique
    features.
    Adjusting the LibvirtConfigCPUFeature class to support set
    operations by overriding the __eq__, __ne__ and __hash__
    methods.

    Closes-Bug: #1303536
    Change-Id: I6350fe0e827c860aea77cc4fe56f18f5c1483580
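
The approach in this commit message can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the real `LibvirtConfigCPU` and `LibvirtConfigCPUFeature` implementations, and they assume that two features count as equal when their names match:

```python
class CPUFeature:
    """Hypothetical feature object identified only by its name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, CPUFeature) and self.name == other.name

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Must agree with __eq__: equal names give equal hashes.
        return hash(self.name)


class CPUConfig:
    """Hypothetical CPU config that stores features in a set."""

    def __init__(self):
        self.features = set()

    def add_feature(self, feature):
        self.features.add(feature)  # adding an equal feature is a no-op

    def feature_names(self):
        # Sets are unordered, so sort for stable output.
        return sorted(f.name for f in self.features)


cpu = CPUConfig()
for name in ["wdt", "lahf_lm", "wdt", "nx"]:
    cpu.add_feature(CPUFeature(name))
names = cpu.feature_names()
```

Because a set does not preserve insertion order, any code that serialises the features should sort them (as `feature_names` does here) if stable output matters.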

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/100632

Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/100632
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b7cb1ae972da835cfdeaca6f3d5556cffe89ff3
Submitter: Jenkins
Branch: stable/icehouse

commit 6b7cb1ae972da835cfdeaca6f3d5556cffe89ff3
Author: Vladik Romanovsky <email address hidden>
Date: Tue Jun 3 14:58:06 2014 -0400

    libvirt: convert cpu features attribute from list to a set

    Currently, the cpu features list which is being sent to libvirt,
    when creating a domain or calling compareCPU, must contain only
    unique entries. Multiple issues arise when we are updating the
    features attribute in LibvirtConfigCPU class (for example during
    migration).

    This change will change the features attribute from being a list
    to a set. This make the LibvirtConfigCPU class keep only unique
    features.
    Adjusting the LibvirtConfigCPUFeature class to support set
    operations by overriding the __eq__, __ne__ and __hash__
    methods.

    Closes-Bug: #1303536
    Change-Id: I6350fe0e827c860aea77cc4fe56f18f5c1483580
    Cherry-pick-x: 60c899b9da8d64b4a5979b69c43c77ed9d5bf248

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2