neutron-dhcp-agent fails to start in TripleO environment "IOError: [Errno 22] Invalid argument"

Bug #1212401 reported by Clint Byrum
This bug affects 3 people
Affects    Status         Importance    Assigned to     Milestone
neutron    Fix Released   Low           Edgar Magana

Bug Description

While running these steps:

https://github.com/openstack/tripleo-incubator/blob/master/devtest.md

On the 'overcloud-control' host, quantum-dhcp-agent fails to start. The backtrace is:

2013-08-14 18:36:30.074 18361 DEBUG neutron.openstack.common.rpc.amqp [-] UNIQUE_ID is 743c6834c175462781ff7a0d8b0c6447. _add_unique_id /opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/openstack/common/rpc/amqp.py:325
2013-08-14 18:36:30.082 18361 ERROR neutron.openstack.common.rpc.common [-] AMQP server on 192.0.2.5:5672 is unreachable: [Errno 22] Invalid argument. Trying again in 1 seconds.
2013-08-14 18:36:30.087 18361 ERROR neutron.openstack.common.rpc.common [-] AMQP server on 192.0.2.5:5672 is unreachable: [Errno 22] Invalid argument. Trying again in 1 seconds.
2013-08-14 18:36:30.087 18361 ERROR neutron.openstack.common.threadgroup [-] [Errno 22] Invalid argument
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup Traceback (most recent call last):
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/openstack/common/threadgroup.py", line 117, in wait
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup x.wait()
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/openstack/common/threadgroup.py", line 49, in wait
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup return self.thread.wait()
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/eventlet/greenthread.py", line 168, in wait
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup return self._exit_event.wait()
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/eventlet/event.py", line 116, in wait
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup return hubs.get_hub().switch()
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 187, in switch
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup return self.greenlet.switch()
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 236, in run
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup self.wait(sleep_time)
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 84, in wait
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup presult = self.do_poll(seconds)
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 61, in do_poll
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup return self.poll.poll(seconds)
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup IOError: [Errno 22] Invalid argument
2013-08-14 18:36:30.087 18361 TRACE neutron.openstack.common.threadgroup

A full error is at http://paste.openstack.org/show/44151/

This must be a race, because running under 'strace -f', which tends to slow everything down, results in a working quantum-dhcp-agent. I will attach straces of an unsuccessful run (without -f) and a successful run (with -f).

Tags: l3-ipam-dhcp
Clint Byrum (clint-fewbar) wrote :
Clint Byrum (clint-fewbar) wrote :

Note that this one worked, and ended when I terminated it with Ctrl-C.

Clint Byrum (clint-fewbar) wrote :

The difference is really confusing. In the failing case, it appears that the epoll control FD is closed before epoll_ctl is called.

open("/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/kombu/transport/base.pyc", O_RDONLY|O_LARGEFILE) = 6
fstat64(6, {st_mode=S_IFREG|0644, st_size=11852, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb70b8000
read(6, "\3\363\r\n\t,\vRc\0\0\0\0\0\0\0\0\4\0\0\0@@\0\0s\307\0\0\0d\0"..., 4096) = 4096
fstat64(6, {st_mode=S_IFREG|0644, st_size=11852, ...}) = 0
read(6, "errorst\3\0\0\0exc(\0\0\0\0(\0\0\0\0sL\0\0\0/op"..., 4096) = 4096
read(6, "-packages/kombu/transport/base.p"..., 4096) = 3660
read(6, "", 4096) = 0
close(6) = 0
munmap(0xb70b8000, 4096) = 0
close(5) = 0
close(4)

But in the successful case, this never happens:

open("/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/kombu/transport/base.pyc", O_RDONLY|O_LARGEFILE) = 7
fstat64(7, {st_mode=S_IFREG|0644, st_size=11852, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7054000
read(7, "\3\363\r\n\t,\vRc\0\0\0\0\0\0\0\0\4\0\0\0@@\0\0s\307\0\0\0d\0"..., 4096) = 4096
fstat64(7, {st_mode=S_IFREG|0644, st_size=11852, ...}) = 0
read(7, "errorst\3\0\0\0exc(\0\0\0\0(\0\0\0\0sL\0\0\0/op"..., 4096) = 4096
read(7, "-packages/kombu/transport/base.p"..., 4096) = 3660
read(7, "", 4096) = 0
close(7) = 0
munmap(0xb7054000, 4096) = 0
close(6) = 0
close(5) = 0

So there may be an off-by-one problem somewhere.
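An Errno 22 from self.poll.poll() is consistent with the closed-FD observation: epoll_wait(2) fails with EINVAL when its descriptor is valid but is not an epoll descriptor, which is what you get if the epoll FD is closed and its number is then reused by a later open(). (A closed FD that is not reused gives EBADF instead.) The Linux-only Python 3 sketch below forces that situation with os.dup2. It illustrates the kernel behaviour only; it is an assumption that this is how the agent reached the error, and it is not the code path eventlet actually took.

```python
import errno
import os
import select


def poll_after_fd_reuse():
    """Return the errno epoll_wait() reports once the epoll FD number has
    been re-pointed at an ordinary file. Linux only."""
    ep = select.epoll()
    epfd = ep.fileno()
    # Open an unrelated, non-epoll file and move it onto the epoll FD
    # number. dup2() first closes the real epoll instance, mimicking
    # "epoll FD closed, then its number reused by a later open()".
    other = os.open(os.devnull, os.O_RDONLY)
    os.dup2(other, epfd)
    os.close(other)
    try:
        ep.poll(0)  # epoll_wait() on a descriptor that is no longer epoll
    except OSError as exc:  # Python 2 raised this as IOError
        return exc.errno
    finally:
        ep.close()  # closes epfd, which now refers to /dev/null
    return None


if __name__ == "__main__":
    code = poll_after_fd_reuse()
    print(code, errno.errorcode.get(code))
```

On Linux the function should report EINVAL, the same errno as in the traceback.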

Clint Byrum (clint-fewbar) wrote :

Aha! This happens when dnsmasq is not installed. Presumably the check for the version fails.
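If the version check shells out to 'dnsmasq --version' and parses the result, a missing binary collapses into "version unknown". A minimal Python 3.7+ sketch of such a check follows; the function names, the regex, and the assumed output format ("Dnsmasq version 2.59  Copyright ...") are illustrative assumptions, not neutron's code.

```python
import re
import subprocess

# Assumed shape of the first line printed by 'dnsmasq --version', e.g.
# "Dnsmasq version 2.59  Copyright (c) 2000-2011 Simon Kelley".
_VERSION_RE = re.compile(r"version (\d+)\.(\d+)")


def parse_dnsmasq_version(output):
    """Return the version as a (major, minor) tuple, or None if unparseable."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def get_dnsmasq_version(binary="dnsmasq"):
    """Run '<binary> --version' and parse it; None means 'unknown'."""
    try:
        result = subprocess.run([binary, "--version"], capture_output=True,
                                text=True, check=False)
    except OSError:
        # FileNotFoundError (a subclass of OSError) is raised when the
        # binary is not installed, which is the situation in this bug.
        return None
    return parse_dnsmasq_version(result.stdout)
```

Returning a tuple rather than a float keeps comparisons correct for a hypothetical 2.100 versus 2.59.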

tags: added: l3-ipam-dhcp
Sylvain Bauza (sylvain-bauza) wrote :

I can confirm that installing dnsmasq solves the issue.

Changed in neutron:
importance: Undecided → Low
status: New → Triaged
Edgar Magana (emagana) wrote :

When it is not possible to determine the dnsmasq version, neutron only logs a warning instead of an error, so it is easy to ignore. I propose changing this LOG call to "error" instead of just "warning":

    LOG.warning(_('Unable to determine dnsmasq version. '
                  'Please ensure that its version is %s '
                  'or above!'), cls.MINIMUM_VERSION)

Changed in neutron:
assignee: nobody → Edgar Magana (emagana)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/96976

Changed in neutron:
status: Triaged → In Progress
Kyle Mestery (mestery)
Changed in neutron:
milestone: none → juno-1
Kyle Mestery (mestery)
Changed in neutron:
milestone: juno-1 → juno-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/96976
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f3165d2a48b039fb2d53ff2da72d57fe65fe3db3
Submitter: Jenkins
Branch: master

commit f3165d2a48b039fb2d53ff2da72d57fe65fe3db3
Author: Edgar Magana <email address hidden>
Date: Fri May 30 16:27:35 2014 -0700

    Stop the dhcp-agent process when dnsmasq version is not determined

    When it is not possible to determine the dnsmasq version
    or dnsmasq is not installed, neutron shows only a warning message.
    With this commit the dhcp-agent process will be stopped if the
    dnsmasq service is not installed.
    Modifies the dhcp version unit test with the expected SystemExit exception

    Change-Id: Ia26eaae6f432305f412a4b6a4a6b2b7b15d99d45
    Closes-Bug: 1212401
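The behaviour the commit message describes (log an error and stop, rather than only warn) can be sketched as follows. This is not the merged change: the function signature, the tuple-based version, and the MINIMUM_VERSION value are assumptions; only the message wording follows the LOG.warning call quoted earlier in this report.

```python
import logging

LOG = logging.getLogger(__name__)

# Illustrative minimum only; not taken from the neutron source.
MINIMUM_VERSION = (2, 59)


def check_version(detected):
    """Stop the agent when the dnsmasq version could not be determined.

    'detected' is a (major, minor) tuple, or None when the version is
    unknown (for example because dnsmasq is not installed).
    """
    if detected is None:
        LOG.error("Unable to determine dnsmasq version. Please ensure that "
                  "its version is %s or above!",
                  "%d.%d" % MINIMUM_VERSION)
        raise SystemExit(1)  # abort start-up instead of continuing
    return detected
```

Raising SystemExit lets a unit test assert on the exit, in line with the commit's note that the dhcp version test now expects SystemExit.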

Changed in neutron:
status: In Progress → Fix Committed
Hemanth Ravi (hemanth-ravi) wrote :

The fix for this bug breaks the q-dhcp agent on Ubuntu 12.04, since 'dnsmasq --version' can only be run as root and create_process() in neutron/agent/linux/utils.py is not invoked with root_helper.

This is breaking the third-party plugin CI.
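Neutron agents of this era were configured with a root_helper string (typically sudo plus a rootwrap invocation) that is split and prepended to commands needing privileges. The sketch below shows that prefixing pattern. The function name build_command and the rootwrap path in the example are hypothetical, not neutron's implementation.

```python
import shlex


def build_command(cmd, root_helper=None):
    """Prefix 'cmd' (a list of arguments) with a root helper, if given.

    Hypothetical helper for illustration: without the prefix,
    'dnsmasq --version' runs as the unprivileged agent user.
    """
    if root_helper:
        cmd = shlex.split(root_helper) + list(cmd)
    return [str(arg) for arg in cmd]
```

For example, build_command(["dnsmasq", "--version"], "sudo neutron-rootwrap /etc/neutron/rootwrap.conf") yields a command that runs dnsmasq through rootwrap.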

Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-2 → 2014.2