juju/MAAS deployed host with bonding configured via preseed missing eth0 from bond on first boot

Bug #1341524 reported by Gareth Woolridge
This bug affects 2 people
Affects: juju-core
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

We are deploying a new Openstack environment using juju 1.18.4 and require the compute and other nodes to utilise interface bonding for increased bandwidth availability. To do this we configure bonding of eth0 and eth1 via the MAAS preseed.

When we deploy using juju we find that eth0 is down and not in the bond following the juju agent and charm deploy. This means that when our deployment of Openstack using charms is complete we must reboot our compute, storage and infrastructure nodes in order to get the full performance of bond0.

This looks to be caused by an ifdown eth0 (seen in cloud-init-output.log) immediately prior to downloading the tools and installing and configuring the juju agent on the host.
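
One way to confirm the symptom on an affected host is to read the kernel's bonding status file, which lists each enslaved NIC on a "Slave Interface:" line. The following is a minimal sketch, assuming the bond is named bond0; the function names bond_slaves and bond_has_slave are made up for this example and are not part of juju or MAAS.

```shell
#!/bin/sh
# List the slave interfaces recorded in a bonding status file.
# Defaults to the kernel status file for bond0 (assumed bond name).
# bond_slaves and bond_has_slave are hypothetical helper names.
bond_slaves() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # Each enslaved NIC appears as a line "Slave Interface: <name>".
    sed -n 's/^Slave Interface: //p' "$status_file"
}

# bond_has_slave IFACE [STATUS_FILE]
# Succeed only if IFACE is currently listed as a slave of the bond.
bond_has_slave() {
    bond_slaves "$2" | grep -qx "$1"
}
```

On a node in the state described above, something like `bond_has_slave eth0 || echo "eth0 is missing from bond0"` would be expected to print the warning until the node is rebooted.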

Additionally when shmooshing services onto a combined infra node using lxc we need both bonding and bridging, and in this scenario the 'ifdown eth0' actually perturbs connectivity just long enough for the curl of the tools to fail, meaning the host/instance gets stuck in state pending. The agent install process does not seem to retry if that curl fails.
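
To illustrate the missing retry, here is a rough sketch of a generic retry wrapper in shell. It is not how juju's cloud-init script is written; the function name retry, its arguments, and the TOOLS_URL variable in the usage note are all hypothetical.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Hypothetical helper for illustration; not part of juju.
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Give the network time to settle before the next attempt.
        sleep "$delay"
    done
}
```

A tools download wrapped as `retry 5 10 curl -sSf -o tools.tar.gz "$TOOLS_URL"` would then survive a short connectivity blip of the kind the ifdown causes, rather than leaving the instance stuck in pending.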

Juju and the MAAS provider should be aware that some deployment scenarios require custom network config such as bonding of interfaces.

Gareth Woolridge (moon127) wrote :
Curtis Hovey (sinzui) wrote :

The maas-provider and networking changed substantially in Juju 1.20.x. We want to retest this with 1.20.1.

Changed in juju-core:
status: New → Incomplete
tags: added: maas-provider
Ian Booth (wallyworld)
Changed in juju-core:
milestone: none → 1.21-alpha1
importance: Undecided → High
status: Incomplete → Triaged
Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in juju-core:
status: Triaged → Confirmed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Confirmed → Triaged
Jorge Niedbalski (niedbalski) wrote :

Hello Gareth,

Could you provide me the preseed file and also the resulting /etc/network/interfaces file?

Thanks!

Jorge Niedbalski (niedbalski) wrote :

Hello,

This specific issue is caused by an 'ifdown interface' performed in the injected cloud-init script. It was introduced in place of the previous 'service networking restart', as you can see by running git diff -r 411dbff4^! . The change was made because that command was killing dbus on the local provider.

First, please make sure that your interface configuration file (ethX.cfg) is stored in the /etc/network/interfaces.d/ directory. This is the location configuration files are sourced from, as set by the restoreInterfacesFiles method.

I can imagine a few solutions for this issue. First, we could extend the cloud-init commands that a user can run with an option passed to juju deploy or via environments.yaml. That way juju could consume a list of commands to be performed in the cloud-init phase (such as configuring a bond interface or restarting networking).

Another possible solution would be to force an ifup %interface% before the bridge is turned on. For this I opened a merge request extending the createNetworkBridge method:

https://github.com/juju/juju/pull/564/files

Please let me know if you have any observations regarding this.

Gareth Woolridge (moon127) wrote :

Jorge,

We've since moved to trusty with the emx, p3px etc. interface naming as opposed to ethx, which resolves our issue, although juju/cloud-init still attempt the ifdown eth0 step, so I do think something needs awareness of the new device naming there too!

I've attached our e/n/i, e/n/i.d/br0.config, maas preseed file and details of how we set up bonding, bridging etc. For the latter we wrote a script that can take various options for bonding, bridging and setting MTU (on some nodes we need jumbo frames support), and create an upstart job which configures this for us on first boot.

Our custom requirements seem to conflict with how juju/cloud-init does its own config, but I do think it's important that between juju and the sysadmin deploying via MAAS, custom bonding and bridging requirements are covered.

The attached files are for a trusty deploy as I say, but if we renamed em1, em2 to eth0, eth1 they would match what we observed under precise.

Gareth Woolridge (moon127) wrote :
Jorge Niedbalski (niedbalski) wrote :

@moon127

The ifdown eth0 command was hard-coded before 1.20.5; now you can specify a network-bridge interface,
which will be used in 'ifdown $network-bridge' instead of forcing eth0. Since you can specify
that, you can point your bridge interface to p3px or whatever other interface your setup uses.

Yes, indeed, your specific requirements conflict with cloud-init. My overall recommendation would be
to put your custom configuration files in /etc/network/interfaces.d/ and specify your network-bridge in juju; with https://github.com/juju/juju/pull/572, juju will not turn down your bridge interface in any case.

I think that should solve your situation.

Jorge Niedbalski (niedbalski) wrote :

The change in https://github.com/juju/juju/pull/618 fixes this issue: setting the option disable-network-management: true prevents juju from bringing down the defined network-bridge, and thus from tearing down any special configuration.

Thanks!
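
For readers following along, the two options named in this thread might sit in environments.yaml roughly as sketched below. Only the option names network-bridge and disable-network-management come from the comments above; the environment name, the bridge name br0, and the surrounding layout are assumptions, so check them against your own file and juju version.

```shell
#!/bin/sh
# Print a hypothetical environments.yaml stanza for a MAAS environment.
# Only network-bridge and disable-network-management come from this
# thread; the environment name, the bridge name (br0) and the layout
# are assumptions for illustration.
print_maas_environment() {
    cat << 'EOF'
environments:
  maas:
    type: maas
    network-bridge: br0
    disable-network-management: true
EOF
}

print_maas_environment
```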

Ian Booth (wallyworld) wrote :

Based on https://github.com/juju/juju/pull/618 I'm marking this as Invalid assuming the issue is fixed.

If there's still a problem, please re-open this bug.

no longer affects: juju-core/1.20
Changed in juju-core:
status: Triaged → Invalid
Jorge Niedbalski (niedbalski) wrote :

Hello,

Please see https://bugs.launchpad.net/juju-core/+bug/1361374 as a reference fix for this issue.

Thanks!

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.21-alpha1 → none
JuanJo Ciarlante (jjo) wrote :

FTR, I tried to specify "network-bridge: bond0", which one would expect to DTRT,
but it doesn't work either, because the sequence below, run by juju, dismantles
bond0 at ifdown (#1), and then ifup br0 (#4) doesn't rebuild it:

===
ifdown bond0 #1
cat > /etc/network/bond0.config << EOF #2
iface bond0 inet manual

auto br0
iface br0 inet dhcp
  bridge_ports bond0
EOF
sed -i "s/iface bond0 inet dhcp/source \/etc\/network\/bond0.config/" /etc/network/interfaces #3
ifup br0 #4
===

NOTE that rebooting with the resulting files above works fine, so it is the 'ifdown bond0' behavior
(disconnecting the slave devices) that is not reversed by ifup br0. Interestingly, it is not
fixed by e.g. "ifup bond0 ; ifup br0" either (you can try it in a kvm VM); rather, it is the full
'sequence' of ifup eth0, bond0, br0 that makes it work at boot.

As a workaround until we have a stable juju with the disable-network-management config,
I found you can specify a fake device in environments.yaml, e.g.:

   network-bridge: none-bug1341524

which makes the above sed leave /etc/network/interfaces untouched (albeit creating
an unused /etc/network/$fake.config file).
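
The no-op behaviour of the sed step with a fake device name can be checked safely against a scratch file. This is only a sketch: it assumes GNU sed (for -i without a suffix), uses a made-up helper name rewrite_interfaces, and mirrors step #3 of the sequence above rather than juju's actual code.

```shell
#!/bin/sh
# Apply the same kind of substitution as step #3 of the sequence, but
# against a scratch copy, so nothing under /etc is touched.
# rewrite_interfaces is a hypothetical helper name; GNU sed is assumed.
rewrite_interfaces() {
    bridge="$1"
    file="$2"
    sed -i "s/iface $bridge inet dhcp/source \/etc\/network\/$bridge.config/" "$file"
}

scratch=$(mktemp)
printf 'auto bond0\niface bond0 inet dhcp\n' > "$scratch"
cp "$scratch" "$scratch.orig"

# With the fake device name there is no matching "iface" line to rewrite.
rewrite_interfaces none-bug1341524 "$scratch"
if cmp -s "$scratch" "$scratch.orig"; then
    echo "unchanged: no 'iface none-bug1341524 inet dhcp' line to rewrite"
fi
rm -f "$scratch" "$scratch.orig"
```

Passing bond0 instead of the fake name rewrites the iface line, which is the case that leads to the dismantled bond described above.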

Changed in juju-core:
assignee: Jorge Niedbalski (niedbalski) → nobody
tags: added: network
Matt Rae (mattrae)
tags: added: cpec