changing data-port does not remove previously configured interfaces from OVS bridges

Bug #1915967 reported by Drew Freiberger
Affects                                 Status    Importance  Assigned to  Milestone
OpenStack Neutron Gateway Charm         Triaged   Wishlist    Unassigned
OpenStack Neutron Open vSwitch Charm    Triaged   Wishlist    Unassigned

Bug Description

Related to lp#1809190, we have found that when making a config change to data-port on the neutron-openvswitch or neutron-gateway charms, the previously mapped interfaces are not purged before the new interface is added.

For instance, if I deploy a cloud with bridge-mappings="physnet1:br-data" and data-port="br-data:bond0", and later find that I need to move br-data to bond1 fabric, I run:

juju config neutron-openvswitch data-port="br-data:bond1"

The expected outcome is that 'ovs-vsctl list-ports br-data' returns bond1 and phy-br-data.

The actual outcome in this scenario is that the bridge now contains bond0, bond1, and phy-br-data.
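
To illustrate (a minimal sketch using the bridge and port names from the example above; the output shown is what one would expect to observe, not captured from a live system):

ovs-vsctl list-ports br-data        # before the change: bond0, phy-br-data
juju config neutron-openvswitch data-port="br-data:bond1"
ovs-vsctl list-ports br-data        # expected: bond1, phy-br-data
                                    # actual:   bond0, bond1, phy-br-data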

While lp#1809190 was previously filed for the case of switching from the external-network-id and external-network-port configurations to bridge-mappings and data-port, the issue still exists for data-port config changes.

This leads to data-plane outages: the two fabrics joined by the OVS bridge form a network loop and trigger a broadcast storm.

Tags: sts
Revision history for this message
Andrea Ieri (aieri) wrote :

Subscribed field-high. I think it'd be reasonable to expect the charm to support config changes, or at least to block when they aren't supported (e.g. changing ovs-use-veth).

For the record, a manual workaround is available: simply remove the old bond with ovs-vsctl del-port br-data <the old bond>
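
As a sketch of that workaround with the names from the bug description (run on the affected unit; adjust the bridge and bond names to your environment):

ovs-vsctl del-port br-data bond0    # remove the stale bond left behind by the config change
ovs-vsctl list-ports br-data        # verify that only bond1 and phy-br-data remain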

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Indeed, in lp:1809190 we did the work [0] of marking the ports and bridges that are managed by the charm, so that it would be less risky later on to implement actually deleting obsolete/lingering ports and bridges that we own. The actual deletion hasn't been implemented yet, though.

[0] https://review.opendev.org/c/openstack/charm-neutron-gateway/+/717074/

Changed in charm-neutron-gateway:
importance: Undecided → High
Changed in charm-neutron-openvswitch:
importance: Undecided → High
Changed in charm-neutron-gateway:
status: New → Triaged
Changed in charm-neutron-openvswitch:
status: New → Triaged
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi Andrea, I don't think this is a field-high. There is a known work-around to the issue. I agree that it is an issue, but it shouldn't trump more critical bugs with no work-around. Please could you consider removing the field-high subscription from it? Thanks.

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Note that an attempt [0] was made at implementing this. Parts of it [1] landed, but the actual deletion of obsolete ports and bridges didn't land. Anyone who intends to implement this may want to have a look.

[0] https://review.opendev.org/c/openstack/charm-neutron-gateway/+/673849
[1] https://review.opendev.org/c/openstack/charm-neutron-gateway/+/717074

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

It's also worth mentioning that the work for marking ports and bridges as managed by us was done in charm-neutron-gateway but not in charm-neutron-openvswitch. [0]

[0] https://bugs.launchpad.net/charm-neutron-gateway/+bug/1809190/comments/7

Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

I filed a separate bug lp:1917025 for the missing functionality in charm-neutron-openvswitch so that the progress can be tracked and discussed there. Fixing it should be the first step towards resolving this larger issue.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

For context, it appears we're seeing this issue when switching the type of compute/networking resources configured for a given hyperconverged node.

Imagine a scenario where you have a number of sriov nodes with bond0 attached to br-data and non-sriov nodes with bond1 attached to br-data. If you also have ceph-osd deployed to these hyperconverged nodes, and you remove the sriov openvswitch config and deploy the non-sriov openvswitch config, the br-data:bond0 to br-data:bond1 data-port mapping change isn't properly effected by removing and re-adding the charm on the already deployed unit.

Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Following our internal discussion today, we decided to redesign several aspects of the data-port config. That redesign will first be agreed upon through a charm spec and then implemented.

That work, however, will depend on https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1917025 completion.

The current proposal (to be further discussed in the spec) is to add a "manual" value for the data-port config, in which the charm no longer touches the bridges and ports in response to changes to data-port. In this mode, port changes would be made through "add" and "remove" juju actions. When not in "manual" mode, the behavior will not change. There are several motivations for this decision; they will be listed in the spec.
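
To make the proposal concrete, a hypothetical operator workflow could look like the following (the "manual" config value and the add-port/remove-port action names are placeholders for the proposal, not an implemented interface):

juju config neutron-openvswitch data-port=manual
juju run-action --wait neutron-openvswitch/0 add-port bridge=br-data port=bond1
juju run-action --wait neutron-openvswitch/0 remove-port bridge=br-data port=bond0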

Meanwhile, documentation around the use of the data-port config will be improved so this problem is avoided. The redesign work will further enhance the documentation along with the new changes.

tags: added: sts
Revision history for this message
Drew Freiberger (afreiberger) wrote :

In an effort to gather use cases for changing a data-port on a deployed node, we have collected the following scenarios in which data-ports for physical/provider networks need to be changed post-deployment:

1. Hyperconverged deployments with nova-compute (with a neutron-openvswitch-agent or ovn-chassis subordinate) and ceph-osd co-resident ("hulk-smashed") on the same metal, where a different nova-compute/networking flavor needs to be re-deployed to a hypervisor while capacity or performance reasons require ceph-osd to stay online on the node.

Hypervisor type A has physnet1 on bond0 fabric
Hypervisor type B has physnet1 on bond1 fabric
Bond0 and bond1 are two fabrics hosted on the same spine-leaf switches
Metal machine X has nova-compute-A deployed as well as ceph-osd as principal units
Expected process (a command sketch follows after this list):
Remove nova-compute-A unit from metal X
Add nova-compute-B unit to metal X without redeployment
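
A rough command sketch of that process, assuming hypothetical application names nova-compute-a and nova-compute-b and Juju machine number X for metal X:

juju remove-unit nova-compute-a/0      # ceph-osd principal stays deployed on the machine
juju add-unit nova-compute-b --to X    # place the other compute flavor on the same machine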

2. Live Re-architecture of cloud networking.

Network team wants to move physnet1 vlans from bond0 fabric to bond1 fabric
Network team enables the VLAN tagging on the bond1 fabric while leaving it enabled on the bond0 fabric.
Current setting for n-ovs-agent is data-port=br-data:bond0
Cloud operator runs juju config neutron-openvswitch-agent data-port=br-data:bond1
The expected result is that bond0 is removed from br-data and bond1 is added, resulting in only a very brief connectivity loss for that physnet on each OVS.

Revision history for this message
Przemyslaw Lal (przemeklal) wrote :

The functionality for marking OVS ports, bonds and bridges as managed by charm-neutron-openvswitch has been merged and is available on the master branch (along with changes in the handling of conflicting ext-port and data-port configuration). Please visit lp:1917025 for more details.
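
For anyone verifying this on a deployed unit, the marking can be inspected with standard ovs-vsctl commands; the exact external_ids key written by the charm is not reproduced here and should be checked against the charm code:

ovs-vsctl --columns=external_ids list Bridge br-data
ovs-vsctl --columns=external_ids list Port bond1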

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I've changed this to wishlist as it's not a bug in existing functionality so much as a request to add a new feature. Some of the ground work has been done, but further design work and, crucially, a specification is needed to describe exactly what the feature should do.

Changed in charm-neutron-gateway:
importance: High → Wishlist
Changed in charm-neutron-openvswitch:
importance: High → Wishlist
Revision history for this message
Billy Olsen (billy-olsen) wrote :

Some work has been done on tagging the interfaces that are created so that they can be identified as charm-owned. However, this is accurately described as bigger than a bug and needs dedicated time, as the charms were not designed to handle this particular scenario.
