Add alias to support Microsoft Azure RDMA device via MLX4
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| rdma-core (Ubuntu) | Fix Released | Medium | Unassigned | |
| Bionic | Fix Released | Medium | Unassigned | |
Bug Description
[Impact]
* Azure RDMA devices can be a "Network Direct" version of the MLX
InfiniBand card that is exposed to the guest, and they need to be
handled by the MLX4 driver.
To be enabled properly, however, the system needs to detect them as such.
That detection was added in rdma-core v20, which is why the bug is already
Fix Released for Disco and later.
* Fix [1] just adds a new alias so that the card is detected as what it is.
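To illustrate what "a new alias" means here: roughly speaking, a userspace provider claims a device when the device's modalias string matches an entry in the provider's match table, and the fix adds one vmbus entry to the MLX4 table. The sketch below is a simplified model of that idea, not rdma-core's code. The struct, the `match_device()` function, the PCI IDs, and the `vmbus:PLACEHOLDER_GUID` pattern are all hypothetical; the real GUID from the patch is not reproduced here.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical match-table entry: a shell-style pattern that is compared
 * against a device's modalias string. */
struct match_ent {
    const char *modalias_pattern;
    const char *comment;
};

/* Hypothetical, heavily simplified MLX4-style table. The PCI entries are
 * illustrative (0x15B3 is the Mellanox vendor ID); the vmbus pattern is a
 * PLACEHOLDER, not the GUID used by the real patch. */
static const struct match_ent hca_table[] = {
    { "pci:v000015B3d00001003*", "ConnectX-3 (illustrative)" },
    { "pci:v000015B3d00001004*", "ConnectX-3 VF (illustrative)" },
    /* The kind of entry the fix adds: an alias for the vmbus device. */
    { "vmbus:PLACEHOLDER_GUID", "Microsoft Azure RDMA device" },
    { NULL, NULL } /* sentinel */
};

/* Return the first table entry whose pattern matches the modalias,
 * or NULL if the provider would not claim this device. */
static const struct match_ent *match_device(const char *modalias)
{
    for (const struct match_ent *e = hca_table; e->modalias_pattern; e++) {
        if (fnmatch(e->modalias_pattern, modalias, 0) == 0)
            return e;
    }
    return NULL;
}
```

Without the vmbus entry, `match_device("vmbus:PLACEHOLDER_GUID")` would return NULL and no userspace driver would be bound, while PCI devices match exactly as before. That is why existing systems are unaffected by adding the entry.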
[Test Case]
* In short: use RDMA on a "Microsoft Azure RDMA device".
There is no good way to test this specific bug without an Azure instance.
The check is whether the userspace drivers are then loaded correctly.
As I lack such an instance myself, Microsoft helped test the PPA and will
help with the SRU verification.
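For the verification step, a first sanity check is what the guest's verbs device actually announces itself as. A minimal sketch, assuming the modalias is readable at a path such as `/sys/class/infiniband_verbs/uverbs0/device/modalias` (an assumption; the exact path can differ per system). The helpers `read_modalias()` and `is_vmbus_device()` are hypothetical, not part of rdma-core.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysfs modalias file into
 * buf with the trailing newline stripped. Returns 0 on success, -1 if the
 * file cannot be opened or is empty. */
static int read_modalias(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: true if the modalias names a VMBus device, i.e.
 * the kind of device the new alias is meant to match. */
static bool is_vmbus_device(const char *modalias)
{
    return strncmp(modalias, "vmbus:", strlen("vmbus:")) == 0;
}
```

If the device reports a `vmbus:` modalias but `ibv_devices` from rdma-core lists nothing, that would suggest the userspace alias, rather than the kernel side, is what is missing.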
[Regression Potential]
* a) The regression potential should be minimal. Existing systems either
already match the HCA table or they don't, and that is the same before and
after the fix. The only issue I can think of is if there are other devices
that announce themselves as vmbus:
this device type; they would then be handled by MLX4 and could run into issues.
* b) The fix will, of course, "enable RDMA support on some more devices": if
someone had such devices attached but did not use or configure them, they
might now initialize a bit further. That should not be an issue, especially
in cloud environments where hardware configuration is just a click away and
nobody usually pays for extra devices without using them.
* c) I happen to know that for DPDK usage of these devices several fixes
in later rdma-core versions are preferred or even needed. I'm not sure about
the usage in this case, but enabling the device could expose issues that
were formerly hidden behind the "not supported" misdetection of the
card.
For b) and c), I'd want Microsoft to ack and test this from a PPA and to do
the SRU verification, so that it is confirmed to be fine not only for this
case but for Azure in general.
[Other Info]
* This falls under the SRU exception of "other safe cases": "For Long
Term Support releases we regularly want to enable new hardware". New
modaliases are explicitly listed there as such a case.
* This is reported to work in 16.04, but it is not a classic "this commit
broke it" regression. Back then the RDMA/InfiniBand world worked quite
differently, as it predates the big revamp into rdma-core. Nevertheless,
viewed abstractly (ignoring the details), one could see it as an upgrade
regression when going from 16.04 to 18.04.
[1]: https:/
Related branches
- Andreas Hasenack: Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
Diff: 65 lines (+45/-0), 3 files modified:
- debian/changelog (+7/-0)
- debian/patches/series (+1/-0)
- debian/patches/ubuntu/lp-1838939-Add-Microsoft-Azure-RDMA-device-to-MLX4-HCA-table.patch (+37/-0)
Changed in rdma-core (Ubuntu):
- status: New → Triaged
- importance: Undecided → Medium
- description: updated
- description: updated
- tags: added: verification-done verification-done-bionic; removed: verification-needed verification-needed-bionic
I pinged Microsoft to add some details/statements here that only they can ...