Fail to launch an OpenStack Instance with hugepages on top of OVS+DPDK

Bug #1680956 reported by Thiago Martins
This bug affects 1 person
Affects                 Status          Importance   Assigned to
Ubuntu Cloud Archive    Invalid         Undecided    Unassigned
dpdk (Ubuntu)           Fix Released    Low          Christian Ehrhardt
libvirt (Ubuntu)        Invalid         Undecided    Unassigned

Bug Description

Guys,

I'm trying to launch an Instance on OpenStack Ocata, using Ubuntu 16.04, on top of Open vSwitch with DPDK, and OVN.

I can launch an Instance without hugepages, but when I change the OpenStack Flavor to have "hw:mem_size_pages=large", the following error appears in nova-compute.log:

---
2017-04-07 20:13:18.504 3996 ERROR nova.compute.manager [instance: fbbb7bde-763e-40d3-8987-fb3a6c568ad8] libvirtError: internal error: process exited while connecting to monitor: 2017-04-07T20:13:17.764939Z qemu-system-x86_64: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages-1048576/libvirt/qemu,share=yes,size=4294967296,host-nodes=0,policy=bind: can't open backing store /dev/hugepages-1048576/libvirt/qemu for guest RAM: Permission denied
---

In syslog:

---
Apr 7 20:13:17 expert-jennet kernel: [ 1535.437956] audit: type=1400 audit(1491595997.759:23): apparmor="DENIED" operation="mknod" profile="libvirt-fbbb7bde-763e-40d3-8987-fb3a6c568ad8" name="/dev/hugepages-1048576/libvirt/qemu/qemu_back_mem._objects_ram-node0.BFNx2W" pid=8341 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=64055 ouid=64055
---

Maybe it is related to:

https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1524737

But I'm not sure whether it is related to 1524737 or not...

Let me know if there are any workarounds available!

Thanks,
Thiago

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: libvirt0 2.5.0-3ubuntu5~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 4.8.0-46.49~16.04.1-generic 4.8.17
Uname: Linux 4.8.0-46-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
CrashDB:
 {
                "impl": "launchpad",
                "project": "cloud-archive",
                "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
             }
Date: Fri Apr 7 20:27:27 2017
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: libvirt
UpgradeStatus: No upgrade log present (probably fresh install)

Thiago Martins (martinx) wrote:

Seth Arnold (seth-arnold) wrote:

Hi Thiago; there may be a better solution available; I think a workaround that could get you running again might be to edit /etc/apparmor.d/libvirt/TEMPLATE.qemu and include a line like:

  /dev/hugepages-*/libvirt/qemu/qemu_back_mem._objects_ram-node?.?????? rw,

Then run apparmor_parser --replace /etc/apparmor.d/libvirt/libvirt-fbbb7bde-763e-40d3-8987-fb3a6c568ad8

Thanks

Nish Aravamudan (nacc) wrote:

So I think, based upon some quick IRC discussion, the issue is:

/dev/hugepages-1048576/libvirt/qemu

while the apparmor profile is for /dev/hugepages/libvirt/qemu/**.

I wonder why the size is in the path; maybe a non-default hugepage size is in use?

Should the profile be updated to

/dev/hugepages*/libvirt/qemu/**

in the source to support multiple hugepage sizes?
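
Nish's point can be checked with a minimal Python sketch. It translates a simplified subset of AppArmor globbing into regular expressions. In this subset, "*" does not cross "/", "**" does, and "?" is one non-"/" character. Braces, character classes and variables are ignored. The sketch then tests the path from the DENIED audit line against three patterns: the current rule, the proposed wildcard and Seth's rule. The helper names are hypothetical, and this is not how apparmor_parser works internally.

```python
import re


def aa_glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified subset of AppArmor path globbing into a regex.

    Assumed semantics: '**' matches anything including '/', '*' matches
    anything except '/', and '?' matches one character except '/'.
    Brace alternation, character classes and variables are NOT handled.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def aa_matches(pattern: str, path: str) -> bool:
    """True if the whole path matches the (simplified) AppArmor pattern."""
    return aa_glob_to_regex(pattern).fullmatch(path) is not None


# Path taken from the apparmor="DENIED" audit message in the bug description.
denied = "/dev/hugepages-1048576/libvirt/qemu/qemu_back_mem._objects_ram-node0.BFNx2W"

rules = [
    "/dev/hugepages/libvirt/qemu/**",   # path pattern currently in the profile
    "/dev/hugepages*/libvirt/qemu/**",  # wildcard proposed above
    "/dev/hugepages-*/libvirt/qemu/qemu_back_mem._objects_ram-node?.??????",  # Seth's rule
]

for rule in rules:
    print(aa_matches(rule, denied), rule)
```

Under these assumptions, the current /dev/hugepages/libvirt/qemu/** rule does not match the denied path, while both wildcard variants do. That is consistent with the mknod denial.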

Thiago Martins (martinx) wrote:

ERRATA: the option that I'm using is:

"hw:mem_page_size=large" and not "hw:mem_size_pages=large" as I initially had reported.

Christian Ehrhardt (paelzer) wrote:

As I discussed with Nish (thanks for the update here) we might need to extend the apparmor profile - but we should understand the case better before doing so.
I also run KVM guests on systems with DPDK hugepages without issues, and those guests use huge pages too, so something must be special about how the charms configure them here.

If you enable hugepages for DPDK, it makes sure there is a hugepage mount for each configured page size. So far that hasn't caused problems, but it seems this issue occurs when qemu also runs with huge pages on the same system.

Usually I set KVM_HUGEPAGES to 1 in /etc/default/qemu-kvm, which contains:
  # Set this to 1 if you want hugepages to be available to kvm under
  # /run/hugepages/kvm
  KVM_HUGEPAGES=0

And my guests have:
<memoryBacking>
  <hugepages>
    <page size="2" unit="M" nodeset="0"/>
  </hugepages>
</memoryBacking>

This setup works fine so far. The explicit page size in particular is a consequence of libvirt/qemu supporting multiple page sizes, and it might be missing from the charm/OpenStack configuration so far.

If KVM_HUGEPAGES is set, qemu-kvm ensures there is a mountpoint (of the default huge page size) at /run/hugepages/kvm:
  mkdir -p /run/hugepages/kvm
  mount -t hugetlbfs hugetlbfs-kvm -o mode=775,gid=kvm /run/hugepages/kvm

BTW, the apparmor profile also already contains:

  owner "/run/hugepages/kvm/libvirt/qemu/**" rw,

But without explicit configuration, libvirt picks up hugepage mounts from /proc/mounts.
In a sense that means the apparmor profile would need to include every directory hugepages will ever be mounted to. We might of course add a wildcard for the DPDK paths, but that still leaves the issue open for any other path, e.g. one added by a database admin to give hugepages to their DB.
In fact there is a setting in /etc/libvirt/qemu.conf to configure this as needed.

Quoting from that config:
If provided by the host and a hugetlbfs mount point is configured,
a guest may request huge page backing. When this mount point is
unspecified here, determination of a host mount point in /proc/mounts
will be attempted. Specifying an explicit mount overrides detection
of the same in /proc/mounts. Setting the mount point to "" will
disable guest hugepage backing. If desired, multiple mount points can
be specified at once, separated by comma and enclosed in square
brackets, for example:
[...]

It might be worth setting something there as a workaround to disable the suboptimal detection (the libvirt daemon has to be restarted to pick up qemu.conf changes):
$ echo 'hugetlbfs_mount = ["/run/hugepages/kvm"]' | sudo tee -a /etc/libvirt/qemu.conf
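
The difference between detection and an explicit list can be illustrated with a small Python model. It is a deliberately simplified, hypothetical sketch, not libvirt's actual selection code. It reads /proc/mounts-style text, records each hugetlbfs mount with its page size, and returns the first candidate that matches the requested size. The /proc/mounts excerpt is invented for illustration: it assumes a 2M default page size plus a 1G mount created for DPDK.

```python
from typing import Dict, List, Optional

UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def pagesize_kb(value: str) -> int:
    """Convert a hugetlbfs pagesize option value such as '2M' or '1G' to KiB."""
    return int(value[:-1]) * UNITS_KB[value[-1].upper()]


def hugetlbfs_mounts(proc_mounts: str, default_kb: int) -> Dict[str, int]:
    """Map each hugetlbfs mount point in /proc/mounts-style text to its page size (KiB).

    Mounts without an explicit pagesize= option are assumed to use default_kb.
    """
    mounts: Dict[str, int] = {}
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "hugetlbfs":
            continue
        size = default_kb
        for opt in fields[3].split(","):
            if opt.startswith("pagesize="):
                size = pagesize_kb(opt[len("pagesize="):])
        mounts[fields[1]] = size
    return mounts


def pick_mount(mounts: Dict[str, int], wanted_kb: int,
               configured: Optional[List[str]] = None) -> Optional[str]:
    """Hypothetical model: return the first candidate mount with the wanted page size.

    Without a configured list every detected mount is a candidate (the
    "guesswork"); with one (like hugetlbfs_mount) only those paths are considered.
    """
    candidates = configured if configured is not None else list(mounts)
    for path in candidates:
        if mounts.get(path) == wanted_kb:
            return path
    return None


# Invented /proc/mounts excerpt: default 2M pages, the qemu-kvm mount and a 1G DPDK mount.
SAMPLE = """\
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
hugetlbfs-kvm /run/hugepages/kvm hugetlbfs rw,relatime,mode=775,pagesize=2M 0 0
hugetlbfs /dev/hugepages-1048576 hugetlbfs rw,relatime,pagesize=1G 0 0
"""

mounts = hugetlbfs_mounts(SAMPLE, default_kb=2048)
print(pick_mount(mounts, 1048576))                          # detection finds the DPDK mount
print(pick_mount(mounts, 1048576, ["/run/hugepages/kvm"]))  # no 1G mount in the allowed list
print(pick_mount(mounts, 2048, ["/run/hugepages/kvm"]))
```

Apparently "large" resolved to 1G pages here, judging by the /dev/hugepages-1048576 path. In this model a 1G request lands on the DPDK mount when every detected mount is a candidate. Restricting the candidates to /run/hugepages/kvm means a path outside the apparmor profile is never selected.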

Essentially qemu only "does what it is told", as in:
   -object memory-backend-file,id=mem1,size=1G,mem-path=/mnt/hugepages-1G \
   -device pc-dimm,id=dimm1,memdev=mem1 \
So it would be interesting to see which command line libvirt generates.
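
As a small aid for reading such command lines, here is a hedged Python sketch that splits a "-object memory-backend-file,..." argument into its options; parse_object_opts is a hypothetical helper, and QEMU's ",," escaping is not handled. Applied to the argument from the error in the bug description, it shows that libvirt pointed qemu at /dev/hugepages-1048576/libvirt/qemu with share=yes for 4 GiB of guest RAM. That is four 1 GiB pages, if that mount uses 1 GiB pages as its name suggests. libvirt normally records the full generated command line in the per-domain log under /var/log/libvirt/qemu/.

```python
def parse_object_opts(spec: str) -> dict:
    """Split a QEMU '-object' argument into its type and key=value options.

    Simplification: QEMU's ',,' escape for literal commas is not handled.
    """
    kind, *pairs = spec.split(",")
    opts = {"type": kind}
    for pair in pairs:
        key, _, value = pair.partition("=")
        opts[key] = value
    return opts


# The -object argument from the nova-compute.log error in the bug description.
spec = ("memory-backend-file,id=ram-node0,prealloc=yes,"
        "mem-path=/dev/hugepages-1048576/libvirt/qemu,share=yes,"
        "size=4294967296,host-nodes=0,policy=bind")

opts = parse_object_opts(spec)
size_bytes = int(opts["size"])
page_bytes = 1048576 * 1024  # assumption: 1 GiB pages, per the hugepages-1048576 (KiB) mount name

print(opts["type"], opts["mem-path"])
print(size_bytes // page_bytes, "x 1 GiB pages needed for", size_bytes, "bytes")
```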

Questions:
- Was there any chance to test the workaround Seth suggested?
- Thiago, could you...

Changed in libvirt (Ubuntu):
status: New → Incomplete

Christian Ehrhardt (paelzer) wrote:

I checked the DPDK code which creates the extra mountpoint.
It only creates the extra mountpoints if:
1. a given size, e.g. 1G, is configured in dpdk.conf
2. no mountpoint for this size exists

In general a system always has /dev/hugepages mounted with the "default" hugepagesz.
The default size can be changed via a kernel parameter (default_hugepagesz=), so it might be either size.

Since DPDK wants to provide a simple way to get users going, the hugepage setup is somewhat automated and works for non-complex cases, as outlined by the comment in the config file.
So if one sets e.g. NR_1G_PAGES and 1G is not the current default size, it creates a mountpoint named /dev/hugepages-${sizeinkb}.
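
The two conditions above can be sketched in Python as follows. This is a hypothetical model of the described behavior, not the code shipped with DPDK. The function name and the way page sizes are passed in are assumptions. It also shows where the 1048576 in /dev/hugepages-1048576 comes from: it is 1 GiB expressed in KiB.

```python
from typing import Dict, Iterable, List


def dpdk_mounts_to_create(configured_kb: Iterable[int],
                          existing: Dict[str, int]) -> List[str]:
    """Hypothetical model of the rule described above (not the actual DPDK script).

    configured_kb: page sizes in KiB that have pages configured in dpdk.conf.
    existing: hugetlbfs mount point -> page size in KiB already on the system.
    A mount named /dev/hugepages-<size in KiB> is created only for a configured
    size that has no mount yet.
    """
    present = set(existing.values())
    return [f"/dev/hugepages-{kb}" for kb in configured_kb if kb not in present]


GIB_KB = 1024 * 1024  # 1 GiB expressed in KiB

# Default hugepagesz of 2M: only /dev/hugepages exists, so the 1G size gets its own mount.
print(dpdk_mounts_to_create([2048, GIB_KB], {"/dev/hugepages": 2048}))

# Default hugepagesz of 1G: /dev/hugepages already uses 1G pages, nothing is created.
print(dpdk_mounts_to_create([GIB_KB], {"/dev/hugepages": GIB_KB}))
```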

At the same time, qemu-kvm creates /run/hugepages/kvm if KVM_HUGEPAGES is set.

To some extent that means that if somebody adds more hugepage mountpoints to the system, either via DPDK or otherwise, this is a config change that needs other config changes to work well.
IMHO in this case one should set hugetlbfs_mount in /etc/libvirt/qemu.conf,
especially since that path is already allowed by apparmor.

Since DPDK is a statistically likely cause of this, I'll add its path to the libvirt profile, but in general this is more a configuration issue than a bug.

In addition to my questions in comment #6, I'd like to get a copy of your /etc/dpdk/dpdk.conf if possible.

Christian Ehrhardt (paelzer) wrote:

I already had the patch complete and was writing the changelog, but while doing that and the dep8 header I realized that this really isn't something we want to do.

Effectively that would "invite" later and worse issues; in this case, for example, you don't want qemu to map against the 1G pages in /dev/hugepages-1048576.
You may really want to uncomment hugetlbfs_mount in /etc/libvirt/qemu.conf to disable the guesswork libvirt is doing.
If you really do want it to map against that path, you'd still want to set hugetlbfs_mount so that it reliably picks up the right path, and for a non-standard path you'd then want to extend the apparmor profile as Seth suggested.

I still want to continue discussing this, and it would be great, Thiago, if you could provide the files I asked for. Also, I might be overlooking something, so discussion is certainly worthwhile.

I will, however, add a comment about the issue to the DPDK config file.
Adding a tracker now.

Changed in dpdk (Ubuntu):
status: New → Triaged
importance: Undecided → Low

Christian Ehrhardt (paelzer) wrote:

Pushed a suggested comment to the deb_dpdk project.

Changed in dpdk (Ubuntu):
status: Triaged → In Progress
assignee: nobody → ChristianEhrhardt (paelzer)

Christian Ehrhardt (paelzer) wrote:

The config comment was accepted in Debian DPDK and has been integrated into the latest Ubuntu via a merge.
The comment is not SRU-worthy, and a (very complex) solution to the overall hugepage confusion is out of the scope of this bug.

Changed in dpdk (Ubuntu):
status: In Progress → Fix Released
Changed in cloud-archive:
status: New → Invalid
Changed in libvirt (Ubuntu):
status: Incomplete → Invalid