Cannot start VMs without routable IPv4 address

Bug #1492621 reported by Kevin Otte
This bug affects 7 people

Affects          Status     Importance  Assigned to  Milestone
GLibC            Confirmed  Medium
glibc (Ubuntu)   Confirmed  Undecided   Unassigned
qemu (Ubuntu)    Confirmed  Medium      Unassigned

Bug Description

qemu will not start VMs using SPICE or VNC displays unless there is a routable IPv4 address on the machine, even though the error relates to 127.0.0.1.

root@athens:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether a4:ba:db:32:4c:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a6ba:dbff:fe32:4c6b/64 scope link
       valid_lft forever preferred_lft forever
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether a4:ba:db:32:4c:6c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a6ba:dbff:fe32:4c6c/64 scope link
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether a6:01:71:d4:b2:71 brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether a4:ba:db:32:4c:6b brd ff:ff:ff:ff:ff:ff
    inet6 2606:a000:a461:4500:a6ba:dbff:fe32:4c6b/64 scope global mngtmpaddr dynamic
       valid_lft 86292sec preferred_lft 14292sec
    inet6 fe80::c03d:22ff:fea5:3034/64 scope link
       valid_lft forever preferred_lft forever
root@athens:~# virsh start icarus
error: Failed to start domain icarus
error: internal error: process exited while connecting to monitor: ((null):4086): Spice-Warning **: reds.c:2330:reds_init_socket: getaddrinfo(127.0.0.1,5900): Address family for hostname not supported
2015-09-05T17:43:39.911871Z qemu-system-x86_64: failed to initialize spice server

root@athens:~# dhclient br0
root@athens:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether a4:ba:db:32:4c:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a6ba:dbff:fe32:4c6b/64 scope link
       valid_lft forever preferred_lft forever
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether a4:ba:db:32:4c:6c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a6ba:dbff:fe32:4c6c/64 scope link
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether a6:01:71:d4:b2:71 brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether a4:ba:db:32:4c:6b brd ff:ff:ff:ff:ff:ff
    inet 172.31.3.106/24 brd 172.31.3.255 scope global br0
       valid_lft forever preferred_lft forever
    inet6 2606:a000:a461:4500:a6ba:dbff:fe32:4c6b/64 scope global mngtmpaddr dynamic
       valid_lft 86335sec preferred_lft 14335sec
    inet6 fe80::a6ba:dbff:fe32:4c6b/64 scope link
       valid_lft forever preferred_lft forever
root@athens:~# virsh start icarus
Domain icarus started

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: qemu-system-x86 1:2.2+dfsg-5expubuntu9.4
ProcVersionSignature: Ubuntu 3.19.0-26.28-generic 3.19.8-ckt4
Uname: Linux 3.19.0-26-generic x86_64
ApportVersion: 2.17.2-0ubuntu1.3
Architecture: amd64
Date: Sat Sep 5 13:46:23 2015
InstallationDate: Installed on 2015-08-29 (6 days ago)
InstallationMedia: Ubuntu-Server 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: Dell Inc. PowerEdge T310
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.19.0-26-generic.efi.signed root=/dev/mapper/vg_athens-lv_root ro
SourcePackage: qemu
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/06/2013
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.12.0
dmi.board.name: 0MNFTH
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 17
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.12.0:bd09/06/2013:svnDellInc.:pnPowerEdgeT310:pvr:rvnDellInc.:rn0MNFTH:rvrA00:cvnDellInc.:ct17:cvr:
dmi.product.name: PowerEdge T310
dmi.sys.vendor: Dell Inc.

Revision history for this message
In , Tore Anderson (toreanderson) wrote :

getaddrinfo() will disregard the loopback addresses ::1 and 127.0.0.1 when attempting to figure out which address families are present on the host, when being called with the AI_ADDRCONFIG flag.

This makes a lot of sense when looking up an external hostname. However, it makes very little sense when connecting to the hostname "localhost". I've learned that the browser vendors will avoid using AI_ADDRCONFIG or do it while using workarounds for the localhost case, see for instance:

https://bugzilla.mozilla.org/show_bug.cgi?id=614526

Without such a workaround, connecting to an IPv4-only service listening on 127.0.0.1 using the hostname "localhost" will fail unless the machine also has external IPv4 connectivity. That is not what a user would expect, since the (lack of) external connectivity is irrelevant to the accessibility of the loopback interface.

Therefore, when looking up "localhost", the loopback addresses ::1 and 127.0.0.1 should not be ignored by getaddrinfo() when using AI_ADDRCONFIG.
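
[Editor's sketch] The behaviour described above can be observed from any language that wraps the libc getaddrinfo(). The following is a minimal sketch in Python, not part of the original report; the helper names lookup() and compare() are hypothetical, and port 5900 is borrowed from the error message in the bug description. On a host whose only IPv4 address is 127.0.0.1, the "addrconfig" entry is expected to differ from the "plain" one, but the exact result depends on the glibc version and the addresses configured.

```python
import socket


def lookup(host, port, flags=0, family=socket.AF_UNSPEC):
    """Return the sorted addresses getaddrinfo() yields, or the error text."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM, 0, flags)
    except socket.gaierror as exc:
        # Report the failure instead of raising so both lookups can be compared.
        return f"error: {exc}"
    return sorted({info[4][0] for info in infos})


def compare(host, port=5900):
    """Look up host with and without AI_ADDRCONFIG so any difference is visible."""
    return {
        "plain": lookup(host, port),
        "addrconfig": lookup(host, port, socket.AI_ADDRCONFIG),
    }


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        print(name, compare(name))
```

Running it once with and once without a routable IPv4 address on the machine shows whether the local libc filters loopback results.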

Tore

Revision history for this message
In , Psimerda (psimerda) wrote :

The same applies to *all* link-local IPv6 addresses as well as link-local IPv4 addresses.

The same applies for all alternative names for localhost:

For example on Fedora:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

People even use FQDNs for their localhost address to test their stuff without
network connection.

See also comments in bug 12377.

Revision history for this message
In , Psimerda (psimerda) wrote :

Currently with glibc-2.15-51.fc17.x86_64 I can reproduce it with:

  hints.ai_family = AF_INET6;
  hints.ai_flags = AI_ADDRCONFIG; // and optional AI_V4MAPPED

I can no longer reproduce with SSH (possibly because of some updates
that affect SSH's networking, I don't know).

My feeling is that we should *never* discard literal IP addresses
based on AI_ADDRCONFIG. At least not unless we check against
*family* together with *scope*, but even then it's very doubtful
to go directly against the user's input.

As for names like 'localhost', 'localhost4' and various names you can
use for node-local and link-local addresses, I'm strictly against trying
to enumerate them as you cannot guess all possible names. Would
you for example check the suffix ".local" and treat the result as a
link-local FQDN? But it may be a global address also.

I actually don't believe in any checks that work with the name and
not the address. When you have the address, you can check it against
several rules to guess (and usually know) the scope.
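
[Editor's sketch] Checking the address rather than the name, as argued above, could look roughly like this. It is a Python illustration only, not glibc code; the function name address_scope() and the three labels it returns are assumptions made for the example. It relies on the standard ipaddress module's is_loopback and is_link_local properties.

```python
import ipaddress


def address_scope(text):
    """Classify an IP literal by scope using only the address itself."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    addr = ipaddress.ip_address(text.split("%", 1)[0])
    if addr.is_loopback:
        return "node-local"   # 127.0.0.0/8 and ::1
    if addr.is_link_local:
        return "link-local"   # 169.254.0.0/16 and fe80::/10
    return "wider"            # site or global scope, not distinguished here
```

For example, address_scope("fe80::a6ba:dbff:fe32:4c6b") gives "link-local", whatever hostname was used to reach that address.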

Revision history for this message
In , Psimerda (psimerda) wrote :

Created attachment 6647
a temporary fix to ignore the whole AI_ADDRCONFIG thing

Until this issue is resolved, I'm building GLIBC with this patch to avoid problems with node-local and link-local networking.

Revision history for this message
In , Psimerda (psimerda) wrote :

By the way, I just learned that the current behavior is not mandated by POSIX. Thanks to Jeff Law for valuable information he provided:

http://pubs.opengroup.org/onlinepubs/9699919799/

If the AI_ADDRCONFIG flag is specified, IPv4 addresses shall be returned only if an IPv4 address is configured on the local system, [IP6] [Option Start] and IPv6 addresses shall be returned only if an IPv6 address is configured on the local system. [Option End]

Jeff: It may also be the case that we need to involve the "Austin Group" if we need further clarification of the standard (link-local handling comes to mind).

My summary is:

Filtering of non-DNS addresses in getaddrinfo() has no real use
and it only causes problems. There's no reason to filter over the
mere existence of addresses. Filtering over global address existence
may only be desirable for global address resolution, which is DNS. But
that should be done by the DNS resolver that only asks for addresses
that make sense and only accepts addresses that it asks for.

Revision history for this message
In , Psimerda (psimerda) wrote :

Changed the comment. Currently, I'm using the following patch to fix my
node-local IPv4/IPv6 networking (link-local networking is currently not
broken in Fedora since they removed the patch to also disregard link-local
addresses). I'm proposing this patch as a temporary solution until this
is fixed properly with the following notes:

* It breaks POSIX.1-2008 (which requires checking for any IPv4/IPv6 address)
* It breaks informational RFC 3493 (which requires the same but disregards the loopback interface)
* It ignores the older informational RFC 2553 (which requires the same as RFC 3493 but only for DNS lookups)

The POSIX.1-2008, RFC 3493 and obsolete RFC 2553 are all effectively useless
but the obsolete one is closest to the truth. Whether a global IPv4 address,
a global IPv4 route, or a global IPv4 default gateway is the right sign of
global connectivity, is up to discussion.

I also changed the description of the bug to reflect the actual problem,
not one possible (maybe wrong, but at least POSIX-compliant) solution of that problem.

Revision history for this message
In , Psimerda (psimerda) wrote :
Revision history for this message
In , Tore Anderson (toreanderson) wrote :

Created attachment 6781
Move AI_ADDRCONFIG into gaih_inet()

Attached patch moves AI_ADDRCONFIG processing into gaih_inet(). This improves things by making it not apply to literal IP addresses. However, AI_ADDRCONFIG will still be able to suppress results from both address families for things coming out of /etc/hosts, for example "localhost". In order to prevent that I think we'd need to move AI_ADDRCONFIG into _nss_dns_gethostbyname*(), which is beyond my programming skills to implement, I'm afraid.
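
[Editor's sketch] Applications can approximate the effect of this patch on their own side by not passing AI_ADDRCONFIG when the host is already an IP literal. The following Python sketch illustrates that idea under stated assumptions; it is not the attached glibc patch, and the helper names is_literal() and resolve() are hypothetical.

```python
import ipaddress
import socket


def is_literal(host):
    """Return True if host is an IPv4 or IPv6 literal rather than a name."""
    try:
        ipaddress.ip_address(host.split("%", 1)[0])  # ignore any IPv6 zone id
        return True
    except ValueError:
        return False


def resolve(host, port, flags=socket.AI_ADDRCONFIG):
    """Call getaddrinfo(), but never let AI_ADDRCONFIG filter an IP literal."""
    flags = int(flags)
    if is_literal(host):
        # Clear AI_ADDRCONFIG and skip name resolution for literals.
        flags = (flags & ~int(socket.AI_ADDRCONFIG)) | int(socket.AI_NUMERICHOST)
    return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM, flags=flags)
```

With this wrapper, resolve("127.0.0.1", 5900) no longer depends on whether a non-loopback IPv4 address is configured. Names such as "localhost" still go through AI_ADDRCONFIG, which matches the limitation noted in the comment above.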

Revision history for this message
In , Psimerda (psimerda) wrote :

(In reply to comment #8)
> Created attachment 6781 [details]
> Move AI_ADDRCONFIG into gaih_inet()
>
> Attached patch moves AI_ADDRCONFIG processing into gaih_inet(). This improves
> things by making it not apply to literal IP addresses. However, AI_ADDRCONFIG
> will still be able to suppress results from both address families for things
> coming out of /etc/hosts, for example "localhost". In order to prevent that
> I think we'd need to move AI_ADDRCONFIG into _nss_dns_gethostbyname*(), which
> is beyond my programming skills to implement, I'm afraid.

I tested the patch and confirm that I don't see any regressions.

Revision history for this message
In , Phattanon (phattanon) wrote :

Can someone point out whether this should also fix this bug?

http://sourceware.org/bugzilla/show_bug.cgi?id=14212

Revision history for this message
In , Psimerda (psimerda) wrote :

(In reply to comment #10)
> Can someone point out whether this should also fix this bug?
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=14212

This bug report is *not* related to the hosts nss plugin and is rather easy to fix.

Revision history for this message
In , Neleai (neleai) wrote :

Did you send this patch to libc-alpha?

Revision history for this message
In , Psimerda (psimerda) wrote :

(In reply to Ondrej Bilka from comment #12)
> Did you send this patch to libc-alpha?

Nope. Maybe we could discuss the libc development process some time in Prague or at LinuxAlt Brno? I'm not very familiar with it.

Revision history for this message
Kevin Otte (nivex) wrote :
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug. I suspect this is actually a libvirt bug. To make sure, could you check whether

kvm -spice addr=127.0.0.1,port=5900

works for you? At least in an empty network namespace on my host this seemed to work fine.

Changed in qemu (Ubuntu):
status: New → Incomplete
Revision history for this message
Kevin Otte (nivex) wrote :

I don't have the 'kvm' command on this machine, but it definitely appears to be qemu related:

kjotte@athens:~$ qemu-system-x86_64 -spice addr=127.0.0.1,port=5900
(/usr/bin/qemu-system-x86_64:1785): Spice-Warning **: reds.c:2330:reds_init_socket: getaddrinfo(127.0.0.1,5900): Address family for hostname not supported
qemu-system-x86_64: failed to initialize spice server

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1492621] Re: Cannot start VMs without routable IPv4 address

Can you show the output of 'sudo ifconfig -a; sudo brctl show; sudo netstat -nr' ?

Does

nc -l -4 127.0.0.1 5555 & echo ab | nc -4 127.0.0.1 5555

work?
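
[Editor's sketch] Where nc is not installed, the same loopback check can be done with a few lines of Python. This is a rough equivalent, not part of the original thread; the function name loopback_roundtrip() is hypothetical, and it binds to an ephemeral port chosen by the kernel rather than the port 5555 used in the nc command.

```python
import socket


def loopback_roundtrip(payload=b"ab\n", host="127.0.0.1"):
    """Listen on a loopback port, connect to it, and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection((host, port), timeout=5) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of data
                chunks = []
                while chunk := conn.recv(4096):
                    chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    print(loopback_roundtrip())
```

If the payload comes back unchanged, plain IPv4 loopback sockets work, and the failure is in the address lookup rather than in the loopback interface.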

Revision history for this message
Kevin Otte (nivex) wrote :

Attaching typescript output of requested diagnostic commands.
As OVS rather than Linux bridge is in use, I have included potentially relevant ovs-* command output.

Revision history for this message
Kevin Otte (nivex) wrote :

Given that VNC does this as well, I traced the problem all the way back to getaddrinfo() behavior in libc. I believe this describes it: https://bugzilla.redhat.com/show_bug.cgi?id=721350

tl;dr: It ignores loopback when enumerating addresses per family. Workaround is to define the display as ::1, which works fine since there's a global v6 address on the box.

Close: notabug

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks, Kevin!

Changed in qemu (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
In , Metadings (metadings) wrote :

I have this problem too - with gethostname(). I'm building a new DNS service and have built a DNS resolver, which must run on localhost.

If I'm disconnected from eth0 and wlan0 and the computer is offline, Firefox doesn't even try to resolve hostnames (on Linux, but also on Windows), whereas nslookup actually does resolve them. I'm using Firefox, but now I suspect this is a problem in libc, and I don't want (but need) to run some virtual interface which is always up.

I do need this feature, because there may be services running on localhost, and they should also be available in offline mode.

Revision history for this message
In , Metadings (metadings) wrote :

Otherwise said, glibc and libresolver don't respect if you are running a `resolv.conf` with `nameserver 127.0.1.1`.

If either `eth0` or `wlan0` is up, the resolver has no problem resolving the name `github.com` to `A 192.30.252.130`.

But if both `eth0` and `wlan0` are down, the resolver doesn't even try to ask `127.0.0.3` to look up `github.com`. It just assumes NX.

So to say, if I don't have a connection, you can't access `github.com`.

But what if I want to look up `my.service.local` or `my.service.localhost`?

If I do not have ANY connection, there IS a connection to localhost 127/8!

So to say, you MUST remove the restriction, that a service running on localhost 127/8 can't be accessed, just because you (eth0, wlan0) are offline "down"; even if you (lo) should always be online "up".

Revision history for this message
adam (addyp95) wrote :

This bug is still present. With the current defaults, the lack of a routable IPv4 address still causes libvirt to fail to start any guests. While setting ::1 worked for Kevin, this doesn't fix the bug, it just works around it.

Revision history for this message
Paul Gear (paulgear) wrote :

I agree with @addyp95 - this is a bug in how libvirt works in IPv6-only environments, and it is still present on focal. For posterity, the exact details of the workaround are to add the following line to /etc/libvirt/qemu.conf:

spice_listen = "::1"

Changed in qemu (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Wow a bug from the stone age, so old that I never saw it before :-/

I have to thank you for the bump. Due to the current freeze for the coming 20.04 release I'm busy and also unlikely to be able to change it soon.

But I have taken a todo to revisit this in depth for 20.10

I mean https://bugzilla.redhat.com/show_bug.cgi?id=721350 sounded pretty final (and since then gets bumped and closed), but still I want to revisit it.

tags: added: qemu-20.10
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It is a pain to see how often other things beat the priority of this ...
But having a config-based (no rebuilds, no custom patches) workaround just always de-prioritizes it :-/

One day I'll get to it ... (he said and looked at his retirement in 30 years)

tags: added: qemu-21.04 qemu-21.10
removed: qemu-20.10
Changed in glibc:
importance: Unknown → Medium
status: Unknown → Confirmed
tags: added: qemu-22.04 server-todo
removed: qemu-21.10
Changed in qemu (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in glibc (Ubuntu):
status: New → Confirmed
tags: removed: server-todo