Soft lockup with "block nbdX: Attempted send on closed socket" spam

Bug #1505564 reported by Junien F
This bug affects 9 people
Affects         Status        Importance  Assigned to    Milestone
linux (Ubuntu)  Fix Released  High        Dan Streetman
Trusty          Fix Released  Undecided   Unassigned
Vivid           Fix Released  Undecided   Unassigned
Wily            Fix Released  Undecided   Unassigned

Bug Description

Some of our nova compute hosts regularly freeze, sometimes for a few hours, with kern.log getting spammed with:

block nbdX: Attempted send on closed socket

and a few "CPU soft lockup" messages (see attached log). This clears up when the queue gets cleared, e.g.:

block nbdX: queue cleared

This happens on trusty hosts with kernel version 3.19.0-30-generic.
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 24 12:23 seq
 crw-rw---- 1 root audio 116, 33 Nov 24 12:23 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
MachineType: HP ProLiant DL385 G7
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-36-generic root=UUID=13289ac9-8dc9-4feb-b6bd-ca7db66b21d6 ro console=tty0 console=ttyS1,38400 nosplash crashkernel=384M-:512M nox2apic intremap=off
ProcVersionSignature: Ubuntu 3.19.0-36.41~14.04.1hf00090138v20151122b1-generic 3.19.8-ckt9
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-36-generic N/A
 linux-backports-modules-3.19.0-36-generic N/A
 linux-firmware 1.127.18
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty uec-images
Uname: Linux 3.19.0-36-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 02/02/2014
dmi.bios.vendor: HP
dmi.bios.version: A18
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrA18:bd02/02/2014:svnHP:pnProLiantDL385G7:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL385 G7
dmi.sys.vendor: HP

Revision history for this message
Junien F (axino) wrote : apport information

Attached: BootDmesg.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected trusty uec-images
description: updated

Revision history for this message
Junien F (axino) wrote :

Second host now

tags: added: staging
description: updated
Revision history for this message
Junien F (axino) wrote : apport information

Attached: BootDmesg.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Junien F (axino) wrote :

I think that this may be a duplicate of #1500739, the symptoms certainly look the same.

Changed in linux (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
Revision history for this message
Junien F (axino) wrote : apport information

Attached: BootDmesg.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

description: updated

Revision history for this message
Junien F (axino) wrote :

This issue just hit us again. This time I sent an NMI to the server to get a dump, which is available at https://chinstrap.canonical.com/~axino/201510281259.crash.lp1505564.tar.xz

apport information post-reboot is available above.

We've been trying to see if the issue appeared somewhere in the 3.13 series, hence the 3.13.0-29-generic kernel version.

Thanks !

Revision history for this message
Junien F (axino) wrote :

I'm just now realizing that the crashdump above may have been taken too late (when the kernel wasn't locked up anymore), because I could ssh to the server when I took it.

I was seeing the "block nbdX: Attempted send on closed socket" kernel log spam on the serial when I sent the NMI, but _perhaps_ these messages were just earlier messages that the serial was still catching up with.

Anyway, I got 2 new dumps, and these 2 were triggered automatically by kernel.softlockup_panic, so they might be better.

Revision history for this message
Junien F (axino) wrote :

First dump + apport (post reboot) below

description: updated
Revision history for this message
Junien F (axino) wrote : BootDmesg.txt

apport information

Junien F (axino)
description: updated
Junien F (axino)
description: updated
Junien F (axino)
description: updated
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Junien F (axino)
description: updated
tags: added: kernel-key
Changed in linux (Ubuntu):
importance: Undecided → High
description: updated
tags: added: patch
(82 of 162 comments are hidden in this view.)
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Note to self:

The commit being backported to 3.19 and 3.13 has to contain this race fix:

commit dcc909d90ccdbb73226397ff6d298f7af35b0e11
Author: Markus Pargmann <email address hidden>
Date: Tue Oct 6 20:03:54 2015 +0200

    nbd: Add locking for tasks

    The timeout handling introduced in
        7e2893a16d3e (nbd: Fix timeout detection)
    introduces a race condition which may lead to killing of tasks that are
    not in nbd context anymore. This was not observed or reproducable yet.

    This patch adds locking to critical use of task_recv and task_send to
    avoid killing tasks that already left the NBD thread functions. This
    lock is only acquired if a timeout occures or the nbd device
    starts/stops.

    Reported-by: Ben Hutchings <email address hidden>
    Signed-off-by: Markus Pargmann <email address hidden>
    Reviewed-by: Ben Hutchings <email address hidden>
    Fixes: 7e2893a16d3e ("nbd: Fix timeout detection")
    Signed-off-by: Jens Axboe <email address hidden>

Also.

Revision history for this message
Junien F (axino) wrote :

apport information

description: updated
Revision history for this message
Junien F (axino) wrote : apport information

Attached: CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

Revision history for this message
Junien F (axino) wrote :

Unfortunately, one server managed to crashdump, even with your patched kernel. apport is above, crashdump is at https://private-fileshare.canonical.com/~axino/201511241217.crash.lp1505564.matar.tar.xz - sha1sum 056fae2554e52989a24094945b297c0c5906be7c

I've diverted qemu-nbd again.

Please let me know the next steps.

Thanks !

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Junien,

Sorry for the delay. After some time dealing with other priorities, I'm coming back to this. I'm downloading the dump and will take a look. Let's see what this bug is related to.

Thanks for providing it. I will report back soon.

Revision history for this message
Dan Streetman (ddstreet) wrote :

I've dl'ed the dump and I'm reviewing it.

Revision history for this message
Dan Streetman (ddstreet) wrote :

Ok, here's my analysis of the latest dump.

There are 3 kernel migrate threads waiting; this is the cause of the softlockup - specifically pid 101 on cpu 13 is where the softlockup (and then panic, due to panic on softlockup enabled) happens, and the other 2 migrate threads (pid 79 and 151) are also waiting. All are waiting for multi_cpu_stop to finish. The way multi_cpu_stop works is: the caller sets up one or more cpus to coordinate stopping; in multi_cpu_stop, the state machine moves from MULTI_STOP_PREPARE through disable irqs, to run (the provided function), to exit when done. However, only the specified cpus (in the cpumask) will run the function. The state machine doesn't proceed to the next step until all cpus have processed the current state.
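To make the lock-step behaviour described above concrete, here is a minimal user-space sketch in bash. It is only a toy model, not kernel code: the state names follow the description above, while ms_init, ms_ack, ms_current and the ms_* variables are hypothetical names invented for this illustration.

```bash
#!/bin/bash
# Toy, single-process model of a lock-step state machine shared by several CPUs.
# This is NOT kernel code; all function and variable names are made up here.
STATES=(PREPARE DISABLE_IRQ RUN EXIT)
ms_state=0      # index into STATES
ms_num_cpus=0   # CPUs taking part in this stop operation
ms_pending=0    # CPUs that have not yet acknowledged the current state

# ms_init NUM_CPUS: start a stop operation involving NUM_CPUS CPUs.
ms_init() {
  ms_num_cpus=$1
  ms_pending=$1
  ms_state=0
}

# ms_ack: one CPU reports that it has processed the current state.
# The machine only advances once every participating CPU has done so.
ms_ack() {
  ms_pending=$(( ms_pending - 1 ))
  if (( ms_pending == 0 && ms_state < ${#STATES[@]} - 1 )); then
    ms_state=$(( ms_state + 1 ))
    ms_pending=$ms_num_cpus
  fi
}

# ms_current: print the name of the current state.
ms_current() {
  echo "${STATES[ms_state]}"
}
```

In this model, after `ms_init 2` a single `ms_ack` leaves the machine in PREPARE; it only moves on once the second CPU also calls `ms_ack`. A CPU that never gets to acknowledge therefore keeps every other participant waiting.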

This is where the problem comes in. In this case, it's a migration of tasks from one numa node to another, via numa rebalancing. In this particular case, there are 3 rebalancing events happening: cpu 3 and cpu 10, cpu 3 and cpu 13, cpu 3 and cpu 20. The migrate threads on cpus 10, 13, and 20 are running multi_cpu_stop, but they're stuck waiting because cpu 3 still has it in its queue.

cpu 3 is writing bytes to the serial port, and currently waiting for confirmation that the serial port write completed. This wait is done via checking the serial port register for CTS, then if it's not set delaying for 1us, and trying again. However, this is all inside a held spinlock, with irqs disabled. So while this serial port r/w is being done, nothing else will run on this cpu. But - the code limits this to 1 second, so presumably it shouldn't lock up the cpu for longer than 1 second or so (I haven't dug too far into this, so the function may be called multiple times with the lock held).
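The bounded polling pattern described above can be sketched as follows. This is a toy bash model under stated assumptions, not the serial driver: no real hardware is touched, each failed poll merely stands in for the 1us delay, and wait_for_cts, fake_cts_set, cts_ready_after and polls_done are hypothetical names.

```bash
#!/bin/bash
# Toy model of a bounded busy-wait: poll a status check up to MAX_POLLS times.
# This is NOT the serial driver; the "register" below is faked.
cts_ready_after=0   # the fake register reports CTS from this poll number onwards
polls_done=0        # failed polls performed by the last wait_for_cts call

# fake_cts_set N: succeeds if the fake device has CTS set at poll number N.
fake_cts_set() {
  (( $1 >= cts_ready_after ))
}

# wait_for_cts MAX_POLLS: poll until CTS is seen or the budget is exhausted.
# Returns 0 if CTS was seen, 1 on timeout. Each failed poll models a 1us delay.
wait_for_cts() {
  local max_polls=$1
  polls_done=0
  while (( polls_done < max_polls )); do
    if fake_cts_set "$polls_done"; then
      return 0
    fi
    polls_done=$(( polls_done + 1 ))
  done
  return 1
}
```

With a budget of 1000000 polls at 1us each, a single call in this model is capped at roughly one second, which matches the limit mentioned above. The cap applies per call, so repeated calls with the lock held could still add up to a much longer stall.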

For whatever reason, that serial port r/w seems to be taking a long time. The migrate threads on the other cpus are waiting for it to finish, so that the migrate thread on cpu 3 can run, and move the multi_cpu_stop state machine along. But that doesn't happen in time to avoid the softlockup detector.

The multi_cpu_stop function could arguably use the addition of touch_nmi_watchdog(), since it intentionally spins on the cpu with interrupts disabled - doing so would avoid the softlockup detector (but would not change the system behavior). However, it's not really its fault, since the real cause is the other cpu(s) it's waiting for being locked.

Back on cpu 3 (that the others are waiting on), the way that delay is implemented is using the TSC. Unfortunately, the TSC is a generally unreliable clock source, so it's possible there is a problem in the delay function.

To determine that, can you please boot with the "notsc" parameter, which will change the udelay function to use a simple loop instead of the TSC, and reproduce the softlockup?

Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → Dan Streetman (ddstreet)
Revision history for this message
Junien F (axino) wrote :

Hi Dan,

Thanks for your investigation. Sorry for the delay, but finally I managed to reboot the compute nodes with the "notsc" kernel parameter. I also disabled the qemu-nbd workaround.

Once that was done, it didn't take long for a node to crash, which would indicate that notsc didn't fix the problem. However, the host got stuck and didn't dump anything. OK then. It happened a second time a few minutes after on a different host, so I thought I'd investigate this more.

It turns out, the kernel booted through kexec fails booting probably because of the notsc option : https://pastebin.canonical.com/146714/

I'm a bit worried about the following line :
[ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely

which is also displayed during "regular" boots (i.e. not through kexec).

I guess I can remove "notsc" from the kexec command line, but this will take additional time. I thought I'd let you know the current status in the meantime.

Cheers

Revision history for this message
Dan Streetman (ddstreet) wrote :

> It turns out, the kernel booted through kexec fails booting probably because of the notsc option :
> https://pastebin.canonical.com/146714/

hmm, that's weird, but if notsc is all that changed i assume it is the problem.

> I'm a bit worried about the following line :
> [ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely

that's normal with notsc, the tsc is still there, it's just not used for the udelay function. but if it doesn't help the problem, no need to keep it.

> I guess I can remove "notsc" from the kexec command line, but this will take additional time.
> I thought I'd let you know the current status in the meantime.

ok thanks. I'll be out next week for the holidays, but will continue looking at this Jan 1.

Revision history for this message
Junien F (axino) wrote :

Re-reading comment #318 Dan, I realize that we may be investigating a symptom and not the root cause.

Whenever the soft-lockup happens, the serial console does get flooded with "block nbdX: Attempted send on closed socket". If the serial console getting flooded causes soft lockups, then it is indeed a concerning issue, but shouldn't we focus, in this bug, on making nbd not flood the console in the first place?

Revision history for this message
Dan Streetman (ddstreet) wrote :

Well, yes I agree, it does look like the serial port causing the softlockup is probably separate from, but caused by, the nbd closed socket errors. However, the serial port output definitely shouldn't be causing a softlockup: no matter how much data it has to send, the serial port driver in the kernel should be scheduling itself during operation, so that it doesn't hog a single cpu for a long time. It's more likely that the general system "freezing" you are seeing is due to the serial port driver refusing to schedule off its cpu, and not any problem with the nbdX failure.

I'll look into the nbd code also though, to see where that error is coming from and what that problem may be.

Revision history for this message
Nick Moffitt (nick-moffitt) wrote :

This problem has caused more serious damage recently. When nbd dies and printk()s like mad, the serial console is not fast enough to display it.

The kernel keeps allocating buffer space for serial output, which we see as 13G of kmalloc-256 or kmalloc-512 slab usage.

Eventually the OOMkiller tries to free up space, but it can only kill userspace programs so ultimately the system dies altogether.

This is more dire than mere CPU load or lockup warning messages.

Revision history for this message
Nick Moffitt (nick-moffitt) wrote :

This memory leak we have so far only seen on arm64, to be clear.

Revision history for this message
Dan Streetman (ddstreet) wrote :

axino or nick, can either of you attach an sosreport from an affected system? The crashdump doesn't include any userspace data so I can't see what exactly the qemu-nbd userspace program is doing, nor can i see what params it's started with. I'll need that info to be able to debug the qemu-nbd side of this.

Revision history for this message
Dan Streetman (ddstreet) wrote :

Ok, never mind about the sosreport. I got the info from some older emails from axino: nova is using qemu-nbd to locally mount images and access the partitions inside them. I was able to trivially reproduce this by creating an image, attaching it to /dev/nbd0 with qemu-nbd, partitioning it, running mkfs on its p1 and mounting it, and then, while copying a file to it, running qemu-nbd -d to detach it from /dev/nbd0. That causes the spam of "Attempted..." error messages.

So this appears to be a simple case of nova calling qemu-nbd -d while there is still I/O to the image. The right thing to do is simply ratelimit the error messages (and they really should be anyway, as they're printing directly inside a loop). The messages themselves do not indicate any kernel error, simply that the nbd device was removed while being written to.

Can you try this kernel PPA to see if it fixes the problem? You will still see the error messages, but only a few lines since they'll be ratelimited.

Of course there is still the (probably more serious) problem of the serial port driver hanging a cpu and eating up memory; that probably deserves its own bug, since it's caused by this but is a separate issue.

Revision history for this message
Junien F (axino) wrote :

Except that what happens on the compute nodes is that, when creating an instance, nova attaches the image with qemu-nbd (say to /dev/nbd0), and then tries to mount /dev/nbd0 somewhere, except that doesn't work because the image has partitions, and so the root device is actually on /dev/nbd0p1. So the "mount" commands return an error, and nova then detaches the image with qemu-nbd -d.

Overall, as far as nova logs show, there are 0 writes on the nbd device and very few reads (probably just the MBR?). Could that still cause inflight I/O when qemu-nbd -d is run?

I'll happily test your kernel PPA, but as far as I can see, you don't mention where it actually is :)

Thanks !

Revision history for this message
Dan Streetman (ddstreet) wrote :

> Overall, as far as nova logs show, there is 0 write on the nbd device and very few reads (probably just the MBR ?).
> Could that still cause inflight I/O when qemu-nbd -d is run ?

"very few" > 0
:-)

and it could be coming from elsewhere...but we don't need to account for where the IO is coming from, as the simple fact that it's there is enough. Also it's not just data IO, it's any "request", including metadata/control requests. Network-backed devices can disappear at any time, and the driver must be able to handle that. Spamming endless messages to the log isn't a good idea in that case.

To clarify the exact code in this situation:

while ((req = blk_fetch_request(q)) != NULL) {
...
        if (unlikely(!nbd->sock)) {
                dev_err(disk_to_dev(nbd->disk), "Attempted send on closed socket\n");
...
                continue;
        }

so, as soon as the connection (socket) is gone, there will be an "Attempted..." message printed for every request in the queue, as the queue is cleared.
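As a rough illustration of what rate limiting changes in a loop like this, here is a small user-space model in bash. It is a sketch only: the actual patch in the PPA is not shown in this thread, the 5-second window and burst of 10 are illustrative values, and ratelimit_allow, drain_queue and the rl_* variables are hypothetical names.

```bash
#!/bin/bash
# Toy model of interval/burst rate limiting (user space, NOT kernel code).
# RL_INTERVAL and RL_BURST are illustrative values, not taken from the patch.
RL_INTERVAL=5   # window length in seconds
RL_BURST=10     # messages allowed per window
rl_begin=-1     # start time of the current window (-1: no window yet)
rl_printed=0    # messages let through in the current window
rl_missed=0     # messages suppressed in the current window
DRAIN_LOGGED=0  # result of the last drain_queue call

# ratelimit_allow NOW: succeeds if a message may be printed at time NOW.
ratelimit_allow() {
  local now=$1
  if (( rl_begin < 0 || now - rl_begin >= RL_INTERVAL )); then
    rl_begin=$now
    rl_printed=0
    rl_missed=0
  fi
  if (( rl_printed < RL_BURST )); then
    rl_printed=$(( rl_printed + 1 ))
    return 0
  fi
  rl_missed=$(( rl_missed + 1 ))
  return 1
}

# drain_queue NOW COUNT: pretend COUNT queued requests hit a closed socket at
# time NOW; store in DRAIN_LOGGED how many error lines would be logged.
drain_queue() {
  local now=$1 count=$2 i
  DRAIN_LOGGED=0
  for (( i = 0; i < count; i++ )); do
    if ratelimit_allow "$now"; then
      DRAIN_LOGGED=$(( DRAIN_LOGGED + 1 ))
    fi
  done
}
```

In this model, `drain_queue 0 250` leaves DRAIN_LOGGED at 10 instead of 250: every request is still failed, but only the first burst of messages reaches the log until the window expires.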

> I'll happily test your kernel PPA, but as far as I can see, you don't mention where it actually is :)

ha, forgot to paste it in, sorry :-)

https://launchpad.net/~ddstreet/+archive/ubuntu/lp1505564

Revision history for this message
Junien F (axino) wrote :

I applied the patch, and it saved a reboot twice already, I think. dmesg from one server : http://pastebin.ubuntu.com/14438525/

I have to stop the tests for the weekend though, I'll resume on Monday.

Revision history for this message
Junien F (axino) wrote :

I resumed the tests on Monday, and so far we're looking good. Your change prevented ~10 locks so far, it would seem.

Revision history for this message
Dan Streetman (ddstreet) wrote :

Great. I'll send the patch upstream, and open a new bug for the serial port hanging issue. Thanks!

Alvaro Uria (aluria)
tags: added: canonical-bootstack
Revision history for this message
Dan Streetman (ddstreet) wrote :

opened bug 1534216 to track the serial port issue.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-6.21

---------------
linux (4.4.0-6.21) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1546283

  * Naples/Zen, NTB Driver (LP: #1542071)
    - [Config] CONFIG_NTB_AMD=m
    - NTB: Add support for AMD PCI-Express Non-Transparent Bridge

  * [Hyper-V] kernel panic occurs when installing Ubuntu Server x32 (LP: #1495983)
    - SAUCE: storvsc: use small sg_tablesize on x86

  * Enable arm64 emulation of removed ARMv7 instructions (LP: #1545542)
    - [Config] CONFIG_ARMV8_DEPRECATED=y

  * Surelock-GA2:kernel panic/ exception @ pcibios_set_pcie_reset_state+0x118/0x280 + cxl_reset+0x5c/0xc0 (LP: #1545037)
    - powerpc/eeh: Fix stale cached primary bus

  * Miscellaneous Ubuntu changes
    - SAUCE: fs: Add user namesapace member to struct super_block
    - SAUCE: fs: Limit file caps to the user namespace of the super block
    - SAUCE: Smack: Add support for unprivileged mounts from user namespaces
    - SAUCE: block_dev: Support checking inode permissions in lookup_bdev()
    - SAUCE: block_dev: Check permissions towards block device inode when mounting
    - SAUCE: fs: Treat foreign mounts as nosuid
    - SAUCE: selinux: Add support for unprivileged mounts from user namespaces
    - SAUCE: userns: Replace in_userns with current_in_userns
    - SAUCE: Smack: Handle labels consistently in untrusted mounts
    - SAUCE: fs: Check for invalid i_uid in may_follow_link()
    - SAUCE: cred: Reject inodes with invalid ids in set_create_file_as()
    - SAUCE: fs: Refuse uid/gid changes which don't map into s_user_ns
    - SAUCE: fs: Update posix_acl support to handle user namespace mounts
    - SAUCE: fs: Ensure the mounter of a filesystem is privileged towards its inodes
    - SAUCE: fs: Don't remove suid for CAP_FSETID in s_user_ns
    - SAUCE: fs: Allow superblock owner to access do_remount_sb()
    - SAUCE: capabilities: Allow privileged user in s_user_ns to set security.* xattrs
    - SAUCE: fuse: Add support for pid namespaces
    - SAUCE: fuse: Support fuse filesystems outside of init_user_ns
    - SAUCE: fuse: Restrict allow_other to the superblock's namespace or a descendant
    - SAUCE: fuse: Allow user namespace mounts
    - SAUCE: mtd: Check permissions towards mtd block device inode when mounting
    - SAUCE: fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
    - SAUCE: quota: Convert ids relative to s_user_ns
    - SAUCE: evm: Translate user/group ids relative to s_user_ns when computing HMAC
    - SAUCE: fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
    - SAUCE: quota: Treat superblock owner as privilged
    - SAUCE: ima/evm: Allow root in s_user_ns to set xattrs
    - SAUCE: block_dev: Forbid unprivileged mounting when device is opened for writing
    - SAUCE: ext4: Add support for unprivileged mounts from user namespaces
    - SAUCE: ext4: Add module parameter to enable user namespace mounts
    - SAUCE: fuse: Add module parameter to enable user namespace mounts

  * Miscellaneous upstream changes
    - megaraid: Fix possible NULL pointer deference in mraid_mm_ioctl
    - libahci: Implement the capability to override th...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Brad Figg (brad-figg)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Brad Figg (brad-figg)
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
Changed in linux (Ubuntu Wily):
status: New → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
tags: added: verification-needed-vivid
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-wily
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Dan Streetman (ddstreet)
tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Dan Streetman (ddstreet) wrote :

verification can be done with this script:

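#!/bin/bash

# Load the nbd module and make sure /dev/nbd0 starts out detached.
modprobe nbd
qemu-nbd -d /dev/nbd0

# Create a sparse 20G backing file and attach it to /dev/nbd0.
truncate -s 20G /tmp/testfile
qemu-nbd -c /dev/nbd0 /tmp/testfile

# Start 250 background writers so that requests are queued on the device.
for n in $( seq 1 250 ) ; do
  echo "$n"
  ( dd if=/dev/zero of=/dev/nbd0 bs=1 & )
done

# Detach while I/O is still in flight; this triggers the error messages.
qemu-nbd -d /dev/nbd0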

After running that, on an unpatched system the dmesg will show a large number (~100 or more) of messages like:
[ 70.408246] block nbd0: Attempted send on closed socket

With a patched kernel, the dmesg will show a ratelimited number (~10) of those messages.

This has been verified on trusty 3.13, vivid 3.19, and wily 4.2
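
To compare the message counts described above without eyeballing the log, a small helper along these lines could be used. It is only a sketch: count_nbd_errors is a hypothetical name, and the pattern simply matches the message text quoted in this bug.

```bash
#!/bin/bash
# count_nbd_errors: read kernel log text on stdin and print the number of
# "Attempted send on closed socket" lines it contains.
count_nbd_errors() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
  grep -c 'block nbd[0-9]*: Attempted send on closed socket' || true
}
```

For example, running `dmesg | count_nbd_errors` after the script should print a large number on an unpatched kernel and a much smaller one on a patched kernel.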

tags: added: verification-done-vivid verification-done-wily
removed: verification-needed-vivid verification-needed-wily
Revision history for this message
Junien F (axino) wrote :

Thanks !

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-34.39

---------------
linux (4.2.0-34.39) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555821

  [ Florian Westphal ]

  * SAUCE: [nf] netfilter: x_tables: check for size overflow
    - LP: #1555353
  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (4.2.0-33.38) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554649

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608
  * cxl: Fix PSL timebase synchronization detection
    - LP: #1532914

linux (4.2.0-32.37) wily; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1550045

  [ Kamal Mostafa ]

  * Merged back Ubuntu-4.2.0-31.36

linux (4.2.0-31.36) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548579

  [ Andy Whitcroft ]

  * [Debian] hv: hv_set_ifconfig -- convert to python3
    - LP: #1506521
  * [Debian] hv: hv_set_ifconfig -- switch to approved indentation
    - LP: #1540586
  * [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
    - LP: #1540586

  [ Carol L Soto ]

  * SAUCE: IB/IPoIB: Do not set skb truesize since using one linearskb
    - LP: #1541326

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Tim Gardner ]

  * Revert "SAUCE: (noup) cxlflash: Fix to avoid virtual LUN failover
    failure"
    - LP: #1541635
  * Revert "SAUCE: (noup) cxlflash: Fix to escalate LINK_RESET also on port
    1"
    - LP: #1541635
  * [Config] ARMV8_DEPRECATED=y
    - LP: #1545542

  [ Upstream Kernel Changes ]

  * x86/xen/p2m: hint at the last populated P2M entry
    - LP: #1542941
  * mm: add dma_pool_zalloc() call to DMA API
    - LP: #1543737
  * sctp: Prevent soft lockup when sctp_accept() is called during a timeout
    event
    - LP: #1543737
  * xen-netback: respect user provided max_queues
    - LP: #1543737
  * xen-netfront: respect user provided max_queues
    - LP: #1543737
  * xen-netfront: update num_queues to real created
    - LP: #1543737
  * iio: adis_buffer: Fix out-of-bounds memory access
    - LP: #1543737
  * KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
    - LP: #1543737
  * KVM: PPC: Fix ONE_REG AltiVec support
    - LP: #1543737
  * x86/irq: Call chip->irq_set_affinity in proper context
    - LP: #1543737
  * drm/amdgpu: fix tonga smu resume
    - LP: #1543737
  * perf kvm record/report: 'unprocessable sample' error while
    recording/reporting guest data
    - LP: #1543737
  * hrtimer: Handle remaining time proper for TIME_LOW_RES
    - LP: #1543737
  * timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
    - LP: #1543737
  * posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
    - LP: #1543737
  * itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
    - LP: #1543737
  * drm/amdgpu: Use drm_calloc_large for VM page_tables array
    - LP: #1543737
  * drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2
    - LP: #1543737
  * drm/radeon: properly byte swap vce firmware setup
    - LP: #1543737
  ...

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-56.62

---------------
linux (3.19.0-56.62) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555832

  [ Florian Westphal ]

  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (3.19.0-55.61) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554708

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608

linux (3.19.0-54.60) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1552337

  [ Upstream Kernel Changes ]

  * Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
    - LP: #1551419

linux (3.19.0-53.59) vivid; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1550576

  [ Kamal Mostafa ]

  * Merged back 3.19.0-52.58

linux (3.19.0-52.58) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548548

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Upstream Kernel Changes ]

  * Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
    - LP: #1542457
  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1546320
  * net: ipmr: fix static mfc/dev leaks on table destruction
    - LP: #1542457
  * drm/nouveau/nv46: Change mc subdev oclass from nv44 to nv4c
    - LP: #1542457
  * ovl: allow zero size xattr
    - LP: #1542457
  * ovl: use a minimal buffer in ovl_copy_xattr
    - LP: #1542457
  * [media] vb2: fix a regression in poll() behavior for output,streams
    - LP: #1542457
  * [media] gspca: ov534/topro: prevent a division by 0
    - LP: #1542457
  * [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
    - LP: #1542457
  * tools lib traceevent: Fix output of %llu for 64 bit values read on 32
    bit machines
    - LP: #1542457
  * KVM: x86: expose MSR_TSC_AUX to userspace
    - LP: #1542457
  * KVM: x86: correctly print #AC in traces
    - LP: #1542457
  * drm/radeon: call hpd_irq_event on resume
    - LP: #1542457
  * xhci: refuse loading if nousb is used
    - LP: #1542457
  * arm64: Clear out any singlestep state on a ptrace detach operation
    - LP: #1542457
  * time: Avoid signed overflow in timekeeping_get_ns()
    - LP: #1542457
  * ovl: root: copy attr
    - LP: #1542457
  * Bluetooth: Add support of Toshiba Broadcom based devices
    - LP: #1522949, #1542457
  * rtlwifi: fix memory leak for USB device
    - LP: #1542457
  * wlcore/wl12xx: spi: fix oops on firmware load
    - LP: #1542457
  * ovl: check dentry positiveness in ovl_cleanup_whiteouts()
    - LP: #1542457
  * EDAC, mc_sysfs: Fix freeing bus' name
    - LP: #1542457
  * EDAC: Robustify workqueues destruction
    - LP: #1542457
  * arm64: mm: ensure that the zero page is visible to the page table
    walker
    - LP: #1542457
  * powerpc: Make value-returning atomics fully ordered
    - LP: #1542457
  * powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
    - LP: #1542457
  * dm space map metadata: remove unused variable in brb_pop()
    - LP: #1542457
  * dm thi...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-83.127

---------------
linux (3.13.0-83.127) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1555839

  [ Florian Westphal ]

  * SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
    userspace
    - LP: #1555338

linux (3.13.0-82.126) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1554732

  [ Upstream Kernel Changes ]

  * Revert "drm/radeon: call hpd_irq_event on resume"
    - LP: #1554608
  * net: generic dev_disable_lro() stacked device handling
    - LP: #1547680

linux (3.13.0-81.125) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1552316

  [ Upstream Kernel Changes ]

  * Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
    - LP: #1551419
  * bcache: Fix a lockdep splat in an error path
    - LP: #1551327

linux (3.13.0-80.124) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1548519

  [ Andy Whitcroft ]

  * [Debian] hv: hv_set_ifconfig -- convert to python3
    - LP: #1506521
  * [Debian] hv: hv_set_ifconfig -- switch to approved indentation
    - LP: #1540586
  * [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
    - LP: #1540586

  [ Dan Streetman ]

  * SAUCE: nbd: ratelimit error msgs after socket close
    - LP: #1505564

  [ Upstream Kernel Changes ]

  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1546320
  * [media] gspca: ov534/topro: prevent a division by 0
    - LP: #1542497
  * [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
    - LP: #1542497
  * tools lib traceevent: Fix output of %llu for 64 bit values read on 32
    bit machines
    - LP: #1542497
  * KVM: x86: correctly print #AC in traces
    - LP: #1542497
  * drm/radeon: call hpd_irq_event on resume
    - LP: #1542497
  * xhci: refuse loading if nousb is used
    - LP: #1542497
  * arm64: Clear out any singlestep state on a ptrace detach operation
    - LP: #1542497
  * time: Avoid signed overflow in timekeeping_get_ns()
    - LP: #1542497
  * rtlwifi: fix memory leak for USB device
    - LP: #1542497
  * wlcore/wl12xx: spi: fix oops on firmware load
    - LP: #1542497
  * EDAC, mc_sysfs: Fix freeing bus' name
    - LP: #1542497
  * EDAC: Don't try to cancel workqueue when it's never setup
    - LP: #1542497
  * EDAC: Robustify workqueues destruction
    - LP: #1542497
  * powerpc: Make value-returning atomics fully ordered
    - LP: #1542497
  * powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
    - LP: #1542497
  * dm space map metadata: remove unused variable in brb_pop()
    - LP: #1542497
  * dm thin: fix race condition when destroying thin pool workqueue
    - LP: #1542497
  * futex: Drop refcount if requeue_pi() acquired the rtmutex
    - LP: #1542497
  * drm/radeon: clean up fujitsu quirks
    - LP: #1542497
  * mmc: sdio: Fix invalid vdd in voltage switch power cycle
    - LP: #1542497
  * mmc: sdhci: Fix sdhci_runtime_pm_bus_on/off()
    - LP: #1542497
  * udf: limit the maximum number of indirect extents in a row
    - LP: #1542497
  * nfs: Fix race in __update_open_stateid...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Paul Gear (paulgear) wrote :

For posterity: if the cause described in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1505564/comments/143 applies to you, running dmesg -D (which turns off console logging of kernel messages) might be a viable workaround until you can reboot.

Displaying the first 40 and last 40 of 162 comments.