liburcu: Enable MEMBARRIER_CMD_PRIVATE_EXPEDITED to address performance problems with MEMBARRIER_CMD_SHARED

Bug #1876230 reported by Matthew Ruffell
This bug affects 1 person
Affects              Status         Importance   Assigned to        Milestone
liburcu (Ubuntu)     Fix Released   Undecided    Unassigned
  Bionic             Fix Released   Medium       Matthew Ruffell

Bug Description

[Impact]

In Linux 4.3, a new system call named "membarrier" was added. It was designed specifically for use by userspace-rcu (liburcu) to speed up the fast path, that is, the reader side of the library. The original Linux 4.3 implementation supported only the MEMBARRIER_CMD_SHARED subcommand of the membarrier syscall.

MEMBARRIER_CMD_SHARED executes a memory barrier on all threads of all processes running on the system. When it returns, the calling userspace thread is guaranteed that all running threads share the same view of the userspace addresses consumed by readers and writers.

The problem with MEMBARRIER_CMD_SHARED is that calls made with it can block: because it deploys a barrier across all threads in the system, and some of those threads may be waiting on blocking operations, it can take time for all of them to reach the barrier.

In Linux 4.14, this was addressed by adding the MEMBARRIER_CMD_PRIVATE_EXPEDITED command to the membarrier syscall. It targets only threads that share the same mm as the calling thread, that is, threads in the current process, rather than all threads and processes on the system.

Calls to membarrier with the MEMBARRIER_CMD_PRIVATE_EXPEDITED command are guaranteed not to block, because the memory barriers are implemented with inter-processor interrupts (IPIs).

Because of this, membarrier calls that use MEMBARRIER_CMD_PRIVATE_EXPEDITED are much faster than those that use MEMBARRIER_CMD_SHARED.

Since Bionic uses a 4.15 kernel, all kernel requirements are met, and this SRU is to enable support for MEMBARRIER_CMD_PRIVATE_EXPEDITED in the liburcu package.

This brings the performance of the liburcu library back in line with where it was on Trusty; this particular user experienced performance problems after upgrading from Trusty to Bionic.

[Test]

Testing performance is heavily dependent on the application which links against liburcu, and the workload it executes.

A test package is available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf276198-test

For the sake of testing, we can use the benchmarks provided in the liburcu source code. Download a copy of the liburcu source code either from the Ubuntu archive or from GitHub:

$ pull-lp-source liburcu bionic
# OR
$ git clone https://github.com/urcu/userspace-rcu.git
$ cd userspace-rcu
$ git checkout v0.10.1 # version in bionic

Build the code:

$ ./bootstrap
$ ./configure
$ make

Go into the tests/benchmark directory:

$ cd tests/benchmark

From there, you can run benchmarks for four of the main liburcu flavours: urcu, urcu-bp, urcu-signal and urcu-mb.

On an 8-core machine, with 6 reader threads, 2 writer threads and a 10-second runtime, execute:

$ ./test_urcu 6 2 10
$ ./test_urcu_bp 6 2 10
$ ./test_urcu_signal 6 2 10
$ ./test_urcu_mb 6 2 10

Results:

$ ./test_urcu 6 2 10
0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

$ ./test_urcu_bp 6 2 10
0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

$ ./test_urcu_signal 6 2 10
0.10.1-1: 20128392417 reads, 6859 writes, 20128399276 ops
0.10.1-1ubuntu1: 20501430707 reads, 6890 writes, 20501437597 ops

$ ./test_urcu_mb 6 2 10
0.10.1-1: 627996563 reads, 5409563 writes, 633406126 ops
0.10.1-1ubuntu1: 653194752 reads, 4590020 writes, 657784772 ops

The SRU only changes behaviour for urcu and urcu-bp, since they are the only liburcu "flavours" that the patches touch. Looking purely at total ops:

$ ./test_urcu 6 2 10
17612527935 ops
14989247316 ops

$ ./test_urcu_bp 6 2 10
1179590602 ops
13230930051 ops

For this particular benchmark workload, test_urcu shows extra overhead with MEMBARRIER_CMD_PRIVATE_EXPEDITED (about 15% fewer total ops). This is explained by the extra impact it has on the slow path, and by the far larger number of writes it completed during my benchmark (810069 versus 268).

The real winner in this benchmark workload is test_urcu_bp, which sees roughly an 11x increase in total ops with MEMBARRIER_CMD_PRIVATE_EXPEDITED. Some of this may be down to it completing roughly 3x fewer writes during my benchmark.

Again, these benchmarks are only indicative, and the results vary considerably between runs. Performance really depends on the application which links against liburcu and its workload.

[Regression Potential]

This SRU changes the behaviour of the following libraries which applications link against: -lurcu and -lurcu-bp. Behaviour is unchanged in the rest: -lurcu-qsbr, -lurcu-signal and -lurcu-mb.

On Bionic, liburcu already calls the membarrier syscall in urcu and urcu-bp, and this does not change. What changes is the command passed to that syscall: MEMBARRIER_CMD_PRIVATE_EXPEDITED instead of MEMBARRIER_CMD_SHARED. The code implementing either barrier runs in kernel space; these commits simply change the parameters liburcu supplies to the membarrier syscall.

I have run the testsuite that comes with the Bionic source code, and "make regtest", "make short_bench" and "make long_bench" all pass. These take multiple hours, so it is best to run them on a cloud instance.

If a regression were to occur, applications linked against -lurcu and -lurcu-bp would be affected. The liburcu homepage, https://liburcu.org/, lists the major applications that use liburcu: Knot DNS, Netsniff-ng, Sheepdog, GlusterFS, gdnsd and LTTng.

[Scope]

The two commits which are being SRU'd are:

commit c0bb9f693f926595a7cb8b4ce712cef08d9f5d49
Author: Mathieu Desnoyers <email address hidden>
Date: Thu Dec 21 13:42:23 2017 -0500
Subject: liburcu: Use membarrier private expedited when available
Link: https://github.com/urcu/userspace-rcu/commit/c0bb9f693f926595a7cb8b4ce712cef08d9f5d49

commit 3745305bf09e7825e75ee5b5490347ee67c6efdd
Author: Mathieu Desnoyers <email address hidden>
Date: Fri Dec 22 10:57:59 2017 -0500
Subject: liburcu-bp: Use membarrier private expedited when available
Link: https://github.com/urcu/userspace-rcu/commit/3745305bf09e7825e75ee5b5490347ee67c6efdd

Both cherry-pick cleanly onto 0.10.1 in Bionic. They originally landed in 0.11.0, so Eoan, Focal and Groovy already contain them.

[Other]

If you are interested in how the membarrier syscall works, you can read the relevant commits in the Linux kernel:

commit 5b25b13ab08f616efd566347d809b4ece54570d1
Author: Mathieu Desnoyers <email address hidden>
Date: Fri Sep 11 13:07:39 2015 -0700
Subject: sys_membarrier(): system-wide memory barrier (generic, x86)
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b25b13ab08f616efd566347d809b4ece54570d1

commit 22e4ebb975822833b083533035233d128b30e98f
Author: Mathieu Desnoyers <email address hidden>
Date: Fri Jul 28 16:40:40 2017 -0400
Subject: membarrier: Provide expedited private command
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22e4ebb975822833b083533035233d128b30e98f

Additionally, blog posts from LTTng:
https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/

And Phoronix:
https://www.phoronix.com/scan.php?page=news_item&px=URCU-Membarrier-Performance

Changed in liburcu (Ubuntu):
status: New → Fix Released
Changed in liburcu (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
tags: added: sts
description: updated
Matthew Ruffell (mruffell) wrote :

Attached is a debdiff for Bionic

Dan Streetman (ddstreet)
tags: added: sts-sponsor-ddstreet
Dan Streetman (ddstreet)
description: updated
Dan Streetman (ddstreet) wrote :

Hi @mruffell,

two questions for this sru:

1) it looks like static libs are built/provided by this package:

$ pull-lp-debs liburcu bionic ; for p in *.deb ; do echo "$p:" ; dpkg-deb -c $p | grep -E '*\.a' ; done
Found liburcu 0.10.1-1 in bionic
Using existing file liburcu-dev_0.10.1-1_amd64.deb
Using existing file liburcu6_0.10.1-1_amd64.deb
liburcu6_0.10.1-1_amd64.deb:
liburcu-dev_0.10.1-1_amd64.deb:
-rw-r--r-- root/root 47956 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-bp.a
-rw-r--r-- root/root 69844 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-cds.a
-rw-r--r-- root/root 23912 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-common.a
-rw-r--r-- root/root 43750 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-mb.a
-rw-r--r-- root/root 45642 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-qsbr.a
-rw-r--r-- root/root 45716 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu-signal.a
-rw-r--r-- root/root 45148 2018-01-23 15:46 ./usr/lib/x86_64-linux-gnu/liburcu.a

and several other pkgs have build-dep for that:

$ reverse-depends -b -r bionic liburcu-dev
Reverse-Build-Depends
* gdnsd
* glusterfs
* knot
* ltt-control
* multipath-tools
* netsniff-ng
* sheepdog
* ust

Can you check those packages to see if any use static linking (and thus should be recompiled with the updated static liburcu libs)?

2) In your testing results comparison:

> ./test_urcu 6 2 10
> 0.10.1-1: 17612527667 reads, 268 writes, 17612527935 ops
> 0.10.1-1ubuntu1: 14988437247 reads, 810069 writes, 14989247316 ops

The number of writes is obviously much, much better; however the number of reads actually goes down with the patched code.

> $ ./test_urcu_bp 6 2 10
> 0.10.1-1: 1177891079 reads, 1699523 writes, 1179590602 ops
> 0.10.1-1ubuntu1: 13230354737 reads, 575314 writes, 13230930051 ops

Similarly, while the number of reads increases significantly, the number of writes goes down.

I may be misreading the results, but it seems like this change is not an across-the-board improvement, but more of a performance trade-off. If that's the case, I think it will be hard to make the case that this should be included as an SRU. Can you clarify the results comparison in more detail please?

Matthew Ruffell (mruffell) wrote :

To answer question 1, I went and checked every rdepends package:

gdnsd: dynamically links to -lurcu-qsbr
$ ldd /usr/sbin/gdnsd
liburcu-qsbr.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-qsbr.so.6 (0x00007f33bfde5000)

glusterfs: The only package I am not entirely sure about. Only glusterd uses urcu:
ubuntu@ubuntu:~/glusterfs-3.13.2$ grep -Rin "<urcu" .
./config.h.in:200:/* Define to 1 if you have the <urcu-bp.h> header file. */
./config.h.in:203:/* Define to 1 if you have the <urcu/cds.h> header file. */
./xlators/mgmt/glusterd/src/glusterd-rcu.h:14:#include <urcu-bp.h>
./xlators/mgmt/glusterd/src/glusterd-rcu.h:15:#include <urcu/rculist.h>
./xlators/mgmt/glusterd/src/glusterd-rcu.h:16:#include <urcu/compiler.h>
./xlators/mgmt/glusterd/src/glusterd-rcu.h:17:#include <urcu/uatomic.h>
./xlators/mgmt/glusterd/src/glusterd-rcu.h:18:#include <urcu-call-rcu.h>
./xlators/mgmt/glusterd/src/glusterd-conn-helper.c:15:#include <urcu/rculist.h>
$ ldd /usr/sbin/glusterd | grep urcu
<blank>
No mention of static linking either:
ubuntu@ubuntu:~/glusterfs-3.13.2$ grep -Rin "\.a" . | grep urcu
<blank>
The library linker settings are in ./configure.ac:
dnl Check for userspace-rcu
PKG_CHECK_MODULES([URCU], [liburcu-bp], [],
  [AC_CHECK_HEADERS([urcu-bp.h],
     [URCU_LIBS='-lurcu-bp'],
     AC_MSG_ERROR([liburcu-bp not found]))])
PKG_CHECK_MODULES([URCU_CDS], [liburcu-cds >= 0.8], [],
  [PKG_CHECK_MODULES([URCU_CDS], [liburcu-cds >= 0.7],
    [AC_DEFINE(URCU_OLD, 1, [Define if liburcu 0.6 or 0.7 is found])],
    [AC_CHECK_HEADERS([urcu/cds.h],
      [AC_DEFINE(URCU_OLD, 1, [Define if liburcu 0.6 or 0.7 is found])
       URCU_CDS_LIBS='-lurcu-cds'],
      [AC_MSG_ERROR([liburcu-cds not found])])])])
I ran ldd over all glusterfs binaries which are listed by dpkg -L, but they all came back negative for urcu. From what I can tell, glusterd from the xlators directory is either not built, or does not link against urcu in Ubuntu.

knot: dynamically links to -lurcu
ubuntu@ubuntu:~$ ldd /usr/sbin/knotd
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x00007fc9d1e8b000)

lttng: dynamically links to various urcu libraries
$ ldd /usr/bin/lttng
liburcu-common.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-common.so.6 (0x00007f6711a2f000)
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x00007f6711827000)
liburcu-cds.so.6 => /usr/lib/x86_64-linux-gnu/liburcu-cds.so.6 (0x00007f671161d000)

multipath-tools: dynamically links to -lurcu
$ ldd /sbin/multipath
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x00007fe63ace1000)

netsniff-ng: dynamically links to -lurcu
~/netsniff-ng-0.6.4$ grep -Rin "urcu"
flowtop/Makefile:1:flowtop-libs = -lurcu \
ui.h:7:#include <urcu/list.h>
flowtop.c:28:#include <urcu.h>
flowtop.c:29:#include <urcu/list.h>
flowtop.c:30:#include <urcu/rculist.h>
INSTALL:28: - liburcu: flowtop
$ ldd /usr/sbin/flowtop
liburcu.so.6 => /usr/lib/x86_64-linux-gnu/liburcu.so.6 (0x00007f7561a5a000)

sheepdog: this program doesn't link against urcu at all! It only uses the <urcu/uatomic.h> header file and not anything more.
search for urcu in sheepdog: https://paste.ubuntu.com/p/VPKr4pWtQg/
explanation found in d/changelog: https://paste.ubu...


Matthew Ruffell (mruffell) wrote :

Answering question 2. I have done a comprehensive performance analysis based on the benchmark application.

Note: The SRU changes how the sys_membarrier syscall is used. The implementation this SRU switches to never blocks, while the previous implementation can. This makes performance analysis entirely workload dependent: on busy servers with lots of background processes, sys_membarrier will block more often than on quiet servers with no background processes.

The following is based on a quiet server with no background processes.

Test parameters
===============
Ubuntu 18.04.4
KVM, 2 vcpus
0.10.1 liburcu
4.15.0-99-generic
Test program "test_urcu[_bp]": http://paste.ubuntu.com/p/5vXVycQjYk/
(only difference is #include <urcu.h> or #include <urcu-bp.h>)

No changes to source code
=========================

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads 6065490002 nr_writes 237 nr_ops 6065490239
nr_reads 6476219475 nr_writes 186 nr_ops 6476219661
nr_reads 6474789528 nr_writes 183 nr_ops 6474789711
nr_reads 6476326433 nr_writes 188 nr_ops 6476326621
nr_reads 6479298142 nr_writes 179 nr_ops 6479298321
nr_reads 6476429569 nr_writes 186 nr_ops 6476429755
nr_reads 6478019994 nr_writes 191 nr_ops 6478020185
nr_reads 6479117595 nr_writes 183 nr_ops 6479117778
nr_reads 6478302181 nr_writes 185 nr_ops 6478302366
nr_reads 6481003399 nr_writes 191 nr_ops 6481003590

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads 644339902 nr_writes 485 nr_ops 644340387
nr_reads 644092800 nr_writes 1101 nr_ops 644093901
nr_reads 644676446 nr_writes 494 nr_ops 644676940
nr_reads 643845915 nr_writes 500 nr_ops 643846415
nr_reads 645156053 nr_writes 502 nr_ops 645156555
nr_reads 644626421 nr_writes 497 nr_ops 644626918
nr_reads 644710679 nr_writes 495 nr_ops 644711174
nr_reads 644445530 nr_writes 503 nr_ops 644446033
nr_reads 645150707 nr_writes 497 nr_ops 645151204
nr_reads 643681268 nr_writes 496 nr_ops 643681764

Commits c0bb9f and 374530 patched in
====================================

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu 6 2 10
nr_reads 4097663510 nr_writes 6516 nr_ops 4097670026
nr_reads 4177088332 nr_writes 4183 nr_ops 4177092515
nr_reads 4153780077 nr_writes 1907 nr_ops 4153781984
nr_reads 4150954044 nr_writes 3942 nr_ops 4150957986
nr_reads 4267855073 nr_writes 2102 nr_ops 4267857175
nr_reads 4131310825 nr_writes 7119 nr_ops 4131317944
nr_reads 4183771431 nr_writes 1919 nr_ops 4183773350
nr_reads 4270944170 nr_writes 4958 nr_ops 4270949128
nr_reads 4123277225 nr_writes 4228 nr_ops 4123281453
nr_reads 4266997284 nr_writes 1723 nr_ops 4266999007

ubuntu@ubuntu:~/userspace-rcu/tests/benchmark$ ./test_urcu_bp 6 2 10
nr_reads 6530208343 nr_writes 8860 nr_ops 65302172...

Dan Streetman (ddstreet) wrote :

excellent analysis, thanks @mruffell!

uploaded to the bionic queue.

Rafael David Tinoco (rafaeldtinoco) wrote :

Sorry to jump in. I wrote the membarrier syscall LTP interface test (https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/membarrier/membarrier01.c) when I was at Linaro, and was drawn to reading all this during my bug triage day =).

So, for MEMBARRIER_CMD_PRIVATE_EXPEDITED, you said at the description:

" It only targets threads which share the same mm as the thread calling the membarrier syscall "

And it occurred to me right away:

"Are there any dependencies using liburcu and Shared Mem IPC" ?

Mainly because private expedited only guarantees the barrier (for the RCU grace period) among sibling tasks sharing the same address space and thread group id (which share the same mm). Of course, this explains why it is much faster, especially on an OS running tons of tasks, or even worse, in a VM with tons of running tasks PLUS vCPUs being scheduled in and out of the physical CPUs.

So, I think my message here, especially for the SRU team, since this changes the behaviour of the liburcu library in an LTS release, is:

- apart from feedback given by @mruffell, to also check if any of liburcu consumers are depending on a full membarrier - driven by kernel - for ** shared pages among different processes **

before accepting this as an SRU. Otherwise we'd be changing the synchronization primitives of all dependencies, and that would not be good =o).

I think this criteria may also help in the SRU acceptance ^

Dan Streetman (ddstreet) wrote :

> - apart from feedback given by @mruffell, to also check if any of liburcu consumers are depending on a full membarrier - driven by kernel - for ** shared pages among different processes **

this is a good point, although I don't think liburcu makes guarantees like that for memory barriers outside of the current process; for example, the -qsbr, -mb, and -signal flavors don't use sys_membarrier at all.

@mruffell have you looked into that aspect of the change?

Matthew Ruffell (mruffell) wrote :

> - apart from feedback given by @mruffell, to also check if any of liburcu consumers are depending on a full membarrier - driven by kernel - for ** shared pages among different processes **

I agree with @ddstreet, I don't think liburcu gives that sort of guarantee when it comes to cross process synchronisation. It was my belief that liburcu targets synchronisation across a set of threads within the current process only.

Proof by contradiction.

Assume that a program compiles against liburcu and uses it to synchronise access to shared memory pages used for IPC with a sister process.

If the program links against liburcu 0.9 or lower, the sys_membarrier syscall did not exist yet, and liburcu will use the default compiler based membarrier, which is only good within the current process. Synchronisation across shared memory pages fails. This is the case on Xenial, Trusty and the like.

If the program links against liburcu 0.11 or newer, the sys_membarrier syscall does exist, but MEMBARRIER_CMD_SHARED is only used if the current running kernel does not support MEMBARRIER_CMD_PRIVATE_EXPEDITED. There is no toggle option in the API at all, so for users with a kernel 4.14 or higher, MEMBARRIER_CMD_PRIVATE_EXPEDITED will be used, and synchronisation across shared memory pages will fail. This is the case on Eoan, Focal, Groovy.

If the program links against liburcu 0.10, and uses the -qsbr, -mb and -signal variants, sys_membarrier is not used at all, and it falls back to the compiler based membarrier, which is only good within the current process. Synchronisation across shared memory pages will fail.

If the program links against liburcu 0.10, and is used within a container, with a kernel version less than 4.3 that does not support sys_membarrier, such as a Bionic container on a Trusty 3.13 host, or on a 3.10 RHEL host, the sys_membarrier syscall fails, and it falls back to the compiler based membarrier. Synchronisation across shared memory pages will fail.

Now, the upstream developers made MEMBARRIER_CMD_PRIVATE_EXPEDITED the default in liburcu 0.11. They did not change the API to accommodate both MEMBARRIER_CMD_SHARED and MEMBARRIER_CMD_PRIVATE_EXPEDITED; instead, if the kernel is 4.14 or newer, MEMBARRIER_CMD_PRIVATE_EXPEDITED is used. Upstream are well aware of their consumers, and they would not break everyone's usage out of the blue without adding some sort of API provision for legacy users.

Thus, our initial assumption that liburcu can be used to synchronise access to shared memory pages for IPC with a sister process is wrong, since no one would write a program that works in only one specific environment: Bionic on bare metal with liburcu 0.10. I'm not even sure how you would coordinate liburcu across multiple processes either.

So, because of the above, I don't think any liburcu consumers depend on a full membarrier, driven by the kernel, for pages shared among different processes.

I still think this is safe to SRU.

Łukasz Zemczak (sil2100) wrote :

This is a very tricky SRU to be reviewing. Not having too much expertise in RCU, I need to rely on the opinion of people with more experience in this regard. Performance improvements like these are certainly SRUable, but we need to make sure that all the regression potential is covered and behavior for existing users stays relatively the same - at least without any ill-effects.

So Matthew, your analysis and rationale in #8 seem legit. The change also looks good, well documented and well contained, and the check for static linking is reassuring. I think content-wise we're good to go here.

What I'd like before we proceed is for Rafael to take a look at your analysis in comment #8 and say if that addresses his concerns as well (since as I said, I'd like someone with more expertise to also +1). @rafaeldtinoco could you take a look?

Rafael David Tinoco (rafaeldtinoco) wrote :

@sil2100,

thanks for the trust. My TL;DR version for you is:

From one of liburcu maintainers (https://github.com/compudj):

"""
Posted Nov 24, 2013 23:55 UTC (Sun) by compudj (subscriber, #43335) [Link]

Tracking threads running in multiple processes using a common shared memory is not possible with the currently implemented URCU flavors, but we look forward to adding a new URCU flavor to support this kind of use case.
"""

So that satisfies the corner case I have thought of.

By that, I'm +1 on the SRU.

----

continuing the longer version...

> I agree with @ddstreet, I don't think liburcu gives that sort of guarantee when it comes to cross process synchronisation. It was my belief that liburcu targets synchronisation across a set of threads within the current process only.

It does not (as stated above), but I had to check, especially because I was only looking at it from the membarrier() syscall point of view (not so much at the liburcu implementation).

NOW with all that I got curious =)...

From liburcu documentation:

"""
There are multiple flavors of liburcu available:

memb, qsbr, mb, signal, bp.

The API members start with the prefix "urcu_<flavor>_", where <flavor> is the chosen flavor name.

Usage of liburcu-memb

#include <urcu/urcu-memb.h>

Link the application with -lurcu-memb

This is the preferred version of the library, in terms of grace-period detection speed, read-side speed and flexibility. Dynamically detects kernel support for sys_membarrier().

Falls back on urcu-mb scheme if support is not present, which has slower read-side. Use the --disable-sys-membarrier-fallback configure option to disable the fall back, thus requiring sys_membarrier() to be available. This gives a small speedup when sys_membarrier() is supported by the kernel, and aborts in the library constructor if not supported.

Usage of liburcu-qsbr

#include <urcu/urcu-qsbr.h>

Link with -lurcu-qsbr

The QSBR flavor of RCU needs to have each reader thread executing rcu_quiescent_state() periodically to progress. rcu_thread_online() and rcu_thread_offline() can be used to mark long periods for which the threads are not active. It provides the fastest read-side at the expense of more intrusiveness in the application code.

Usage of liburcu-mb

#include <urcu/urcu-mb.h>

Link with -lurcu-mb

This version of the urcu library uses memory barriers on the writer and reader sides. This results in faster grace-period detection, but results in slower reads.

Usage of liburcu-signal

#include <urcu/urcu-signal.h>

Link the application with -lurcu-signal

Version of the library that requires a signal, typically SIGUSR1. Can be overridden with -DSIGRCU by modifying Makefile.build.inc.

Usage of liburcu-bp

#include <urcu/urcu-bp.h>

Link with -lurcu-bp

The BP library flavor stands for "bulletproof". It is specifically designed to help tracing library to hook on applications without requiring to modify these applications.

urcu_bp_init(), and urcu_bp_unregister_thread() all become nops, whereas calling urcu_bp_register_thread() becomes optional. The state is dealt with by the library internally at the expense of read-side and write-side performance.
"""

> If the program links against libur...


Łukasz Zemczak (sil2100) wrote :

Thank you for looking into this Matthew, Rafael and Dan! I think we have enough understanding now to give it a shot. I would feel at ease if, after accepting and building the package, we'd also once again sanity-check the rdeps if they still work as expected. Anyway, accepting!

Changed in liburcu (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Matthew, or anyone else affected,

Accepted liburcu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/liburcu/0.10.1-1ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Matthew Ruffell (mruffell) wrote :

Thanks for accepting Lukasz.

The customer installed liburcu 0.10.1-1ubuntu1 from -proposed on a bionic machine, and they had this to say:

Begins:

We were able to test our application with the proposed package you created. We were able to confirm better performance than we had with the current official package.
Here's the currently installed package on our device-under-test.

sudo apt-cache policy liburcu6 | grep Installed
Installed: 0.10.1-1ubuntu1

BTW, the application performance is still less than what we observe in 14.04 but for now, we will live with it.

Ends.

The customer found that the package in -proposed restores their application to an acceptable level of performance compared to using liburcu 0.10.1-1 in -updates. While they still report it being slower than what they experienced on trusty, performance is still good enough for them to migrate from trusty onto bionic.

I'm happy to mark this as verified.

For regression testing, I tested netsniff-ng 0.6.4-1, particularly the /usr/sbin/flowtop program, which links to liburcu. After manually downloading the geoip databases, the program functions as intended.

I made a small three-node glusterfs cluster, with each node having 2 disks. One volume was created in replicated mode across the cluster, with an XFS-formatted backing disk. The glusterfs version is 3.13.2-1ubuntu1. I successfully wrote 100 files to the glusterfs mountpoint, and they were successfully replicated across the cluster to the backing "bricks".

I used pull-lp-source to get the source for liburcu 0.10.1-1ubuntu1, and I ran a full
$ ./configure
$ make
$ make regtest

The output can be found here:
https://paste.ubuntu.com/p/XFRkyHtkSj/

All regression tests pass, and benchmarks execute correctly.

Because of the above regression test set passing, and with manual testing of netsniff-ng and glusterfs, I am happy to say that liburcu and its rdeps still work as intended. Marking this as verified.

tags: added: verification-done-bionic
removed: verification-needed verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package liburcu - 0.10.1-1ubuntu1

---------------
liburcu (0.10.1-1ubuntu1) bionic; urgency=medium

  * d/p/liburcu-use-membarrier-private-expedited.patch
  * d/p/liburcu-bp-use-membarrier-private-expedited.patch
    - Enable usage of MEMBARRIER_CMD_PRIVATE_EXPEDITED in the librcu and
      liburcu-bp flavours, for higher performance non-blocking syscalls.
      (LP: #1876230)

 -- Matthew Ruffell <email address hidden> Fri, 01 May 2020 16:11:48 +1200

Changed in liburcu (Ubuntu Bionic):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for liburcu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
