qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

Bug #1805256 reported by dann frazier
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
QEMU               Fix Released  Undecided   Unassigned
kunpeng920         Fix Released  Undecided   Unassigned
Ubuntu-18.04       Fix Released  Undecided   Unassigned
Ubuntu-19.10       Fix Released  Undecided   Unassigned
Ubuntu-20.04       Fix Released  Undecided   Unassigned
Upstream-kernel    Invalid       Undecided   Unassigned
qemu (Ubuntu)      Fix Released  Medium      Unassigned
  Bionic           Fix Released  Medium      Unassigned
  Eoan             Fix Released  Medium      Unassigned
  Focal            Fix Released  Medium      Unassigned

Bug Description

SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and Bionic. Unfortunately the Bionic SRU did not work and we had to revert the change. Since then there has been another update, and I am now retrying the SRU.

After discussing extensively with @paelzer (and @dannf as a reviewer), Christian and I agreed that we should scope this SRU as aarch64-only AND I was much, much more conservative about what is being changed in the QEMU AIO code.

The new code has been tested against the initial test case and against the new one that regressed on Bionic. More information (about tests and discussion) can be found in the MR at ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic-refix

BIONIC REGRESSION BUG:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

[Impact]

* QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a deadlock, making either QEMU or one of its tools hang indefinitely.

[Test Case]

INITIAL

* qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely in approximately 30% of the runs on aarch64.

[Regression Potential]

* This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines and event-listener callbacks.

* There was a long discussion upstream about primitives and aarch64. After quite some time Paolo released this patch and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.

* Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.

* dannf suggests we also check for performance regressions, e.g. how long it takes to convert a cloud image on high-core-count systems.

BIONIC REGRESSED ISSUE

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

[Other Info]

 * Original description below:

Command:

qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely in approximately 30% of the runs.

----

Workaround:

qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Run "qemu-img convert" with "a single coroutine" to avoid this issue.

----

(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...

(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()

(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start

----

(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2

[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]

[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]

<HUNG w/ 3 threads in the stack traces shown before>
"""

All the remaining tasks are blocked in a system call, so no task is left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would in turn unblock
thread #1 (doing poll() on a pipe shared with thread #2).

Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).

----

On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:

qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.

Once hung, attaching gdb gives the following backtrace:

(gdb) bt
#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
    timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
    __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
    timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
    at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

Reproduced w/ latest QEMU git (@ 53744e0a182)


Revision history for this message
John Snow (jnsnow) wrote :

Hi, can you do a `thread apply all bt` instead? If I were to bet, we're probably waiting for some slow call like lseek to return in another thread.

What filesystem/blockdevice is involved here?

Revision history for this message
dann frazier (dannf) wrote :

ext4 filesystem, SATA drive:

(gdb) thread apply all bt

Thread 3 (Thread 0xffff9bffc9a0 (LWP 9015)):
#0 0x0000ffffaaa462cc in __GI___sigtimedwait (set=<optimized out>,
    set@entry=0xaaaae725c070, info=info@entry=0xffff9bffbf18,
    timeout=0x3ff0000000000001, timeout@entry=0x0)
    at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1 0x0000ffffaab7dfac in __sigwait (set=set@entry=0xaaaae725c070,
    sig=sig@entry=0xffff9bffbff4) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2 0x0000aaaad998a628 in sigwait_compat (opaque=0xaaaae725c070)
    at util/compatfd.c:36
#3 0x0000aaaad998bce0 in qemu_thread_start (args=<optimized out>)
    at util/qemu-thread-posix.c:498
#4 0x0000ffffaab73088 in start_thread (arg=0xffffc528531f)
    at pthread_create.c:463
#5 0x0000ffffaaae34ec in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0xffffa0e779a0 (LWP 9014)):
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0x0000aaaad998c9e8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>)
    at /home/ubuntu/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0xaaaad9a091c0 <rcu_call_ready_event>)
    at util/qemu-thread-posix.c:442
#3 0x0000aaaad99a6834 in call_rcu_thread (opaque=<optimized out>)
    at util/rcu.c:261
#4 0x0000aaaad998bce0 in qemu_thread_start (args=<optimized out>)
    at util/qemu-thread-posix.c:498
#5 0x0000ffffaab73088 in start_thread (arg=0xffffc528542f)
    at pthread_create.c:463
#6 0x0000ffffaaae34ec in thread_start ()
    at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xffffa0fa8010 (LWP 9013)):
#0 0x0000ffffaaada154 in __GI_ppoll (fds=0xaaaae7291dc0, nfds=187650771816320,
    timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc52852e0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000aaaad9987f00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>,
    __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
    timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3 0x0000aaaad9988f80 in os_host_main_loop_wait (timeout=-1)
    at util/main-loop.c:233
#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5 0x0000aaaad98b7a30 in convert_do_copy (s=0xffffc52854e8) at qemu-img.c:1980
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
#7 0x0000aaaad98b033c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

Alex Bennée (ajbennee)
tags: added: qemu-img
Revision history for this message
贞贵李 (lizhengui) wrote :

Hi, I also found a problem that qemu-img convert hands in ARM.

The convert command line is "qemu-img convert -f qcow2 -O raw disk.qcow2 disk.raw ".

The bt is below:

Thread 2 (Thread 0x40000b776e50 (LWP 27215)):
#0 0x000040000a3f2994 in sigtimedwait () from /lib64/libc.so.6
#1 0x000040000a39c60c in sigwait () from /lib64/libpthread.so.0
#2 0x0000aaaaaae82610 in sigwait_compat (opaque=0xaaaac5163b00) at util/compatfd.c:37
#3 0x0000aaaaaae85038 in qemu_thread_start (args=args@entry=0xaaaac5163b90) at util/qemu_thread_posix.c:496
#4 0x000040000a3918bc in start_thread () from /lib64/libpthread.so.0
#5 0x000040000a492b2c in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x40000b573370 (LWP 27214)):
#0 0x000040000a489020 in ppoll () from /lib64/libc.so.6
#1 0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
#3 0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
#4 0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
#5 0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
#6 0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
#7 0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

Did you find the cause of the problem and fix it? Thanks for your reply!

Revision history for this message
贞贵李 (lizhengui) wrote :

Sorry, I made a spelling mistake here ("Hi, I also found a problem that qemu-img convert hands in ARM."). It should read "I also found a problem that qemu-img convert hangs in ARM".

dann frazier (dannf)
Changed in qemu:
status: New → Confirmed
Revision history for this message
dann frazier (dannf) wrote :

No, sorry - this bug still persists w/ latest upstream (@ afccfc0). I found a report of similar symptoms:

  https://patchwork.kernel.org/patch/10047341/
  https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13

To be clear, ^ is already fixed upstream, so it is not the *same* issue - but perhaps related.

Revision history for this message
贞贵李 (lizhengui) wrote :

Do you have any good ideas about it? Maybe a missing memory barrier somewhere is causing it?

Revision history for this message
贞贵李 (lizhengui) wrote :

frazier, did you find the conditions that reliably make this problem appear?

Revision history for this message
贞贵李 (lizhengui) wrote :

I can reproduce this problem with qemu.git/master. It still exists in qemu.git/master. I found that when an I/O request completes in a worker thread and it wants to call aio_notify to wake up main_loop, it finds that ctx->notify_me has already been cleared to 0 by main_loop in aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). So the worker thread won't write the eventfd to notify main_loop. If such a scene happens, the main_loop will hang:

    main loop                            worker thread1                           worker thread2
    ---------------------------------------------------------------------------------------------
    qemu_poll_ns                         aio_worker
                                         qemu_bh_schedule(pool->completion_bh)
    glib_pollfds_poll
    g_main_context_check
    aio_ctx_check
    atomic_and(&ctx->notify_me, ~1)                                               aio_worker
                                                                                  qemu_bh_schedule(pool->completion_bh)
    /* do something for event */
    qemu_poll_ns
    /* hangs !!! */

As we know, ctx->notify_me is accessed by both the worker threads and the main loop. I think we should add lock protection for ctx->notify_me to avoid this happening. What do you think?
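
For reference, here are simplified excerpts of the three code paths being described (the full versions are quoted later in this thread); the worker's wake-up is skipped when the main loop has already cleared notify_me:

    /* main loop, before polling (aio_ctx_prepare): */
    atomic_or(&ctx->notify_me, 1);     /* "notify me via the eventfd while I poll" */

    /* main loop, after polling (aio_ctx_check): */
    atomic_and(&ctx->notify_me, ~1);   /* "I am dispatching now, no notification needed" */

    /* worker thread (aio_notify): */
    smp_mb();
    if (ctx->notify_me) {              /* if this reads 0, the wake-up below is skipped */
        event_notifier_set(&ctx->notifier);
    }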

dann frazier (dannf)
Changed in qemu (Ubuntu):
status: New → Confirmed
Changed in qemu (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
importance: Undecided → Medium
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Liz,

I'll try to reproduce this issue in a Cortex-A53 aarch64 real environment (w/ 24 HW threads) AND in a virtual environment w/ lots of vCPUs... but, if it's a barrier missing - or the lack of atomicity and/or ordering in a primitive - then, I'm afraid the context switch in between vCPUs might not be the same as in real CPUs (IPIs are sent and handled differently and host kernel delays IPI delivery because of its own callbacks, before scheduling, etc...) and I could need a qemu dump from your environment.

Would that be feasible? Can you reproduce this nowadays? This bug has aged a little, so I'm not sure!

Could you provide me the dump generated with the latest package available for your Ubuntu version? That way I have the debug symbols to work with.

Meanwhile, I'll be trying to reproduce on my side.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Oh, never mind about the virtual environment test; I just remembered we don't have 2nd-level KVM for aarch64 yet (at least on ARMv8 implementing the virt extension). I'll try to reproduce in the real environment only.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Alright, I couldn't reproduce this yet. I'm running the same test case on a 24-core box and causing lots of context switches and CPU migrations in parallel (trying to exhaust the logic).

I will let this run for some time to check.

Unfortunately this could be related to QEMU AIO BH locking/primitives and cache coherency in the HW in question (specs: https://en.wikichip.org/wiki/hisilicon/kunpeng/hi1616):

l1$ size 8 MiB
l1d$ size 4 MiB
l1i$ size 4 MiB
l2$ size 32 MiB
l3$ size 64 MiB

like for example when having 2 threads in different NUMA domains, or some other situation.

I can't simulate the same since I have a SOC with:

Cortex-A53 MPCore 24cores,

L1 I/D=32KB/32KB
L2 =256KB
L3 =4MB

and I'm not even close to L1/L2/L3 cache numbers from D06 =o).

Just got a note that I'll be able to reproduce this in the real HW, will get back soon with real gdb debugging.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (4.5 KiB)

Alright, with a D06 aarch64 machine I was able to reproduce it after 8 attempts. I'll debug it today and provide feedback on my findings.

(gdb) bt full
#0 0x0000ffffb0b2181c in __GI_ppoll (fds=0xaaaace5ab770, nfds=4, timeout=<optimized out>, timeout@entry=0x0,
    sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
        _x3tmp = 0
        _x0tmp = 187650583213936
        _x0 = 187650583213936
        _x3 = 0
        _x4tmp = 8
        _x1tmp = 4
        _x1 = 4
        _x4 = 8
        _x2tmp = <optimized out>
        _x2 = 0
        _x8 = 73
        _sys_result = <optimized out>
        _sys_result = <optimized out>
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        tval = {tv_sec = 0, tv_nsec = 187650583137792}
#1 0x0000aaaacd2a773c in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
No locals.
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at ./util/qemu-timer.c:322
No locals.
#3 0x0000aaaacd2a8764 in os_host_main_loop_wait (timeout=-1) at ./util/main-loop.c:233
        context = 0xaaaace599d90
        ret = <optimized out>
        context = <optimized out>
        ret = <optimized out>
#4 main_loop_wait (nonblocking=<optimized out>) at ./util/main-loop.c:497
        ret = <optimized out>
        timeout = 4294967295
        timeout_ns = <optimized out>
#5 0x0000aaaacd1df454 in convert_do_copy (s=0xfffff9b2b1d8) at ./qemu-img.c:1981
        ret = <optimized out>
        i = <optimized out>
        n = <optimized out>
        sector_num = <optimized out>
        ret = <optimized out>
        i = <optimized out>
        n = <optimized out>
        sector_num = <optimized out>
#6 img_convert (argc=<optimized out>, argv=<optimized out>) at ./qemu-img.c:2457
        c = <optimized out>
        bs_i = <optimized out>
        flags = 16898
        src_flags = 0
        fmt = 0xfffff9b2bad1 "qcow2"
        out_fmt = <optimized out>
        cache = 0xaaaacd2cb1c8 "unsafe"
        src_cache = 0xaaaacd2ca9c0 "writeback"
        out_baseimg = <optimized out>
        out_filename = <optimized out>
        out_baseimg_param = <optimized out>
        snapshot_name = 0x0
        drv = <optimized out>
        proto_drv = <optimized out>
        bdi = {cluster_size = 65536, vm_state_offset = 32212254720, is_dirty = false, unallocated_blocks_are_zero = true,
          needs_compressed_writes = false}
        out_bs = <optimized out>
        opts = 0xaaaace5ab390
        sn_opts = 0x0
        create_opts = 0xaaaace5ab0c0
        open_opts = <optimized out>
        options = 0x0
        local_err = 0x0
        writethrough = false
        src_writethrough = false
        quiet = <optimized out>
        image_opts = false
        skip_create = false
        progress = <optimized out>
        tgt_image_opts = false
        ret = <optimized out>
        force_share = false
        explict_min_sparse = false
        s = {src = 0xaaaace577240, src_sectors = 0xaaaace577300, src_num = 1, total_sectors = 62914560,allocated_sectors = 9572096, allocated_done = 6541440, sector_num = 8863744, wr_offs = 885...

Read more...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (10.7 KiB)

Alright,

I'm still investigating this but wanted to share some findings... I haven't got a kernel dump yet after the task is frozen, I have analyzed only the userland part of it (although I have checked if code was running inside kernel with perf cycles:u/cycles:k at some point).

The big picture is this: Whenever qemu-img hangs, we have 3 hung tasks basically with these stacks:

----

THREAD #1
__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39)
ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77)
qemu_poll_ns (./util/qemu-timer.c:322)
os_host_main_loop_wait (./util/main-loop.c:233)
main_loop_wait (./util/main-loop.c:497)
convert_do_copy (./qemu-img.c:1981)
img_convert (./qemu-img.c:2457)
main (./qemu-img.c:4976)

got stack traces:

./33293/stack ./33293/stack
[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0 [<0>] do_sys_poll+0x508/0x5c0
[<0>] get_signal+0x5a4/0x730 [<0>] __arm64_sys_ppoll+0xc0/0x118
[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10 [<0>] el0_svc_handler+0x38/0x78
                                       [<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33293 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ........ .................. ......................
#
    37.82% qemu-img libc-2.29.so [.] 0x00000000000df710
    21.81% qemu-img [unknown] [k] 0xffff000010099504
    14.23% qemu-img [unknown] [k] 0xffff000010085dc0
     9.13% qemu-img [unknown] [k] 0xffff00001008fff8
     6.47% qemu-img libc-2.29.so [.] 0x00000000000df708
     5.69% qemu-img qemu-img [.] qemu_event_reset
     2.57% qemu-img libc-2.29.so [.] 0x00000000000df678
     0.63% qemu-img libc-2.29.so [.] 0x00000000000df700
     0.49% qemu-img libc-2.29.so [.] __sigtimedwait
     0.42% qemu-img libpthread-2.29.so [.] __libc_sigwait

----

THREAD #3
__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29)
__sigwait (linux/sigwait.c:28)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33303/stack ./33303/stack
[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0 [<0>] do_sigtimedwait.isra.9+0x194/0x288
[<0>] get_signal+0x5a4/0x730 [<0>] __arm64_sys_rt_sigtimedwait+0xac/0x110
[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10 [<0>] el0_svc_handler+0x38/0x78
                                       [<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33303 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record:...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (8.9 KiB)

Alright, here is what is happening:

Whenever the program is stuck, thread #2's backtrace is this:

(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0x0000aaaaaabd41b0 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:442
#3 0x0000aaaaaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#5 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#6 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Meaning that code is waiting for a futex inside kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init() -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set() -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)

And there should be no 0x7fff value for "ev->value".
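
For context, a condensed sketch of how these primitives work on the Linux futex path (paraphrased and simplified from util/qemu-thread-posix.c of roughly this era; barriers and assertions omitted, so details may differ):

    #define EV_SET   0
    #define EV_FREE  1
    #define EV_BUSY -1

    void qemu_event_set(QemuEvent *ev)
    {
        /* Publish the event; if a waiter announced itself (EV_BUSY), wake it up. */
        if (atomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
            qemu_futex_wake(ev, INT_MAX);
        }
    }

    void qemu_event_wait(QemuEvent *ev)
    {
        unsigned value = atomic_read(&ev->value);
        if (value != EV_SET) {
            if (value == EV_FREE) {
                /* Tell qemu_event_set() that there is now a waiter to wake. */
                if (atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
                    return;
                }
            }
            /* Sleep in futex() until value changes away from EV_BUSY. */
            qemu_futex_wait(ev, EV_BUSY);
        }
    }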

qemu_event_init() is the one initializing the global:

    static QemuEvent rcu_call_ready_event;

and it is called by "rcu_init_complete()" which is called by "rcu_init()":

    static void __attribute__((__constructor__)) rcu_init(void)

a constructor function.

So, "fixing" this issue by:

    (gdb) print rcu_call_ready_event
    $8 = {value = 4294967295, initialized = true}

    (gdb) watch rcu_call_ready_event
    Hardware watchpoint 1: rcu_call_ready_event

    (gdb) set rcu_call_ready_event.initialized = 1

    (gdb) set rcu_call_ready_event.value = 0

and note that I added a watchpoint to rcu_call_ready_event global:

<HANG>

Thread 1 "qemu-img" received signal SIGINT, Interrupt.
(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33625))]

(gdb) bt
#0 0x0000aaaaaabd4110 in qemu_event_reset (ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>)
#1 0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255
#2 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#3 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#4 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) print rcu_call_ready_event
$9 = {value = 0, initialized = true}

You can see I advanced in the qemu_event_{reset,set,wait} logic.

    (gdb) disassemble /m 0x0000aaaaaabd4110
    Dump of assembler code for function qemu_event_reset:
    408 in ./util/qemu-thread-posix.c

    409 in ./util/qemu-thread-posix.c

    410 in ./util/qemu-thread-posix.c
    411 in ./util/qemu-thread-posix.c
       0x0000aaaaaabd40f0 <+0>: ldrb w1, [x0, #4]
       0x0000aaaaaabd40f4 <+4>: cbz w1, 0xaaaaaabd411c <qemu_event_reset+44>
       0x0000aaaaaabd411c <+44>: stp x29, x30, [sp, #-16]!
       0x0000aaaaaabd...

Read more...

summary: - qemu-img hangs on high core count ARM system
+ qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting
+ images
Changed in qemu:
status: Confirmed → In Progress
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

QEMU BUG: #1

Alright, one of the issues is (according to comment #14):

"""
Meaning that code is waiting for a futex inside kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init() -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set() -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)'
"""

Now I know why rcu_call_ready_event->value is set to INT_MAX. That is because in the following declaration:

struct QemuEvent {
#ifndef __linux__
    pthread_mutex_t lock;
    pthread_cond_t cond;
#endif
    unsigned value;
    bool initialized;
};

#define EV_SET 0
#define EV_FREE 1
#define EV_BUSY -1

"value" is declared as unsigned, but EV_BUSY sets it to -1, and, according to the Two's Complement Operation (https://en.wikipedia.org/wiki/Two%27s_complement), it will be INT_MAX (4294967295).

So this is the "first bug" found AND it is definitely funny that this hasn't been seen in other architectures at all... I can reproduce it at will.
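
As a standalone illustration (not QEMU code) of the conversion described above:

    #include <stdio.h>

    int main(void)
    {
        unsigned value = -1;    /* same conversion as storing EV_BUSY (-1) into the unsigned field */
        printf("%u\n", value);  /* prints 4294967295 */
        return 0;
    }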

With that said, it seems that there is still another issue causing (less frequently):

(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec5ad90 (LWP 17459))]
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/aarch64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0x0000aaaaaabd41cc in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:442
#3 0x0000aaaaaabed05c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#5 0x0000ffffbf25c880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#6 0x0000ffffbf1b6b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 appears to be stuck at the "futex()" kernel syscall (as if the FUTEX_WAKE never happened and/or wasn't atomic for this arch/binary). Need to investigate this also.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote : qemu_futex_wait() lockups in ARM64: 2 possible issues

Paolo,

While debugging hangs on ARM64 when doing a simple:

qemu-img convert -f qcow2 -O qcow2 file.qcow2 output.qcow2

I might have found 2 issues which I'd like you to review, if possible.

ISSUE #1
========

I've caught the following stack trace after a hang in qemu-img convert:

(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

value INT_MAX (4294967295) seems WRONG for qemu_futex_wait():

- EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *,
unsigned), is a two's complement, making argument into a INT_MAX when
that's not what is expected (unless I missed something).

*** If that is the case, unsure if you, Paolo, prefer declaring
*(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay
here ***

BUG: description:
https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

========
ISSUE #2
========

I found this when debugging lockups while in futex() in a specific ARM64
server - https://bugs.launchpad.net/qemu/+bug/1805256 - which I'm still
investigating.

After fixing the issue above, I'm still getting stuck into:

qemu_event_wait() -> qemu_futex_wait()

***
As if qemu_event_set() had run before qemu_futex_wait() ever started running
***

The Other threads are waiting for poll() on a PIPE coming from this
stuck thread (thread #1), and in sigwait():

(gdb) thread 1
...
(gdb) bt
#0 0x0000ffffbf1ad81c in __GI_ppoll
#1 0x0000aaaaaabcf73c in ppoll
#2 qemu_poll_ns
#3 0x0000aaaaaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...

(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0x0000aaaaaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
#3 0x0000aaaaaabed05c in call_rcu_thread
#4 0x0000aaaaaabd34c8 in qemu_thread_start
#5 0x0000ffffbf25c880 in start_thread
#6 0x0000ffffbf1b6b9c in thread_start ()

(gdb) thread 3
...
(gdb) bt
#0 0x0000ffffbf11aa20 in __GI___sigtimedwait
#1 0x0000ffffbf2671b4 in __sigwait
#2 0x0000aaaaaabd1ddc in sigwait_compat
#3 0x0000aaaaaabd34c8 in qemu_thread_start
#4 0x0000ffffbf25c880 in start_thread
#5 0x0000ffffbf1b6b9c in thread_start

QUESTION:

- Should qemu_event_set() check return code from
qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?

Tks in advance,

Rafael D. Tinoco

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

In comment #14, please disregard the second half of the issue, related to:

       0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24>
       0x0000aaaaaabd4104 <+20>: ret
       0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0]
       0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1
    => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0]
       0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108

Duh! This is just the regular load/or/store logic for atomic_or() inside qemu_event_reset().
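
(For reference, that ldaxr/orr/stlxr sequence is the load-linked/store-conditional retry loop one would expect the compiler to emit on ARMv8.0 for a sequentially consistent fetch-or, i.e. roughly the following, assuming atomic_or() maps to __atomic_fetch_or with __ATOMIC_SEQ_CST the same way the atomic_add/atomic_sub definitions quoted later in this thread do:)

    __atomic_fetch_or(&ev->value, EV_FREE, __ATOMIC_SEQ_CST);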

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Quick update...

> value INT_MAX (4294967295) seems WRONG for qemu_futex_wait():
>
> - EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *,
> unsigned), is a two's complement, making argument into a INT_MAX when
> that's not what is expected (unless I missed something).
>
> *** If that is the case, unsure if you, Paolo, prefer declaring
> *(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay
> here ***
>
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

I realized this might be intentional, but, still, I tried:

    https://pastebin.ubuntu.com/p/6rkkY6fJdm/

looking for anything that could have misbehaved on arm64 (especially
concerned about casting and type conversions between the functions).

> QUESTION:
>
> - Should qemu_event_set() check return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?

And I also tried:

- qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+ while(qemu_futex(pval, FUTEX_WAKE, val, NULL, NULL, 0) == 0)
+ continue;

and it made little difference (took way more time for me to reproduce
the issue though):

"""
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2

[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]

[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]

<HUNG w/ 3 threads in the stack trace showed before>
"""

All the remaining tasks are blocked in a system call, so no task is left to call
qemu_futex_wake() to unblock thread #2 (in futex()), which would in turn unblock
thread #1 (doing poll() on a pipe shared with thread #2).

Those 7 threads exit before disk conversion is complete (sometimes in
the beginning, sometimes at the end).

I'll try to check why those tasks exited.

Any thoughts ?

Tks

description: updated
description: updated
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (5.3 KiB)

> Zhengui's theory that notify_me doesn't work properly on ARM is more
> promising, but he couldn't provide a clear explanation of why he thought
> notify_me is involved. In particular, I would have expected notify_me to
> be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
>
>
> glib_pollfds_fill
> g_main_context_prepare
> aio_ctx_prepare
> atomic_or(&ctx->notify_me, 1)
> qemu_poll_ns
> glib_pollfds_poll
> g_main_context_check
> aio_ctx_check
> atomic_and(&ctx->notify_me, ~1)
> g_main_context_dispatch
> aio_ctx_dispatch
> /* do something for event */
> qemu_poll_ns
>

Paolo,

I tried confining execution to a single NUMA domain (cpu & mem) and
still faced the issue. Then I added a mutex "ctx->notify_me_lcktest"
into the context to protect "ctx->notify_me", as shown below, and it
seems to have either fixed or mitigated it.

I was able to cause the hang once every 3 or 4 runs. I have already run
qemu-img convert more than 30 times now and couldn't reproduce it again.

Next step is to play with the barriers and check why the existing ones
aren't enough for ordering access to ctx->notify_me ... or should I
try/do something else in your opinion?

This arch/machine (Huawei D06):

$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 4
Vendor ID: 0x48
Model: 0
Stepping: 0x0
CPU max MHz: 2000.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 32768K
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
cpuid asimdrdm dcpop

----

diff --git a/include/block/aio.h b/include/block/aio.h
index 0ca25dfec6..0724086d91 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -84,6 +84,7 @@ struct AioContext {
      * dispatch phase, hence a simple counter is enough for them.
      */
     uint32_t notify_me;
+ QemuMutex notify_me_lcktest;

     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 51c41ed3c9..031d6e2997 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx,
int64_t max_ns, int64_t *timeout)
     bool progress;
     int64_t start_time, elapsed_time;

+ qemu_mutex_lock(&ctx->notify_me_lcktest);
     assert(ctx->notify_me);
+ qemu_mutex_unlock(&ctx->notify_me_lcktest);
     assert(qemu_lockcnt_count(&ctx->list_lock) > 0);

     trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
@@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * so disable the optimization now.
      */
     if (blocking) {
+ qemu_mutex_lock(&ctx->notify_me_lcktest);
       ...

Read more...

Revision history for this message
dann frazier (dannf) wrote : Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Download full text (6.3 KiB)

On Wed, Sep 11, 2019 at 04:09:25PM -0300, Rafael David Tinoco wrote:
> > Zhengui's theory that notify_me doesn't work properly on ARM is more
> > promising, but he couldn't provide a clear explanation of why he thought
> > notify_me is involved. In particular, I would have expected notify_me to
> > be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
> >
> >
> > glib_pollfds_fill
> > g_main_context_prepare
> > aio_ctx_prepare
> > atomic_or(&ctx->notify_me, 1)
> > qemu_poll_ns
> > glib_pollfds_poll
> > g_main_context_check
> > aio_ctx_check
> > atomic_and(&ctx->notify_me, ~1)
> > g_main_context_dispatch
> > aio_ctx_dispatch
> > /* do something for event */
> > qemu_poll_ns
> >
>
> Paolo,
>
> I tried confining execution in a single NUMA domain (cpu & mem) and
> still faced the issue, then, I added a mutex "ctx->notify_me_lcktest"
> into context to protect "ctx->notify_me", like showed bellow, and it
> seems to have either fixed or mitigated it.
>
> I was able to cause the hung once every 3 or 4 runs. I have already ran
> qemu-img convert more than 30 times now and couldn't reproduce it again.
>
> Next step is to play with the barriers and check why existing ones
> aren't enough for ordering access to ctx->notify_me ... or should I
> try/do something else in your opinion ?
>
> This arch/machine (Huawei D06):
>
> $ lscpu
> Architecture: aarch64
> Byte Order: Little Endian
> CPU(s): 96
> On-line CPU(s) list: 0-95
> Thread(s) per core: 1
> Core(s) per socket: 48
> Socket(s): 2
> NUMA node(s): 4
> Vendor ID: 0x48
> Model: 0
> Stepping: 0x0
> CPU max MHz: 2000.0000
> CPU min MHz: 200.0000
> BogoMIPS: 200.00
> L1d cache: 64K
> L1i cache: 64K
> L2 cache: 512K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-23
> NUMA node1 CPU(s): 24-47
> NUMA node2 CPU(s): 48-71
> NUMA node3 CPU(s): 72-95
> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
> cpuid asimdrdm dcpop

Note that I'm also seeing this on a ThunderX2 (same calltrace):

$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Thread(s) per core: 4
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: Cavium
Model: 1
Model name: ThunderX2 99xx
Stepping: 0x1
BogoMIPS: 400.00
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 32768K
NUMA node0 CPU(s): 0-111
NUMA node1 CPU(s): 112-223
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm

  -dann

> ----
>
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0ca25dfec6..0724086d91 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -84,6 +84,7 @@ struct AioContext {
> * dispatch phase, hence a simple counter is enough for them.
> */
> uint32_t notify_me;
> + QemuMutex notif...

Read more...

Revision history for this message
Jan Glauber (jan-glauber-i) wrote :

I've looked into this on ThunderX2. The arm64 code generated for the
atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
memory barriers. It is just plain ldaxr/stlxr.

From my understanding this is not sufficient for SMP sync.

If I read this comment correct:

    void aio_notify(AioContext *ctx)
    {
        /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
         * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
         */
        smp_mb();
        if (ctx->notify_me) {

it points out that the smp_mb() should be paired. But as
I said the used atomics don't generate any barriers at all.

I've tried to verify me theory with this patch and didn't run into the
issue for ~500 iterations (usually I would trigger the issue ~20 iterations).

--Jan

diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4af8dd..d07dcd4e9993 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      */
     if (blocking) {
         atomic_add(&ctx->notify_me, 2);
+ smp_mb();
     }

     qemu_lockcnt_inc(&ctx->list_lock);
@@ -632,6 +633,7 @@ bool aio_poll(AioContext *ctx, bool blocking)

     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
+ smp_mb();
     }

     /* Adjust polling time */
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e73..92ac209c4615 100644
--- a/util/async.c
+++ b/util/async.c
@@ -222,6 +222,7 @@ aio_ctx_prepare(GSource *source, gint *timeout)
     AioContext *ctx = (AioContext *) source;

     atomic_or(&ctx->notify_me, 1);
+ smp_mb();

     /* We assume there is no timeout already supplied */
     *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
@@ -240,6 +241,7 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;

     atomic_and(&ctx->notify_me, ~1);
+ smp_mb();
     aio_notify_accept(ctx);

     for (bh = ctx->first_bh; bh; bh = bh->next) {

Revision history for this message
Jan Glauber (jan-glauber-i) wrote :

Debug files for aio-posix generated on 18.04 on ThunderX2.

Compiler:
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)

Distro:
Ubuntu 18.04.3 LTS

Revision history for this message
Jan Glauber (jan-glauber-i) wrote :

On Wed, Oct 02, 2019 at 11:45:19AM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've tried to verify me theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue ~20 iterations).
>
> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
> are defined as sequentially consistent:
>
> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))

Compiler bug sounds kind of unlikely...

> What compiler are you using and what distro? Can you compile util/aio-posix.c
> with "-fdump-rtl-all -fdump-tree-all", zip the boatload of debugging files and
> send them my way?

This is on Ubuntu 18.04.3,
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)

I've uploaded the debug files to:
https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5293619/+files/aio-posix.tar.xz

Thanks,
Jan

> Thanks,
>
> Paolo

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (5.2 KiB)

Documenting this here as bug# was dropped from the mail thread:

On 02/10/19 13:05, Jan Glauber wrote:
> The arm64 code generated for the
> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> memory barriers. It is just plain ldaxr/stlxr.
>
> From my understanding this is not sufficient for SMP sync.
>
>>> If I read this comment correct:
>>>
>>> void aio_notify(AioContext *ctx)
>>> {
>>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>> */
>>> smp_mb();
>>> if (ctx->notify_me) {
>>>
>>> it points out that the smp_mb() should be paired. But as
>>> I said the used atomics don't generate any barriers at all.
>>
>> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
>> are defined as sequentially consistent:
>>
>> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
>> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
>
> Compiler bug sounds kind of unlikely...
Indeed the assembly produced by the compiler matches for example the
mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. A
small testcase is as follows:

  int ctx_notify_me;
  int bh_scheduled;

  int x()
  {
      int one = 1;
      int ret;
      __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE); // x1
      __atomic_thread_fence(__ATOMIC_SEQ_CST); // x2
      __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED); // x3
      return ret;
  }

  int y()
  {
      int ret;
      __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST); // y1
      __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED); // y2
      return ret;
  }

Here y (which is aio_poll) wants to order the write to ctx->notify_me
before reads of bh->scheduled. However, the processor can speculate the
load of bh->scheduled between the load-acquire and store-release of
ctx->notify_me. So you can have something like:

 thread 0 (y)                           thread 1 (x)
 -----------------------------------    -----------------------------
 y1: load-acq ctx->notify_me
 y2: load-rlx bh->scheduled
                                       x1: store-rel bh->scheduled <-- 1
                                       x2: memory barrier
                                       x3: load-rlx ctx->notify_me
 y1: store-rel ctx->notify_me <-- 2

Being very puzzled, I tried to put this into cppmem:

  int main() {
    atomic_int ctx_notify_me = 0;
    atomic_int bh_scheduled = 0;
    {{{ {
          bh_scheduled.store(1, mo_release);
          atomic_thread_fence(mo_seq_cst);
          // must be zero since the bug report shows no notification
          ctx_notify_me.load(mo_relaxed).readsvalue(0);
        }
    ||| {
          ctx_notify_me.store(2, mo_seq_cst);
          r2=bh_scheduled.load(mo_relaxed);
        }
    }}};
    return 0;
  }

and much to my surprise, the tool said r2 *can* be 0. Same if I put a
CAS like

        cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2,
                            mo_seq_cst, mo_seq_cst);

which resembles the code in ...

Read more...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :
Download full text (6.3 KiB)

On Wed, 2019-10-02 at 15:20 +0200, Paolo Bonzini wrote:
> On 02/10/19 13:05, Jan Glauber wrote:
>> The arm64 code generated for the
>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
>> memory barriers. It is just plain ldaxr/stlxr.
>>
>> From my understanding this is not sufficient for SMP sync.
>>
>>>> If I read this comment correct:
>>>>
>>>> void aio_notify(AioContext *ctx)
>>>> {
>>>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>> */
>>>> smp_mb();
>>>> if (ctx->notify_me) {
>>>>
>>>> it points out that the smp_mb() should be paired. But as
>>>> I said the used atomics don't generate any barriers at all.
>>>
>>> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
>>> are defined as sequentially consistent:
>>>
>>> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
>>> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
>>
>> Compiler bug sounds kind of unlikely...
>
> Indeed the assembly produced by the compiler matches for example the
> mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. A
> small testcase is as follows:
>
> int ctx_notify_me;
> int bh_scheduled;
>
> int x()
> {
> int one = 1;
> int ret;
> __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE); // x1
> __atomic_thread_fence(__ATOMIC_SEQ_CST); // x2
> __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED); // x3
> return ret;
> }
>
> int y()
> {
> int ret;
> __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST); // y1
> __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED); // y2
> return ret;
> }
>
> Here y (which is aio_poll) wants to order the write to ctx->notify_me
> before reads of bh->scheduled. However, the processor can speculate the
> load of bh->scheduled between the load-acquire and store-release of
> ctx->notify_me. So you can have something like:
>
> thread 0 (y) thread 1 (x)
> ----------------------------------- -----------------------------
> y1: load-acq ctx->notify_me
> y2: load-rlx bh->scheduled
> x1: store-rel bh->scheduled <-- 1
> x2: memory barrier
> x3: load-rlx ctx->notify_me
> y1: store-rel ctx->notify_me <-- 2
>
> Being very puzzled, I tried to put this into cppmem:
>
> int main() {
> atomic_int ctx_notify_me = 0;
> atomic_int bh_scheduled = 0;
> {{{ {
> bh_scheduled.store(1, mo_release);
> atomic_thread_fence(mo_seq_cst);
> // must be zero since the bug report shows no notification
> ctx_notify_me.load(mo_relaxed).readsvalue(0);
> }
> ||| {
> ctx_notify_me.store(2, mo_seq_cst);
> r2=bh_scheduled.load(mo_relaxed);
> }
> }}};
> return 0;
> }
>
> and much to my surprise, the tool said r2 *can* be 0. Same if I put a
> CAS like
>
> ...

Read more...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

On 02/10/19 16:58, Torvald Riegel wrote:
> This example looks like Dekker synchronization (if I get the intent right).

It is the same pattern. However, one of the two synchronized variables
is a counter rather than just a flag.

> Two possible implementations of this are either (1) with all memory
> accesses having seq-cst MO, or (2) with relaxed-MO accesses and seq-cst
> fences on between the store and load on both ends. It's possible to mix
> both, but that get's trickier I think. I'd prefer the one with just
> fences, just because it's easiest, conceptually.

Got it.

I'd also prefer the one with just fences, because we only really control
one side of the synchronization primitive (ctx_notify_me in my litmus
test) and I don't like the idea of forcing seq-cst MO on the other side
(bh_scheduled). The performance issue that I mentioned is that x86
doesn't have relaxed fetch and add, so you'd have a redundant fence like
this:

 lock xaddl $2, mem1
 mfence
 ...
 movl mem1, %r8

(Gory QEMU details however allow us to use relaxed load and store here,
because there's only one writer).
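
For reference, the "relaxed accesses plus seq-cst fences" variant being discussed would look roughly like this in C11 (an illustrative sketch mirroring the x()/y() litmus test above, not the actual QEMU patch):

    #include <stdatomic.h>

    atomic_int ctx_notify_me;
    atomic_int bh_scheduled;

    /* aio_notify()-like side: publish the work item, then decide whether to wake the poller. */
    int x(void)
    {
        atomic_store_explicit(&bh_scheduled, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&ctx_notify_me, memory_order_relaxed);
    }

    /* aio_poll()-like side: announce the intent to sleep, then re-check for pending work. */
    int y(void)
    {
        atomic_store_explicit(&ctx_notify_me, 2, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&bh_scheduled, memory_order_relaxed);
    }

With the seq-cst fences sitting between the store and the load on both sides, at least one of the two functions must observe the other's store, which is exactly the property the lost wake-up violates.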

> It works if you use (1) or (2) consistently. cppmem and the Batty et al.
> tech report should give you the gory details.
>
>> 1) understand why ATOMIC_SEQ_CST is not enough in this case. QEMU code
>> seems to be making the same assumptions as Linux about the memory model,
>> and this is wrong because QEMU uses C11 atomics if available.
>> Fortunately, this kind of synchronization in QEMU is relatively rare and
>> only this particular bit seems affected. If there is a fix which stays
>> within the C11 memory model, and does not pessimize code on x86, we can
>> use it[1] and document the pitfall.
>
> Using the fences between the store/load pairs in Dekker-like
> synchronization should do that, right? It's also relatively easy to deal
> with.
>
>> 2) if there's no way to fix the bug, qemu/atomic.h needs to switch to
>> __sync_fetch_and_add and friends. And again, in this case the
>> difference between the C11 and Linux/QEMU memory models must be documented.
>
> I surely not aware of all the constraints here, but I'd be surprised if the
> C11 memory model isn't good enough for portable synchronization code (with
> the exception of the consume MO minefield, perhaps).

This helps a lot already; I'll work on a documentation and code patch.
Thanks very much.

Paolo

>> int main() {
>> atomic_int ctx_notify_me = 0;
>> atomic_int bh_scheduled = 0;
>> {{{ {
>> bh_scheduled.store(1, mo_release);
>> atomic_thread_fence(mo_seq_cst);
>> // must be zero since the bug report shows no notification
>> ctx_notify_me.load(mo_relaxed).readsvalue(0);
>> }
>> ||| {
>> ctx_notify_me.store(2, mo_seq_cst);
>> r2=bh_scheduled.load(mo_relaxed);
>> }
>> }}};
>> return 0;
>> }

Changed in qemu (Ubuntu Disco):
importance: Undecided → Medium
Changed in qemu (Ubuntu Bionic):
importance: Undecided → Medium
Revision history for this message
Jan Glauber (jan-glauber-i) wrote :

On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> >
> > From my understanding this is not sufficient for SMP sync.
> >
> > If I read this comment correct:
> >
> > void aio_notify(AioContext *ctx)
> > {
> > /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> > * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> > */
> > smp_mb();
> > if (ctx->notify_me) {
> >
> > it points out that the smp_mb() should be paired. But as
> > I said the used atomics don't generate any barriers at all.
>
> Based on the rest of the thread, this patch should also fix the bug:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
> aio_notify_accept(ctx);
>
> for (bh = ctx->first_bh; bh; bh = bh->next) {
> - if (bh->scheduled) {
> + if (atomic_mb_read(&bh->scheduled)) {
> return true;
> }
> }
>
>
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>
> void aio_notify(AioContext *ctx)
> {
> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
> + * atomic_add in aio_poll.
> */
> - smp_mb();
> - if (ctx->notify_me) {
> + if (atomic_mb_read(&ctx->notify_me)) {
> event_notifier_set(&ctx->notifier);
> atomic_mb_set(&ctx->notified, true);
> }
>
>
> Would you be able to test these (one by one possibly)?

Sure.

> > I've tried to verify me theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue ~20 iterations).
>
> Sorry for asking the obvious---500 iterations of what?

The testcase mentioned in the Canonical issue:
https://bugs.launchpad.net/qemu/+bug/1805256

It's a simple image convert:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Usually it got stuck after 3-20 iterations.

--Jan

Revision history for this message
dann frazier (dannf) wrote :
Download full text (5.1 KiB)

On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> >
> > From my understanding this is not sufficient for SMP sync.
> >
> > If I read this comment correct:
> >
> > void aio_notify(AioContext *ctx)
> > {
> > /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> > * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> > */
> > smp_mb();
> > if (ctx->notify_me) {
> >
> > it points out that the smp_mb() should be paired. But as
> > I said the used atomics don't generate any barriers at all.
>
> Based on the rest of the thread, this patch should also fix the bug:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
> aio_notify_accept(ctx);
>
> for (bh = ctx->first_bh; bh; bh = bh->next) {
> - if (bh->scheduled) {
> + if (atomic_mb_read(&bh->scheduled)) {
> return true;
> }
> }
>
>
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>
> void aio_notify(AioContext *ctx)
> {
> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
> + * atomic_add in aio_poll.
> */
> - smp_mb();
> - if (ctx->notify_me) {
> + if (atomic_mb_read(&ctx->notify_me)) {
> event_notifier_set(&ctx->notifier);
> atomic_mb_set(&ctx->notified, true);
> }
>
>
> Would you be able to test these (one by one possibly)?

Paolo,
  I tried them both separately and together on a Hi1620 system, each
time it hung in the first iteration. Here's a backtrace of a run with
both patches applied:

(gdb) thread apply all bt

Thread 3 (Thread 0xffff8154b820 (LWP 63900)):
#0 0x0000ffff8b9402cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xaaaaf1e08070,
    info=info@entry=0xffff8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1 0x0000ffff8ba77fac in __sigwait (set=set@entry=0xaaaaf1e08070, sig=sig@entry=0xffff8154ae74)
    at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2 0x0000aaaab7dc1610 in sigwait_compat (opaque=0xaaaaf1e08070) at util/compatfd.c:35
#3 0x0000aaaab7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#4 0x0000ffff8ba6d088 in start_thread (arg=0xffffceefbf4f) at pthread_create.c:463
#5 0x0000ffff8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0x...

Read more...

Revision history for this message
Jan Glauber (jan-glauber-i) wrote :
Download full text (4.0 KiB)

On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote:
> On 07/10/19 16:44, dann frazier wrote:
> > On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> >> On 02/10/19 11:23, Jan Glauber wrote:
> >>> I've looked into this on ThunderX2. The arm64 code generated for the
> >>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> >>> memory barriers. It is just plain ldaxr/stlxr.
> >>>
> >>> From my understanding this is not sufficient for SMP sync.
> >>>
> >>> If I read this comment correct:
> >>>
> >>> void aio_notify(AioContext *ctx)
> >>> {
> >>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> >>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >>> */
> >>> smp_mb();
> >>> if (ctx->notify_me) {
> >>>
> >>> it points out that the smp_mb() should be paired. But as
> >>> I said the used atomics don't generate any barriers at all.
> >>
> >> Based on the rest of the thread, this patch should also fix the bug:
> >>
> >> diff --git a/util/async.c b/util/async.c
> >> index 47dcbfa..721ea53 100644
> >> --- a/util/async.c
> >> +++ b/util/async.c
> >> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
> >> aio_notify_accept(ctx);
> >>
> >> for (bh = ctx->first_bh; bh; bh = bh->next) {
> >> - if (bh->scheduled) {
> >> + if (atomic_mb_read(&bh->scheduled)) {
> >> return true;
> >> }
> >> }
> >>
> >>
> >> And also the memory barrier in aio_notify can actually be replaced
> >> with a SEQ_CST load:
> >>
> >> diff --git a/util/async.c b/util/async.c
> >> index 47dcbfa..721ea53 100644
> >> --- a/util/async.c
> >> +++ b/util/async.c
> >> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
> >>
> >> void aio_notify(AioContext *ctx)
> >> {
> >> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
> >> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> >> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
> >> + * atomic_add in aio_poll.
> >> */
> >> - smp_mb();
> >> - if (ctx->notify_me) {
> >> + if (atomic_mb_read(&ctx->notify_me)) {
> >> event_notifier_set(&ctx->notifier);
> >> atomic_mb_set(&ctx->notified, true);
> >> }
> >>
> >>
> >> Would you be able to test these (one by one possibly)?
> >
> > Paolo,
> > I tried them both separately and together on a Hi1620 system, each
> > time it hung in the first iteration. Here's a backtrace of a run with
> > both patches applied:
>
> Ok, not a great start... I'll find myself an aarch64 machine and look
> at it more closely. I'd like the patch to be something we can
> understand and document, since this is probably the second most-used
> memory barrier idiom in QEMU.
>
> Paolo

I'm still not sure what the actual issue is here, but could it be some bad
interaction between notify_me and the list_lock? They are both 4 bytes
and side-by-side:

address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4
address list_lock: 0xaaaadb528aa4 size...

Read more...
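
As a quick aside, the shared-cacheline hypothesis can be sanity-checked directly from the two addresses printed above; this is a hypothetical standalone snippet (assuming the usual 64-byte cache line, which is an assumption, not something stated in the thread):

#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

static int same_cache_line(uint64_t a, uint64_t b)
{
    return (a / CACHE_LINE) == (b / CACHE_LINE);
}

int main(void)
{
    uint64_t notify_me = 0xaaaadb528aa0;   /* addresses as reported above */
    uint64_t list_lock = 0xaaaadb528aa4;

    printf("notify_me and list_lock share a cache line: %s\n",
           same_cache_line(notify_me, list_lock) ? "yes" : "no");
    return 0;
}

With these values the answer is "yes" for a 64-byte line.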

Revision history for this message
Jan Glauber (jan-glauber-i) wrote :

On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> On 09/10/19 10:02, Jan Glauber wrote:

> > I'm still not sure what the actual issue is here, but could it be some bad
> > interaction between the notify_me and the list_lock? The are both 4 byte
> > and side-by-side:
> >
> > address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4
> > address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4
> >
> > AFAICS the generated code looks OK (all load/store exclusive done
> > with 32 bit size):
> >
> > e6c: 885ffc01 ldaxr w1, [x0]
> > e70: 11000821 add w1, w1, #0x2
> > e74: 8802fc01 stlxr w2, w1, [x0]
> >
> > ...but if I bump notify_me size to uint64_t the issue goes away.
>
> Ouch. :) Is this with or without my patch(es)?
>
> Also, what if you just add a dummy uint32_t after notify_me?

With the dummy the testcase also runs fine for 500 iterations.

Dann, can you try if this works on the Hi1620 too?

--Jan

Revision history for this message
Jan Glauber (jan-glauber-i) wrote :

On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> On 11/10/19 08:05, Jan Glauber wrote:
> > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> >>
> >> Ouch. :) Is this with or without my patch(es)?
>
> You didn't answer this question.

Oh, sorry... I did but the mail probably didn't make it out.
I have both of your changes applied (as I think they make sense).

> >> Also, what if you just add a dummy uint32_t after notify_me?
> >
> > With the dummy the testcase also runs fine for 500 iterations.
>
> You might be lucky and causing list_lock to be in another cache line.
> What if you add __attribute__((aligned(16)) to notify_me (and keep the
> dummy)?

Good point. I'll try to force both into the same cacheline.

--Jan

> Paolo
>
> > Dann, can you try if this works on the Hi1620 too?

Revision history for this message
dann frazier (dannf) wrote :

On Fri, Oct 11, 2019 at 06:05:25AM +0000, Jan Glauber wrote:
> On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > On 09/10/19 10:02, Jan Glauber wrote:
>
> > > I'm still not sure what the actual issue is here, but could it be some bad
> > > interaction between the notify_me and the list_lock? The are both 4 byte
> > > and side-by-side:
> > >
> > > address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4
> > > address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4
> > >
> > > AFAICS the generated code looks OK (all load/store exclusive done
> > > with 32 bit size):
> > >
> > > e6c: 885ffc01 ldaxr w1, [x0]
> > > e70: 11000821 add w1, w1, #0x2
> > > e74: 8802fc01 stlxr w2, w1, [x0]
> > >
> > > ...but if I bump notify_me size to uint64_t the issue goes away.
> >
> > Ouch. :) Is this with or without my patch(es)?
> >
> > Also, what if you just add a dummy uint32_t after notify_me?
>
> With the dummy the testcase also runs fine for 500 iterations.
>
> Dann, can you try if this works on the Hi1620 too?

On Hi1620, it hung on the first iteration. Here's the complete patch
I'm running with:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..e6fd6f1a1a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
- uint32_t notify_me;
+ uint64_t notify_me;

     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);

     for (bh = ctx->first_bh; bh; bh = bh->next) {
- if (bh->scheduled) {
+ if (atomic_mb_read(&bh->scheduled)) {
             return true;
         }
     }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)

 void aio_notify(AioContext *ctx)
 {
- /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+ /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
      */
- smp_mb();
- if (ctx->notify_me) {
- event_notifier_set(&ctx->notifier);
+ if (atomic_mb_read(&ctx->notify_me)) {
+ event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
 }

Revision history for this message
dann frazier (dannf) wrote :

On Fri, Oct 11, 2019 at 08:30:02AM +0000, Jan Glauber wrote:
> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> > On 11/10/19 08:05, Jan Glauber wrote:
> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> > >>
> > >> Ouch. :) Is this with or without my patch(es)?
> >
> > You didn't answer this question.
>
> Oh, sorry... I did but the mail probably didn't make it out.
> I have both of your changes applied (as I think they make sense).
>
> > >> Also, what if you just add a dummy uint32_t after notify_me?
> > >
> > > With the dummy the testcase also runs fine for 500 iterations.
> >
> > You might be lucky and causing list_lock to be in another cache line.
> > What if you add __attribute__((aligned(16)) to notify_me (and keep the
> > dummy)?
>
> Good point. I'll try to force both into the same cacheline.

On the Hi1620, this still hangs in the first iteration:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..00e56a5412 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
- uint32_t notify_me;
+ __attribute__((aligned(16))) uint64_t notify_me;

     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);

     for (bh = ctx->first_bh; bh; bh = bh->next) {
- if (bh->scheduled) {
+ if (atomic_mb_read(&bh->scheduled)) {
             return true;
         }
     }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)

 void aio_notify(AioContext *ctx)
 {
- /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+ /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
      */
- smp_mb();
- if (ctx->notify_me) {
- event_notifier_set(&ctx->notifier);
+ if (atomic_mb_read(&ctx->notify_me)) {
+ event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
 }

dann frazier (dannf)
Changed in kunpeng920:
status: New → Confirmed
Changed in qemu (Ubuntu Bionic):
status: New → Confirmed
Changed in qemu (Ubuntu Disco):
status: New → Confirmed
Changed in qemu (Ubuntu Focal):
status: New → Confirmed
Revision history for this message
Fred Kimmy (kongzizaixian) wrote :

 include/block/aio.h | 3 +++
 qemu-img.c | 4 ++++
 util/async.c | 5 +----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index e9bc04c..9153d87 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -89,6 +89,9 @@ struct AioContext {
      */
     uint32_t notify_me;

+ /* force to notify for qemu-img convert */
+ bool notify_for_convert;
+
     /* lock to protect between bh's adders and deleter */
     QemuMutex bh_lock;

diff --git a/qemu-img.c b/qemu-img.c
index 60a2be3..cf037aa 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2411,6 +2411,10 @@ static int img_convert(int argc, char **argv)
         .wr_in_order = wr_in_order,
         .num_coroutines = num_coroutines,
     };
+
+ AioContext *ctx = qemu_get_aio_context();
+ ctx->notify_for_convert = 1;
+
     ret = convert_do_copy(&state);

 out:
diff --git a/util/async.c b/util/async.c
index 042bf8a..af235fc 100644
--- a/util/async.c
+++ b/util/async.c
@@ -336,12 +336,9 @@ void aio_notify(AioContext *ctx)
      * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
      */
     smp_mb();
- if (ctx->notify_me) {
+ if (ctx->notify_me || ctx->notify_for_convert) {
         event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
-#if defined(__aarch64__)
- kill(getpid(), SIGIO);
-#endif
     }
 }

Can you try the above patchset to solve it?

Revision history for this message
dann frazier (dannf) wrote :

I tested the patch in Comment #34, and it was able to pass 500 iterations.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Fred,

Based on Dann's feedback on testing, I'm failing to see where your patch fixes the "root" cause (despite being able to mitigate the issue by changing the aio notification mechanism).

I think the root cause is best described in these 2 emails from the thread:

https://lore.kernel.org/qemu-devel/20191009080220.GA2905@hc/

and

https://<email address hidden>/

So, by adding ctx->notify_for_convert, it is very likely you worked around the issue by doing what Jan already described: moving the two variables (ctx->list_lock and, in the old case, ctx->notify_me; in your case, ctx->notify_for_convert) out of the same cacheline, making the issue "disappear" (as we would eventually do in a workaround patch).

What about the aarch64 issue with both ctx->list_lock and ctx->notify_for_convert being synchronized by the primitives QEMU uses while sharing the same cache line?

Any "workaround" here would just try to dodge the shared-cacheline situation, but, for upstream, I suppose Paolo wants something else regarding aarch64 ATOMIC_SEQ_CST,

as described in this part of the discussion:

https://<email address hidden>/

Unless I'm missing something, am I ?

Thank you!

Revision history for this message
iveskim (iveskim) wrote :

I tested the patch in Comment #34, and it also failed to pass 5 iterations.
This GDB was configured as "aarch64-linux-gnu".
Attaching to process 3987
[New LWP 3988]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffbd2b3154 in __GI_ppoll (fds=0xaaaae80ef080, nfds=187650381360708,
    timeout=<optimized out>, sigmask=0xffffc31815f0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:39
39 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
(gdb)

Revision history for this message
dann frazier (dannf) wrote :

fyi, what I tested in Comment #35 was upstream QEMU (@ aceeaa69d2) with a port of the patch in Comment #34 applied. I've attached that patch here. While it did avoid the issue in my testing, I agree with Rafael's Comment #36 that it does not appear to address the root cause (as I understand it), and is therefore unlikely something we'd ship in Ubuntu.

tags: added: patch
Ike Panhc (ikepanhc)
tags: added: ikeradar
Revision history for this message
Andrew Cloke (andrew-cloke) wrote :

Could HiSilicon respond to Dann & Rafael's comments #36 and #38?
Is there an upstream acceptable patch that addresses this issue?

Changed in kunpeng920:
status: Confirmed → Incomplete
Changed in qemu (Ubuntu Bionic):
status: Confirmed → Incomplete
Changed in qemu (Ubuntu Disco):
status: Confirmed → Incomplete
Changed in qemu (Ubuntu Eoan):
status: In Progress → Incomplete
Changed in qemu (Ubuntu Focal):
status: Confirmed → Incomplete
Revision history for this message
Fred Kimmy (kongzizaixian) wrote :

> Could HiSilicon respond to Dann & Rafael's comments #36 and #38?
> Is there an upstream acceptable patch that addresses this issue?

There is no upstream patchset; I only provided a private solution and do not know the root cause.

Changed in qemu (Ubuntu Eoan):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in qemu:
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Revision history for this message
Ike Panhc (ikepanhc) wrote :

PPA created with the temporary workaround from comment #34.

https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256

This PPA works around the issue temporarily but is not acceptable for an official release.

Revision history for this message
Ike Panhc (ikepanhc) wrote :

I took several CPUs offline and re-tested. Even with only 32 threads left, I can still reproduce this issue easily.

ubuntu@kreiken:~$ lscpu | grep list;for i in `seq 1 10`;do echo ;rm -f out.img;timeout 30 qemu-img convert -f qcow2 -O qcow2 ./bionic-server-cloudimg-arm64.img out.img -p; done
On-line CPU(s) list: 0-31
Off-line CPU(s) list: 32-127

    (100.00/100%)

    (43.20/100%)
    (0.00/100%)
    (1.00/100%)

Revision history for this message
Ying Fang (ying-fang) wrote :

Hi, Ike.

I think this tricky bug was fixed by Paolo last month.
Please try patch https://git.qemu.org/?p=qemu.git;a=commitdiff;h=5710a3e09f9b85801e5ce70797a4a511e5fc9e2c.

Revision history for this message
Ike Panhc (ikepanhc) wrote :

Thanks. I will test it.

Revision history for this message
Ike Panhc (ikepanhc) wrote :

The test deb has been pushed to https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256

40 runs with the patch mentioned in #43, and all passed.

Thanks.

Changed in kunpeng920:
status: Incomplete → Triaged
assignee: nobody → Ike Panhc (ikepanhc)
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Ike,

Please, let me know if you want me to go after the needed SRUs for this fix or if you will.

I'll wait for the final feedback from tests with your PPA.

Cheers!

Revision history for this message
dann frazier (dannf) wrote :

fyi, I also backported that fix to focal/groovy and eoan and tested with those builds. On my test systems the hang reliably occurs within 20 iterations without the fix. With the fix, they have survived > 500 iterations thus far. I'll leave them running overnight just to be sure.

Changed in qemu (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in qemu:
status: In Progress → Fix Released
Changed in qemu (Ubuntu Focal):
status: Incomplete → In Progress
Changed in qemu (Ubuntu Eoan):
status: Incomplete → In Progress
Changed in qemu (Ubuntu Disco):
status: Incomplete → In Progress
Changed in qemu (Ubuntu Bionic):
status: Incomplete → In Progress
Changed in qemu (Ubuntu):
status: Incomplete → In Progress
description: updated
dann frazier (dannf)
description: updated
Revision history for this message
Philippe Mathieu-Daudé (philmd) wrote :

Isn't this fixed by commit 5710a3e09f9?

commit 5710a3e09f9b85801e5ce70797a4a511e5fc9e2c
Author: Paolo Bonzini <email address hidden>
Date: Tue Apr 7 10:07:46 2020 -0400

    async: use explicit memory barriers

    When using C11 atomics, non-seqcst reads and writes do not participate
    in the total order of seqcst operations. In util/async.c and util/aio-posix.c,
    in particular, the pattern that we use

              write ctx->notify_me                 write bh->scheduled
              read bh->scheduled                   read ctx->notify_me
              if !bh->scheduled, sleep             if ctx->notify_me, notify

    needs to use seqcst operations for both the write and the read. In
    general this is something that we do not want, because there can be
    many sources that are polled in addition to bottom halves. The
    alternative is to place a seqcst memory barrier between the write
    and the read. This also comes with a disadvantage, in that the
    memory barrier is implicit on strongly-ordered architectures and
    it wastes a few dozen clock cycles.

    Fortunately, ctx->notify_me is never written concurrently by two
    threads, so we can assert that and relax the writes to ctx->notify_me.
    The resulting solution works and performs well on both aarch64 and x86.

    Note that the atomic_set/atomic_read combination is not an atomic
    read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED;
    on x86, ATOMIC_RELAXED compiles to a locked operation.
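
For reference, here is roughly what that fix boils down to when written with plain C11 atomics; a minimal sketch with simplified names, not the actual QEMU patch (which uses the wrappers from include/qemu/atomic.h):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int  notify_me;   /* only ever written by the polling thread */
static atomic_bool scheduled;

/* Polling side: write notify_me, full barrier, then read scheduled. */
static bool poll_prepare(void)
{
    int v = atomic_load_explicit(&notify_me, memory_order_relaxed);
    atomic_store_explicit(&notify_me, v + 2, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);      /* explicit smp_mb() */
    return !atomic_load_explicit(&scheduled, memory_order_relaxed);
}

/* Notifying side: write scheduled, full barrier, then read notify_me. */
static void schedule_and_notify(void)
{
    atomic_store_explicit(&scheduled, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);      /* explicit smp_mb() */
    if (atomic_load_explicit(&notify_me, memory_order_relaxed)) {
        /* event_notifier_set(&ctx->notifier) would go here */
    }
}

With a seq_cst fence on both sides, the interleaving where the poller misses scheduled and the notifier simultaneously misses notify_me is forbidden, so the wakeup cannot be lost; relaxing the notify_me write itself is fine because, as the commit message says, it is never written concurrently by two threads.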

Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

On Wed, May 6, 2020 at 1:20 PM Philippe Mathieu-Daudé
<email address hidden> wrote:
>
> Isn't this fixed by commit 5710a3e09f9?

See comment #43. The discussions hence are about testing/integration
of that fix.

  -dann

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

FYI, from now on all the "merge" work will be done in the merge requests linked to this BUG (at the top). @paelzer will be verifying those.

Revision history for this message
Ike Panhc (ikepanhc) wrote :

Tested the debs in ppa:rafaeldtinoco/lp1805256 for focal and eoan; 1000 qemu-img convert runs passed.

Revision history for this message
dann frazier (dannf) wrote :

Ike's backport in https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256 tests well for me on Cavium Sabre. One minor note is that the function in_aio_context_home_thread() is being called in aio-win32.c, but that function didn't exist in 2.11. We probably want to change that to aio_context_in_iothread(). It was renamed in https://git.qemu.org/?p=qemu.git;a=commitdiff;h=d2b63ba8dd20c1091b3f1033e6a95ef95b18149d

Changed in kunpeng920:
status: Triaged → In Progress
no longer affects: qemu (Ubuntu Disco)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: sponsored into groovy

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu8

---------------
qemu (1:4.2-3ubuntu8) groovy; urgency=medium

  * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts
    - async: use explicit memory barriers (LP: #1805256)
    - aio-wait: delegate polling of main AioContext if BQL not held

 -- Rafael David Tinoco <email address hidden> Wed, 27 May 2020 21:47:21 +0000

Changed in qemu (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Migrated right now, sponsoring the related SRU portions into B/E/F ... for consideration by the SRU Team.

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: block-proposed-focal
Changed in qemu (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Changed in qemu (Ubuntu Eoan):
status: In Progress → Fix Committed
tags: added: verification-needed-eoan
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello dann, or anyone else affected,

Accepted qemu into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.0+dfsg-0ubuntu9.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: block-proposed-eoan
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello dann, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.27 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
tags: added: block-proposed-bionic
removed: verification-needed-bionic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:4.0+dfsg-0ubuntu9.7)

All autopkgtests for the newly accepted qemu (1:4.0+dfsg-0ubuntu9.7) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

edk2/0~20190606.20d2e5a1-2ubuntu1.1 (amd64, armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Ike Panhc (ikepanhc) wrote :

100 runs of `qemu-img convert` with bionic/eoan/focal -proposed were all successful. No hang occurred. Thanks a lot.

tags: added: verification-done-bionic verification-done-eoan verification-done-focal
removed: verification-needed verification-needed-eoan verification-needed-focal
Changed in kunpeng920:
assignee: Ike Panhc (ikepanhc) → nobody
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:4.2-3ubuntu6.2)

All autopkgtests for the newly accepted qemu (1:4.2-3ubuntu6.2) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/245.4-4ubuntu3.1 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've looked and retried the tests - all green now.
Let us give it a few extra days in proposed as planned, but other than that it looks ok to be released.

Changed in kunpeng920:
status: In Progress → Fix Committed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We had 14 (instead of 7) days in -proposed for some extended maturing. Nothing came up in regard to this and all validations were good.
Dropping block-proposed to be released once the SRU Team gets to it.

tags: removed: block-proposed-bionic block-proposed-eoan block-proposed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu6.2

---------------
qemu (1:4.2-3ubuntu6.2) focal; urgency=medium

  * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts
    - async: use explicit memory barriers (LP: #1805256)
    - aio-wait: delegate polling of main AioContext if BQL not held

 -- Rafael David Tinoco <email address hidden> Wed, 27 May 2020 21:19:20 +0000

Changed in qemu (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.0+dfsg-0ubuntu9.7

---------------
qemu (1:4.0+dfsg-0ubuntu9.7) eoan; urgency=medium

  * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts
    - async: use explicit memory barriers (LP: #1805256)
    - aio-wait: delegate polling of main AioContext if BQL not held

 -- Rafael David Tinoco <email address hidden> Wed, 27 May 2020 20:07:57 +0000

Changed in qemu (Ubuntu Eoan):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.27

---------------
qemu (1:2.11+dfsg-1ubuntu7.27) bionic; urgency=medium

  * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts
    - aio: rename aio_context_in_iothread() to in_aio_context_home_thread()
    - aio: Do aio_notify_accept only during blocking aio_poll
    - aio-posix: Assert that aio_poll() is always called in home thread
    - async: use explicit memory barriers (LP: #1805256)
    - aio-wait: delegate polling of main AioContext if BQL not held
    - aio-posix: Don't count ctx->notifier as progress when polling

 -- Rafael David Tinoco <email address hidden> Tue, 26 May 2020 17:39:21 +0000

Changed in qemu (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in kunpeng920:
status: Fix Committed → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This will re-open again for Bionic due to bug 1885419 forcing a revert of the former backports.
After a deeper evaluation of whether the assert is wrong in the backport or is just flagging a problem that already existed in Bionic, this will be re-fixed.

Changed in qemu (Ubuntu Bionic):
status: Fix Released → Triaged
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Revision history for this message
Ike Panhc (ikepanhc) wrote :

Re-open for bionic due to regression found

Changed in kunpeng920:
status: Fix Released → Triaged
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Started working on this again...

Changed in qemu (Ubuntu Bionic):
status: Triaged → In Progress
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Work being done for the Bionic SRU:

BUG: https://bugs.launchpad.net/qemu/+bug/1805256
(fix for the bionic regression demonstrated at LP: #1885419)
PPA: https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1805256-bionic
MERGE: https://tinyurl.com/y8sucs6x

Merge proposal currently going under review, tests and discussions.

Revision history for this message
dann frazier (dannf) wrote :

I ran the new PPA build (1:2.11+dfsg-1ubuntu7.29~ppa01) on both a ThunderX2 system and a Hi1620 system overnight, and both survived (6574 & 12919 iterations, respectively).

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Thanks @dannf! I spoke to Christian, and he and I agreed to confine this change to ARM builds only (as the SRU for Bionic). Preparing it...

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I've hidden the last post as it was posted in the wrong bug.

description: updated
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I just pushed/uploaded a SRU for bionic from:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/387269

Waiting for SRU on it.

Changed in qemu (Ubuntu Bionic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.30 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
removed: verification-done-bionic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.30)

All autopkgtests for the newly accepted qemu (1:2.11+dfsg-1ubuntu7.30) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/237-3ubuntu10.42 (i386)
ubuntu-image/1.9+18.04ubuntu1 (arm64, armhf, i386, s390x, ppc64el, amd64)
cinder/2:12.0.9-0ubuntu1.2 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
dann frazier (dannf) wrote :

Verified w/ over 500 successful iterations on an m6g.metal instance, and over 300 in an armhf chroot on the same.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
dann frazier (dannf)
Changed in kunpeng920:
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (4.0 KiB)

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.31

---------------
qemu (1:2.11+dfsg-1ubuntu7.31) bionic-security; urgency=medium

  * SECURITY UPDATE: out-of-bounds read in slirp networking
    - debian/patches/CVE-2020-10756.patch: drop bogus IPv6 messages in
      slirp/src/ip6_input.c.
    - CVE-2020-10756
  * SECURITY UPDATE: out-of-bounds read and write in sm501
    - debian/patches/CVE-2020-12829-pre1.patch: use values from the pitch
      register for 2D operations.
    - debian/patches/CVE-2020-12829-pre2.patch: implement negated
      destination raster operation mode.
    - debian/patches/CVE-2020-12829-pre3.patch: log unimplemented raster
      operation modes.
    - debian/patches/CVE-2020-12829-pre4.patch: fix support for non-zero
      frame buffer start address.
    - debian/patches/CVE-2020-12829-pre5.patch: set updated region dirty
      after 2D operation.
    - debian/patches/CVE-2020-12829-pre6.patch: adjust endianness of pixel
      value in rectangle fill.
    - debian/patches/CVE-2020-12829-pre7.patch: convert printf +
      abort to qemu_log_mask.
    - debian/patches/CVE-2020-12829-pre8.patch: shorten long
      variable names in sm501_2d_operation.
    - debian/patches/CVE-2020-12829-pre9.patch: use BIT(x) macro to
      shorten constant.
    - debian/patches/CVE-2020-12829-pre10.patch: clean up local
      variables in sm501_2d_operation.
    - debian/patches/CVE-2020-12829.patch: replace hand written
      implementation with pixman where possible.
    - debian/patches/CVE-2020-12829-2.patch: optimize small overlapping
      blits.
    - debian/patches/CVE-2020-12829-3.patch: fix bounds checks.
    - debian/patches/CVE-2020-12829-4.patch: drop unneded variable.
    - debian/patches/CVE-2020-12829-5.patch: do not allow guest to set
      invalid format.
    - debian/patches/CVE-2020-12829-6.patch: introduce variable for
      commonly used value for better readability.
    - debian/patches/CVE-2020-12829-7.patch: fix and optimize overlap
      check.
    - CVE-2020-12829
  * SECURITY UPDATE: out-of-bounds read during sdhci_write() operations
    - debian/patches/CVE-2020-13253.patch: do not switch to ReceivingData
      if address is invalid in hw/sd/sd.c.
    - CVE-2020-13253
  * SECURITY UPDATE: out-of-bounds access during es1370_write() operation
    - debian/patches/CVE-2020-13361.patch: check total frame count against
      current frame in hw/audio/es1370.c.
    - CVE-2020-13361
  * SECURITY UPDATE: out-of-bounds read via crafted reply_queue_head
    - debian/patches/CVE-2020-13362-1.patch: use unsigned type for
      reply_queue_head and check index in hw/scsi/megasas.c.
    - debian/patches/CVE-2020-13362-2.patch: avoid NULL pointer dereference
      in hw/scsi/megasas.c.
    - debian/patches/CVE-2020-13362-3.patch: use unsigned type for positive
      numeric fields in hw/scsi/megasas.c.
    - CVE-2020-13362
  * SECURITY UPDATE: NULL pointer dereference related to BounceBuffer
    - debian/patches/CVE-2020-13659.patch: set map length to zero when
      returning NULL in exec.c, include/exec/memory.h.
    - CVE-2020-13659
  * SECURITY UPDATE: out-of-bounds access via msi-x mmio operation
    ...

Read more...

Changed in qemu (Ubuntu Bionic):
status: Fix Committed → Fix Released
no longer affects: kunpeng920/ubuntu-18.04-hwe
Changed in kunpeng920:
status: Fix Committed → Fix Released