aboutsummaryrefslogtreecommitdiff
path: root/util/aio-posix.c
AgeCommit message (Collapse)Author
2024-01-08iothread: Remove unused Error** argument in aio_context_set_aio_paramsPhilippe Mathieu-Daudé
aio_context_set_aio_params() doesn't use its undocumented Error** argument. Remove it to simplify. Note this removes a use of "unchecked Error**" in iothread_set_aio_context_params(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231120171806.19361-1-philmd@linaro.org>
2023-05-30aio: remove aio_disable_external() APIStefan Hajnoczi
All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19aio-posix: do not nest poll handlersStefan Hajnoczi
QEMU's event loop supports nesting, which means that event handler functions may themselves call aio_poll(). The condition that triggered a handler must be reset before the nested aio_poll() call, otherwise the same handler will be called and immediately re-enter aio_poll. This leads to an infinite loop and stack exhaustion. Poll handlers are especially prone to this issue, because they typically reset their condition by finishing the processing of pending work. Unfortunately it is during the processing of pending work that nested aio_poll() calls typically occur and the condition has not yet been reset. Disable a poll handler during ->io_poll_ready() so that a nested aio_poll() call cannot invoke ->io_poll_ready() again. As a result, the disabled poll handler and its associated fd handler do not run during the nested aio_poll(). Calling aio_set_fd_handler() from inside nested aio_poll() could cause it to run again. If the fd handler is pending inside nested aio_poll(), then it will also run again. In theory fd handlers can be affected by the same issue, but they are more likely to reset the condition before calling nested aio_poll(). This is a special case and it's somewhat complex, but I don't see a way around it as long as nested aio_poll() is supported. Cc: qemu-stable@nongnu.org Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2186181 Fixes: c38270692593 ("block: Mark bdrv_co_io_(un)plug() and callers GRAPH_RDLOCK") Cc: Kevin Wolf <kwolf@redhat.com> Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230502184134.534703-2-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-03-13aio: make aio_set_fd_poll() static to aio-posix.cMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-9-marcandre.lureau@redhat.com>
2023-01-23util/aio: Defer disabling poll mode as long as possibleChao Gao
When we measure FIO read performance (cache=writethrough, bs=4k, iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed from guest to qemu. It turns out those frequent notificatons are caused by interference from worker threads. Worker threads queue bottom halves after completing IO requests. Pending bottom halves may lead to either aio_compute_timeout() zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns no progress after noticing pending aio_notify() events. Both cause run_poll_handlers() to call poll_set_started(false) to disable poll mode. However, for both cases, as timeout is already zeroed, the event loop (i.e., aio_poll()) just processes bottom halves and then starts the next event loop iteration. So, disabling poll mode has no value but leads to unnecessary notifications from guest. To minimize unnecessary notifications from guest, defer disabling poll mode to when the event loop is about to be blocked. With this patch applied, FIO seq-read performance (bs=4k, iodepth=64, cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Message-id: 20220710120849.63086-1-chao.gao@intel.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09util/event-loop-base: Introduce options to set the thread pool sizeNicolas Saenz Julienne
The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-03-17aio-posix: fix spurious ->poll_ready() callbacks in main loopStefan Hajnoczi
When ->poll() succeeds the AioHandler is placed on the ready list with revents set to the magic value 0. This magic value causes aio_dispatch_handler() to invoke ->poll_ready() instead of ->io_read() for G_IO_IN or ->io_write() for G_IO_OUT. This magic value 0 hack works for the IOThread where AioHandlers are placed on ->ready_list and processed by aio_dispatch_ready_handlers(). It does not work for the main loop where all AioHandlers are processed by aio_dispatch_handlers(), even those that are not ready and have a revents value of 0. As a result the main loop invokes ->poll_ready() on AioHandlers that are not ready. These spurious ->poll_ready() calls waste CPU cycles and could lead to crashes if the code assumes ->poll() must have succeeded before ->poll_ready() is called (a reasonable asumption but I haven't seen it in practice). Stop using revents to track whether ->poll_ready() will be called on an AioHandler. Introduce a separate AioHandler->poll_ready field instead. This eliminates spurious ->poll_ready() calls in the main loop. Fixes: 826cc32423db2a99d184dbf4f507c737d7e7a4ae ("aio-posix: split poll check from ready handler") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reported-by: Jason Wang <jasowang@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> Message-id: 20220223155703.136833-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12aio-posix: split poll check from ready handlerStefan Hajnoczi
Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-21iothread: add aio-max-batch parameterStefano Garzarella
The `aio-max-batch` parameter will be propagated to AIO engines and it will be used to control the maximum number of queued requests. When there are in queue a number of requests equal to `aio-max-batch`, the engine invokes the system call to forward the requests to the kernel. This parameter allows us to control the maximum batch size to reduce the latency that requests might accumulate while queued in the AIO engine queue. If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its default maximum batch size value. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-09qmp: Move dispatcher to a coroutineKevin Wolf
This moves the QMP dispatcher to a coroutine and runs all QMP command handlers that declare 'coroutine': true in coroutine context so they can avoid blocking the main loop while doing I/O or waiting for other events. For commands that are not declared safe to run in a coroutine, the dispatcher drops out of coroutine context by calling the QMP command handler from a bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201005155855.256490-10-kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-09-23qemu/atomic.h: rename atomic_ to qatomic_Stefan Hajnoczi
clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-08-13aio-posix: keep aio_notify_me disabled during pollingStefan Hajnoczi
Polling only monitors the ctx->notified field and does not need the ctx->notifier EventNotifier to be signalled. Keep ctx->aio_notify_me disabled while polling to avoid unnecessary EventNotifier syscalls. This optimization improves virtio-blk 4KB random read performance by 18%. The following results are with an IOThread and the null-co block driver: Test IOPS Error Before 244518.62 ± 1.20% After 290706.11 ± 0.44% Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806131802.569478-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-18aio-posix: disable fdmon-io_uring when GSource is usedStefan Hajnoczi
The glib event loop does not call fdmon_io_uring_wait() so fd handlers waiting to be submitted build up in the list. There is no benefit is using io_uring when the glib GSource is being used, so disable it instead of implementing a more complex fix. This fixes a memory leak where AioHandlers would build up and increasing amounts of CPU time were spent iterating them in aio_pending(). The symptom is that guests become slow when QEMU is built with io_uring support. Buglink: https://bugs.launchpad.net/qemu/+bug/1877716 Fixes: 73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-18aio-posix: don't duplicate fd handler deletion in fdmon_io_uring_destroy()Stefan Hajnoczi
The io_uring file descriptor monitoring implementation has an internal list of fd handlers that are pending submission to io_uring. fdmon_io_uring_destroy() deletes all fd handlers on the list. Don't delete fd handlers directly in fdmon_io_uring_destroy() for two reasons: 1. This duplicates the aio-posix.c AioHandler deletion code and could become outdated if the struct changes. 2. Only handlers with the FDMON_IO_URING_REMOVE flag set are safe to remove. If the flag is not set then something still has a pointer to the fd handler. Let aio-posix.c and its user worry about that. In practice this isn't an issue because fdmon_io_uring_destroy() is only called when shutting down so all users have removed their fd handlers, but the next patch will need this! Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-04-09async: use explicit memory barriersPaolo Bonzini
When using C11 atomics, non-seqcst reads and writes do not participate in the total order of seqcst operations. In util/async.c and util/aio-posix.c, in particular, the pattern that we use write ctx->notify_me write bh->scheduled read bh->scheduled read ctx->notify_me if !bh->scheduled, sleep if ctx->notify_me, notify needs to use seqcst operations for both the write and the read. In general this is something that we do not want, because there can be many sources that are polled in addition to bottom halves. The alternative is to place a seqcst memory barrier between the write and the read. This also comes with a disadvantage, in that the memory barrier is implicit on strongly-ordered architectures and it wastes a few dozen clock cycles. Fortunately, ctx->notify_me is never written concurrently by two threads, so we can assert that and relax the writes to ctx->notify_me. The resulting solution works and performs well on both aarch64 and x86. Note that the atomic_set/atomic_read combination is not an atomic read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED; on x86, ATOMIC_RELAXED compiles to a locked operation. Analyzed-by: Ying Fang <fangying1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Ying Fang <fangying1@huawei.com> Message-Id: <20200407140746.8041-6-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-09aio-posix: remove idle poll handlers to improve scalabilityStefan Hajnoczi
When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
2020-03-09aio-posix: support userspace polling of fd monitoringStefan Hajnoczi
Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
2020-03-09aio-posix: add io_uring fd monitoring implementationStefan Hajnoczi
The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
2020-03-09aio-posix: simplify FDMonOps->update() prototypeStefan Hajnoczi
The AioHandler *node, bool is_new arguments are more complicated to think about than simply being given AioHandler *old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
2020-03-09aio-posix: extract ppoll(2) and epoll(7) fd monitoringStefan Hajnoczi
The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>
2020-03-09aio-posix: move RCU_READ_LOCK() into run_poll_handlers()Stefan Hajnoczi
Now that run_poll_handlers_once() is only called by run_poll_handlers() we can improve the CPU time profile by moving the expensive RCU_READ_LOCK() out of the polling loop. This reduces the run_poll_handlers() from 40% CPU to 10% CPU in perf's sampling profiler output. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-3-stefanha@redhat.com Message-Id: <20200305170806.1313245-3-stefanha@redhat.com>
2020-03-09aio-posix: completely stop polling when disabledStefan Hajnoczi
One iteration of polling is always performed even when polling is disabled. This is done because: 1. Userspace polling is cheaper than making a syscall. We might get lucky. 2. We must poll once more after polling has stopped in case an event occurred while stopping polling. However, there are downsides: 1. Polling becomes a bottleneck when the number of event sources is very high. It's more efficient to monitor fds in that case. 2. A high-frequency polling event source can starve non-polling event sources because ppoll(2)/epoll(7) is never invoked. This patch removes the forced polling iteration so that poll_ns=0 really means no polling. IOPS increases from 10k to 60k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device because the large number of event sources being polled slows down the event loop. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-2-stefanha@redhat.com Message-Id: <20200305170806.1313245-2-stefanha@redhat.com>
2020-03-09aio-posix: remove confusing QLIST_SAFE_REMOVE()Stefan Hajnoczi
QLIST_SAFE_REMOVE() is confusing here because the node must be on the list. We actually just wanted to clear the linked list pointers when removing it from the list. QLIST_REMOVE() now does this, so switch to it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200224103406.1894923-3-stefanha@redhat.com Message-Id: <20200224103406.1894923-3-stefanha@redhat.com>
2020-02-22aio-posix: make AioHandler dispatch O(1) with epollStefan Hajnoczi
File descriptor monitoring is O(1) with epoll(7), but aio_dispatch_handlers() still scans all AioHandlers instead of dispatching just those that are ready. This makes aio_poll() O(n) with respect to the total number of registered handlers. Add a local ready_list to aio_poll() so that each nested aio_poll() builds a list of handlers ready to be dispatched. Since file descriptor polling is level-triggered, nested aio_poll() calls also see fds that were ready in the parent but not yet dispatched. This guarantees that nested aio_poll() invocations will dispatch all fds, even those that became ready before the nested invocation. Since only handlers ready to be dispatched are placed onto the ready_list, the new aio_dispatch_ready_handlers() function provides O(1) dispatch. Note that AioContext polling is still O(n) and currently cannot be fully disabled. This still needs to be fixed before aio_poll() is fully O(1). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-6-stefanha@redhat.com [Fix compilation error on macOS where there is no epoll(87). The aio_epoll() prototype was out of date and aio_add_ready_list() needed to be moved outside the ifdef. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22aio-posix: make AioHandler deletion O(1)Stefan Hajnoczi
It is not necessary to scan all AioHandlers for deletion. Keep a list of deleted handlers instead of scanning the full list of all handlers. The AioHandler->deleted field can be dropped. Let's check if the handler has been inserted into the deleted list instead. Add a new QLIST_IS_INSERTED() API for this check. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22aio-posix: don't pass ns timeout to epoll_wait()Stefan Hajnoczi
Don't pass the nanosecond timeout into epoll_wait(), which expects milliseconds. The epoll_wait() timeout value does not matter if qemu_poll_ns() determined that the poll fd is ready, but passing a value in the wrong units is still ugly. Pass a 0 timeout to epoll_wait() instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22aio-posix: fix use after leaving scope in aio_poll()Stefan Hajnoczi
epoll_handler is a stack variable and must not be accessed after it goes out of scope: if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) { AioHandler epoll_handler; ... add_pollfd(&epoll_handler); ret = aio_epoll(ctx, pollfds, npfd, timeout); } ... ... /* if we have any readable fds, dispatch event */ if (ret > 0) { for (i = 0; i < npfd; i++) { nodes[i]->pfd.revents = pollfds[i].revents; } } nodes[0] is &epoll_handler, which has already gone out of scope. There is no need to use pollfds[] for epoll. We don't need an AioHandler for the epoll fd. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22aio-posix: avoid reacquiring rcu_read_lock() when pollingStefan Hajnoczi
The first rcu_read_lock/unlock() is expensive. Nested calls are cheap. This optimization increases IOPS from 73k to 162k with a Linux guest that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200218182708.914552-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2019-05-10aio-posix: ensure poll mode is left when aio_notify is calledPaolo Bonzini
With aio=thread, adaptive polling makes latency worse rather than better, because it delays the execution of the ThreadPool's completion bottom half. event_notifier_poll() does run while polling, detecting that a bottom half was scheduled by a worker thread, but because ctx->notifier is explicitly ignored in run_poll_handlers_once(), scheduling the BH does not count as making progress and run_poll_handlers() keeps running. Fix this by recomputing the deadline after *timeout could have changed. With this change, ThreadPool still cannot participate in polling but at least it does not suffer from extra latency. Reported-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20190409122823.12416-1-pbonzini@redhat.com Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1553692145-86728-1-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20190409122823.12416-1-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-25aio-posix: Assert that aio_poll() is always called in home threadKevin Wolf
aio_poll() has an existing assertion that the function is only called from the AioContext's home thread if blocking is allowed. This is not enough, some handlers make assumptions about the thread they run in. Extend the assertion to non-blocking calls, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-01-14aio-posix: Fix concurrent aio_poll/set_fd_handler.Remy Noel
It is possible for an io_poll callback to be concurrently executed along with an aio_set_fd_handlers. This can cause all sorts of problems, like a NULL callback or a bad opaque pointer. This changes set_fd_handlers so that it no longer modify existing handlers entries and instead, always insert those after having proper initialisation. Tested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Remy Noel <remy.noel@blade-group.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20181220152030.28035-3-remy.noel@blade-group.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-14aio-posix: Unregister fd from ctx epoll when removing fd_handler.Remy Noel
Cleaning the events will cause aio_epoll_update to unregister the fd. Otherwise, the fd is kept registered until it is destroyed. Signed-off-by: Remy Noel <remy.noel@blade-group.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20181220152030.28035-2-remy.noel@blade-group.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-10-29util: aio-posix: fix a typoLi Qiang
Cc: qemu-trivial@nongnu.org Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1538964972-3223-1-git-send-email-liq3ea@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-09-26aio-posix: do skip system call if ctx->notifier polling succeedsPaolo Bonzini
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when 2018-08-15), by not reporting progress, causes aio_poll to execute the system call when polling succeeds because of ctx->notifier. This introduces latency before the call to aio_bh_poll() and negates the advantages of polling, unfortunately. The fix builds on the previous patch, separating the effect of polling on the timeout from the progress reported to aio_poll(). ctx->notifier does zero the timeout, causing the caller to skip the system call, but it does not report progress, so that the bug fix of commit 70232b5253 still stands. Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-4-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: compute timeout before pollingPaolo Bonzini
This is a preparation for the next patch, and also a very small optimization. Compute the timeout only once, before invoking try_poll_mode, and adjust it in run_poll_handlers. The adjustment is the polling time when polling fails, or zero (non-blocking) if polling succeeds. Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-3-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: fix concurrent access to poll_disable_cntPaolo Bonzini
It is valid for an aio_set_fd_handler to happen concurrently with aio_poll. In that case, poll_disable_cnt can change under the heels of aio_poll, and the assertion on poll_disable_cnt can fail in run_poll_handlers. Therefore, this patch simply checks the counter on every polling iteration. There are no particular needs for ordering, since the polling loop is terminated anyway by aio_notify at the end of aio_set_fd_handler. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-2-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-08-15aio-posix: Improve comment around marking node deletedFam Zheng
The counter is for qemu_lockcnt_inc/dec sections (read side), qemu_lockcnt_lock/unlock is for the write side. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180803063917.30292-1-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-08-15aio: Do aio_notify_accept only during blocking aio_pollFam Zheng
An aio_notify() pairs with an aio_notify_accept(). The former should happen in the main thread or a vCPU thread, and the latter should be done in the IOThread. There is one rare case that the main thread or vCPU thread may "steal" the aio_notify() event just raised by itself, in bdrv_set_aio_context() [1]. The sequence is like this: main thread IO Thread =============================================================== bdrv_drained_begin() aio_disable_external(ctx) aio_poll(ctx, true) ctx->notify_me += 2 ... bdrv_drained_end() ... aio_notify() ... bdrv_set_aio_context() aio_poll(ctx, false) [1] aio_notify_accept(ctx) ppoll() /* Hang! */ [1] is problematic. It will clear the ctx->notifier event so that the blocked ppoll() will not return. (For the curious, this bug was noticed when booting a number of VMs simultaneously in RHV. One or two of the VMs will hit this race condition, making the VIRTIO device unresponsive to I/O commands. When it hangs, Seabios is busy waiting for a read request to complete (read MBR), right after initializing the virtio-blk-pci device, using 100% guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750 for the original bug analysis.) aio_notify() only injects an event when ctx->notify_me is set, correspondingly aio_notify_accept() is only useful when ctx->notify_me _was_ set. Move the call to it into the "blocking" branch. This will effectively skip [1] and fix the hang. Furthermore, blocking aio_poll is only allowed on home thread (in_aio_context_home_thread), because otherwise two blocking aio_poll()'s can steal each other's ctx->notifier event and cause hanging just like described above. Cc: qemu-stable@nongnu.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180809132259.18402-3-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-08-15aio-posix: Don't count ctx->notifier as progress when pollingFam Zheng
The same logic exists in fd polling. This change is especially important to avoid busy loop once we limit aio_notify_accept() to blocking aio_poll(). Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180809132259.18402-2-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-05-18iothread: fix epollfd leak in the process of delIOThreadJie Wang
When we call addIOThread, the epollfd created in aio_context_setup, but not close it in the process of delIOThread, so the epollfd will leak. Reorder the code in aio_epoll_disable and reuse it. Signed-off-by: Jie Wang <wangjie88@huawei.com> Message-Id: <1526517763-11108-1-git-send-email-wangjie88@huawei.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> [Mention change to aio_epoll_disable in commit message. - Fam] Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-10async: use ARRAY_SIZE macroPhilippe Mathieu-Daudé
Applied using the Coccinelle semantic patch scripts/coccinelle/use_osdep.cocci Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-11-06aio-posix: drop QEMU_AIO_POLL_MAX_NS env varStefan Hajnoczi
This hunk should not have been merged but I forgot to remove it. Let's remove it before it slips into a QEMU release. ¯\_(ツ)_/¯ Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20171103154041.12617-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-03aio: fix assert when remove poll during destroyStefan Hajnoczi
After iothread is enabled internally inside QEMU with GMainContext, we may encounter this warning when destroying the iothread: (qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll: assertion '!SOURCE_DESTROYED (source)' failed The problem is that g_source_remove_poll() does not allow to remove one source from array if the source is detached from its owner context. (peterx: which IMHO does not make much sense) Fix it on QEMU side by avoid calling g_source_remove_poll() if we know the object is during destruction, and we won't leak anything after all since the array will be gone soon cleanly even with that fd. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-id: 20170928025958.1420-6-peterx@redhat.com [peterx: write the commit message] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21async: remove unnecessary inc/dec pairsPaolo Bonzini
Pull the increment/decrement pair out of aio_bh_poll and into the callers. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-18-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio-posix: partially inline aio_dispatch into aio_pollPaolo Bonzini
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-17-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in callbacks that need itPaolo Bonzini
This covers both file descriptor callbacks and polling callbacks, since they execute related code. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-14-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in timers that need itPaolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-13-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio: push aio_context_acquire/release down to dispatchingPaolo Bonzini
The AioContext data structures are now protected by list_lock and/or they are walked with FOREACH_RCU primitives. There is no need anymore to acquire the AioContext for the entire duration of aio_dispatch. Instead, just acquire it before and after invoking the callbacks. The next step is then to push it further down. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-12-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: move AioContext, QEMUTimer, main-loop to libqemuutilPaolo Bonzini
AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>