aboutsummaryrefslogtreecommitdiff
path: root/nbd
AgeCommit message (Collapse)Author
2024-03-18nbd/server: Fix race in draining the exportKevin Wolf
When draining an NBD export, nbd_drained_begin() first sets client->quiescing so that nbd_client_receive_next_request() won't start any new request coroutines. Then nbd_drained_poll() tries to makes sure that we wait for any existing request coroutines by checking that client->nb_requests has become 0. However, there is a small window between creating a new request coroutine and increasing client->nb_requests. If a coroutine is in this state, it won't be waited for and drain returns too early. In the context of switching to a different AioContext, this means that blk_aio_attached() will see client->recv_coroutine != NULL and fail its assertion. Fix this by increasing client->nb_requests immediately when starting the coroutine. Doing this after the checks if we should create a new coroutine is okay because client->lock is held. Cc: qemu-stable@nongnu.org Fixes: fd6afc501a01 ("nbd/server: Use drained block ops to quiesce the server") Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240314165825.40261-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21nbd/server: introduce NBDClient->lock to protect fieldsStefan Hajnoczi
NBDClient has a number of fields that are accessed by both the export AioContext and the main loop thread. When the AioContext lock is removed these fields will need another form of protection. Add NBDClient->lock and protect fields that are accessed by both threads. Also add assertions where possible and otherwise add doc comments stating assumptions about which thread and lock holding. Note this patch moves the client->recv_coroutine assertion from nbd_co_receive_request() to nbd_trip() where client->lock is held. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231221192452.1785567-7-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21nbd/server: only traverse NBDExport->clients from main loop threadStefan Hajnoczi
The NBD clients list is currently accessed from both the export AioContext and the main loop thread. When the AioContext lock is removed there will be nothing protecting the clients list. Adding a lock around the clients list is tricky because NBDClient structs are refcounted and may be freed from the export AioContext or the main loop thread. nbd_export_request_shutdown() -> client_close() -> nbd_client_put() is also tricky because the list lock would be held while indirectly dropping references to NDBClients. A simpler approach is to only allow nbd_client_put() and client_close() calls from the main loop thread. Then the NBD clients list is only accessed from the main loop thread and no fancy locking is needed. nbd_trip() just needs to reschedule itself in the main loop AioContext before calling nbd_client_put() and client_close(). This costs more CPU cycles per NBD request so add nbd_client_put_nonzero() to optimize the common case where more references to NBDClient remain. Note that nbd_client_get() can still be called from either thread, so make NBDClient->refcount atomic. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20231221192452.1785567-6-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21nbd/server: avoid per-NBDRequest nbd_client_get/put()Stefan Hajnoczi
nbd_trip() processes a single NBD request from start to finish and holds an NBDClient reference throughout. NBDRequest does not outlive the scope of nbd_trip(). Therefore it is unnecessary to ref/unref NBDClient for each NBDRequest. Removing these nbd_client_get()/nbd_client_put() calls will make thread-safety easier in the commits that follow. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20231221192452.1785567-5-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07block: Mark bdrv_filter_or_cow_bs() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_filter_or_cow_bs() need to hold a reader lock for the graph because it calls bdrv_filter_or_cow_child(), which accesses bs->file/backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231027155333.420094-7-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-10-05nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUSEric Blake
Allow a client to request a subset of negotiated meta contexts. For example, a client may ask to use a single connection to learn about both block status and dirty bitmaps, but where the dirty bitmap queries only need to be performed on a subset of the disk; forcing the server to compute that information on block status queries in the rest of the disk is wasted effort (both at the server, and on the amount of traffic sent over the wire to be parsed and ignored by the client). Qemu as an NBD client never requests to use more than one meta context, so it has no need to use block status payloads. Testing this instead requires support from libnbd, which CAN access multiple meta contexts in parallel from a single NBD connection; an interop test submitted to the libnbd project at the same time as this patch demonstrates the feature working, as well as testing some corner cases (for example, when the payload length is longer than the export length), although other corner cases (like passing the same id duplicated) requires a protocol fuzzer because libnbd is not wired up to break the protocol that badly. This also includes tweaks to 'qemu-nbd --list' to show when a server is advertising the capability, and to the testsuite to reflect the addition to that output. Of note: qemu will always advertise the new feature bit during NBD_OPT_INFO if extended headers have alreay been negotiated (regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has occurred); but for NBD_OPT_GO, qemu only advertises the feature if block status is also enabled (that is, if the client does not negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so the feature is not advertised). Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-26-eblake@redhat.com> [eblake: fix logic to reject unnegotiated contexts] Signed-off-by: Eric Blake <eblake@redhat.com>
2023-10-05nbd/server: Prepare for per-request filtering of BLOCK_STATUSEric Blake
The next commit will add support for the optional extension NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can request that the server only return a subset of negotiated contexts, rather than all contexts. To make that task easier, this patch populates the list of contexts to return on a per-command basis (for now, identical to the full set of negotiated contexts). Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-25-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-10-05nbd/server: Refactor list of negotiated meta contextsEric Blake
Peform several minor refactorings of how the list of negotiated meta contexts is managed, to make upcoming patches easier: Promote the internal type NBDExportMetaContexts to the public opaque type NBDMetaContexts, and mark exp const. Use a shorter member name in NBDClient. Hoist calls to nbd_check_meta_context() earlier in their callers, as the number of negotiated contexts may impact the flags exposed in regards to an export, which in turn requires a new parameter. Drop a redundant parameter to nbd_negotiate_meta_queries. No semantic change intended on the success path; on the failure path, dropping context in nbd_check_meta_export even when reporting an error is safer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-24-eblake@redhat.com>
2023-10-05nbd/client: Request extended headers during negotiationEric Blake
All the pieces are in place for a client to finally request extended headers. Note that we must not request extended headers when qemu-nbd is used to connect to the kernel module (as nbd.ko does not expect them, but expects us to do the negotiation in userspace before handing the socket over to the kernel), but there is no harm in all other clients requesting them. Extended headers are not essential to the information collected during 'qemu-nbd --list', but probing for it gives us one more piece of information in that output. Update the iotests affected by the new line of output. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-23-eblake@redhat.com>
2023-10-05nbd/client: Initial support for extended headersEric Blake
Update the client code to be able to send an extended request, and parse an extended header from the server. Note that since we reject any structured reply with a too-large payload, we can always normalize a valid header back into the compact form, so that the caller need not deal with two branches of a union. Still, until a later patch lets the client negotiate extended headers, the code added here should not be reached. Note that because of the different magic numbers, it is just as easy to trace and then tolerate a non-compliant server sending the wrong header reply as it would be to insist that the server is compliant. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-21-eblake@redhat.com> [eblake: fix trace format] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-10-05nbd/server: Enable initial support for extended headersEric Blake
Time to start supporting clients that request extended headers. Now we can finally reach the code added across several previous patches. Even though the NBD spec has been altered to allow us to accept NBD_CMD_READ larger than the max payload size (provided our response is a hole or broken up over more than one data chunk), we are not planning to take advantage of that, and continue to cap NBD_CMD_READ to 32M regardless of header size. For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already supports 64-bit operations without any effort on our part. For NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous patch took care of implementing the required NBD_REPLY_TYPE_BLOCK_STATUS_EXT. We do not yet support clients that want to do request payload filtering of NBD_CMD_BLOCK_STATUS; that will be added in later patches, but is not essential for qemu as a client since qemu only requests the single context base:allocation. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-19-eblake@redhat.com>
2023-10-05nbd/server: Support 64-bit block statusEric Blake
The NBD spec states that if the client negotiates extended headers, the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if the reply does not need more than 32 bits. As of this patch, client->mode is still never NBD_MODE_EXTENDED, so the code added here does not take effect until the next patch enables negotiation. For now, all metacontexts that we know how to export never populate more than 32 bits of information, so we don't have to worry about NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we always send all zeroes for the upper 32 bits of status during NBD_CMD_BLOCK_STATUS. Note that we previously had some interesting size-juggling on call chains, such as: nbd_co_send_block_status(uint32_t length) -> blockstatus_to_extents(uint32_t bytes) -> bdrv_block_status_above(bytes, &uint64_t num) -> nbd_extent_array_add(uint64_t num) -> store num in 32-bit length But we were lucky that it never overflowed: bdrv_block_status_above never sets num larger than bytes, and we had previously been capping 'bytes' at 32 bits (since the protocol does not allow sending a larger request without extended headers). This patch adds some assertions that ensure we continue to avoid overflowing 32 bits for a narrow client, while fully utilizing 64-bits all the way through when the client understands that. Even in 64-bit math, overflow is not an issue, because all lengths are coming from the block layer, and we know that the block layer does not support images larger than off_t (if lengths were coming from the network, the story would be different). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-18-eblake@redhat.com>
2023-10-05nbd/server: Prepare to send extended header repliesEric Blake
Although extended mode is not yet enabled, once we do turn it on, we need to reply with extended headers to all messages. Update the low level entry points necessary so that all other callers automatically get the right header based on the current mode. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-17-eblake@redhat.com>
2023-10-05nbd/server: Prepare to receive extended header requestsEric Blake
Although extended mode is not yet enabled, once we do turn it on, we need to accept extended requests for all messages. Previous patches have already taken care of supporting 64-bit lengths, now we just need to read it off the wire. Note that this implementation will block indefinitely on a buggy client that sends a non-extended payload (that is, we try to read a full packet before we ever check the magic number, but a client that mistakenly sends a simple request after negotiating extended headers doesn't send us enough bytes), but it's no different from any other client that stops talking to us partway through a packet and thus not worth coding around. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230925192229.3186470-16-eblake@redhat.com>
2023-10-05nbd/server: Support a request payloadEric Blake
Upcoming additions to support NBD 64-bit effect lengths allow for the possibility to distinguish between payload length (capped at 32M) and effect length (64 bits, although we generally assume 63 bits because of off_t limitations). Without that extension, only the NBD_CMD_WRITE request has a payload; but with the extension, it makes sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length in a future patch (where the payload is a limited-size struct that in turn gives the real effect length as well as a subset of known ids for which status is requested). Other future NBD commands may also have a request payload, so the 64-bit extension introduces a new NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header length is a payload length or an effect length, rather than hard-coding the decision based on the command. According to the spec, a client should never send a command with a payload without the negotiation phase proving such extension is available. So in the unlikely event the bit is set or cleared incorrectly, the client is already at fault; if the client then provides the payload, we can gracefully consume it off the wire and fail the command with NBD_EINVAL (subsequent checks for magic numbers ensure we are still in sync), while if the client fails to send payload we block waiting for it (basically deadlocking our connection to the bad client, but not negatively impacting our ability to service other clients, so not a security risk). Note that we do not support the payload version of BLOCK_STATUS yet. This patch also fixes a latent bug introduced in b2578459: once request->len can be 64 bits, assigning it to a 32-bit payload_len can cause wraparound to 0 which then sets req->complete prematurely; thankfully, the bug was not possible back then (it takes this and later patches to even allow request->len larger than 32 bits; and since previously the only 'payload_len = request->len' assignment was in NBD_CMD_WRITE which also sets check_length, which in turn rejects lengths larger than 32M before relying on any possibly-truncated value stored in payload_len). Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230925192229.3186470-15-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [eblake: enhance comment on handling client error, fix type bug] Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-25nbd/server: Refactor handling of command sanity checksEric Blake
Upcoming additions to support NBD 64-bit effect lengths will add a new command flag NBD_CMD_FLAG_PAYLOAD_LEN that needs to be considered in our sanity checks of the client's messages (that is, more than just CMD_WRITE have the potential to carry a client payload when extended headers are in effect). But before we can start to support that, it is easier to first refactor the existing set of various if statements over open-coded combinations of request->type to instead be a single switch statement over all command types that sets witnesses, then straight-line processing based on the witnesses. No semantic change is intended. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230829175826.377251-24-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-09-25nbd: Prepare for 64-bit request effect lengthsEric Blake
Widen the length field of NBDRequest to 64-bits, although we can assert that all current uses are still under 32 bits: either because of NBD_MAX_BUFFER_SIZE which is even smaller (and where size_t can still be appropriate, even on 32-bit platforms), or because nothing ever puts us into NBD_MODE_EXTENDED yet (and while future patches will allow larger transactions, the lengths in play here are still capped at 32-bit). There are no semantic changes, other than a typo fix in a couple of error messages. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230829175826.377251-23-eblake@redhat.com> [eblake: fix assertion bug in nbd_co_send_simple_reply] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-09-22nbd: Add types for extended headersEric Blake
Add the constants and structs necessary for later patches to start implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client and server, matching recent upstream nbd.git (through commit e6f3b94a934). This patch does not change any existing behavior, but merely sets the stage for upcoming patches. This patch does not change the status quo that neither the client nor server use a packed-struct representation for the request header. While most of the patch adds new types, there is also some churn for renaming the existing NBDExtent to NBDExtent32 to contrast it with NBDExtent64, which I thought was a nicer name than NBDExtentExt. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230829175826.377251-22-eblake@redhat.com>
2023-09-22nbd/client: Pass mode through to nbd_send_requestEric Blake
Once the 64-bit headers extension is enabled, the data layout we send over the wire for a client request depends on the mode negotiated with the server. Rather than adding a parameter to nbd_send_request, we can add a member to struct NBDRequest, since it already does not reflect on-wire format. Some callers initialize it directly; many others rely on a common initialization point during nbd_co_send_request(). At this point, there is no semantic change. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230829175826.377251-21-eblake@redhat.com>
2023-09-22nbd: Replace bool structured_reply with mode enumEric Blake
The upcoming patches for 64-bit extensions requires various points in the protocol to make decisions based on what was negotiated. While we could easily add a 'bool extended_headers' alongside the existing 'bool structured_reply', this does not scale well if more modes are added in the future. Better is to expose the mode enum added in the recent commit bfe04d0a7d out to a wider use in the code base. Where the code previously checked for structured_reply being set or clear, it now prefers checking for an inequality; this works because the nodes are in a continuum of increasing abilities, and allows us to touch fewer places if we ever insert other modes in the middle of the enum. There should be no semantic change in this patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230829175826.377251-20-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-09-08Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into stagingStefan Hajnoczi
trivial patches for 2023-09-08 # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmT68tMPHG1qdEB0bHMu # bXNrLnJ1AAoJEHAbT2saaT5ZbEwH/2XcX1f4KcEJbgUn0JVhGQ5GH2c2jepZlkTZ # 2dhvdEECbOPMg73hty0fyyWlyuLWdJ9cMpONfMtzmHTH8RKEOAbpn/zusyo3H+48 # 6cunyUpBqbmb7MHPchrN+JmvtvaSPSazsj2Zdkh+Y4WlfEYj+yVysQ4zQlBlRyHv # iOTi6OdjxXg1QcbtJxAUhp+tKaRJzagiCpLkoyW2m8DIuV9cLVHMJsE3OMgfKNgK # /S+O1fLcaDhuSCrHAbZzArF3Tr4bfLqSwDtGCJfQpqKeIQDJuI+41GLIlm1nYY70 # IFJzEWMOrX/rcMG1CQnUFZOOyDSO+NfILwNnU+eyM49MUekmY54= # =mmPS # -----END PGP SIGNATURE----- # gpg: Signature made Fri 08 Sep 2023 06:09:23 EDT # gpg: using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59 # gpg: issuer "mjt@tls.msk.ru" # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full] # gpg: aka "Michael Tokarev <mjt@corpit.ru>" [full] # gpg: aka "Michael Tokarev <mjt@debian.org>" [full] # Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5 # Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59 * tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu: (22 commits) qxl: don't assert() if device isn't yet initialized hw/net/vmxnet3: Fix guest-triggerable assert() tests/qtest/usb-hcd: Remove the empty "init" tests target/ppc: use g_free() in test_opcode_table() hw/ppc: use g_free() in spapr_tce_table_post_load() trivial: Simplify the spots that use TARGET_BIG_ENDIAN as a numeric value accel/tcg: Fix typo in translator_io_start() description tests/qtest/test-hmp: Fix migrate_set_parameter xbzrle-cache-size test docs tests: Fix use of migrate_set_parameter qemu-options.hx: Rephrase the descriptions of the -hd* and -cdrom options hw/display/xlnx_dp: update comments block: spelling fixes misc/other: spelling fixes qga/: spelling fixes tests/: spelling fixes scripts/: spelling fixes include/: spelling fixes audio: spelling fixes xen: spelling fix riscv: spelling fixes ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-08misc/other: spelling fixesMichael Tokarev
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Eric Blake <eblake@redhat.com>
2023-09-07io: follow coroutine AioContext in qio_channel_yield()Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple threads to process I/O in parallel. The nbd block driver is not compatible with the multi-queue block layer yet because QIOChannel cannot be used easily from coroutines running in multiple threads. This series changes the QIOChannel API to make that possible. In the current API, calling qio_channel_attach_aio_context() sets the AioContext where qio_channel_yield() installs an fd handler prior to yielding: qio_channel_attach_aio_context(ioc, my_ctx); ... qio_channel_yield(ioc); // my_ctx is used here ... qio_channel_detach_aio_context(ioc); This API design has limitations: reading and writing must be done in the same AioContext and moving between AioContexts involves a cumbersome sequence of API calls that is not suitable for doing on a per-request basis. There is no fundamental reason why a QIOChannel needs to run within the same AioContext every time qio_channel_yield() is called. QIOChannel only uses the AioContext while inside qio_channel_yield(). The rest of the time, QIOChannel is independent of any AioContext. In the new API, qio_channel_yield() queries the AioContext from the current coroutine using qemu_coroutine_get_aio_context(). There is no need to explicitly attach/detach AioContexts anymore and qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone. One coroutine can read from the QIOChannel while another coroutine writes from a different AioContext. This API change allows the nbd block driver to use QIOChannel from any thread. It's important to keep in mind that the block driver already synchronizes QIOChannel access and ensures that two coroutines never read simultaneously or write simultaneously. This patch updates all users of qio_channel_attach_aio_context() to the new API. Most conversions are simple, but vhost-user-server requires a new qemu_coroutine_yield() call to quiesce the vu_client_trip() coroutine when not attached to any AioContext. While the API is has become simpler, there is one wart: QIOChannel has a special case for the iohandler AioContext (used for handlers that must not run in nested event loops). I didn't find an elegant way preserve that behavior, so I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false) for opting in to the new AioContext model. By default QIOChannel uses the iohandler AioHandler. Code that formerly called qio_channel_attach_aio_context() now calls qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is created. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20230830224802.493686-5-stefanha@redhat.com> [eblake: also fix migration/rdma.c] Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07nbd: drop unused nbd_start_negotiate() aio_context argumentStefan Hajnoczi
aio_context is always NULL, so drop it. Suggested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230830224802.493686-3-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07nbd: drop unused nbd_receive_negotiate() aio_context argumentStefan Hajnoczi
aio_context is always NULL, so drop it. Suggested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230830224802.493686-2-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2023-07-19nbd: Use enum for various negotiation modesEric Blake
Deciphering the hard-coded list of integer return values from nbd_start_negotiate() will only get more confusing when adding support for 64-bit extended headers. Better is to name things in an enum. Although the function in question is private to client.c, putting the enum in a public header and including an enum-to-string conversion will allow its use in more places in upcoming patches. The enum is intentionally laid out so that operators like <= can be used to group multiple modes with similar characteristics, and where the least powerful mode has value 0, even though this patch does not exploit that. No semantic change intended. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd/client: Add safety check on chunk payload lengthEric Blake
Our existing use of structured replies either reads into a qiov capped at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see NBD_MAX_MALLOC_PAYLOAD in block/nbd.c). But the existing length checks are rather late; if we encounter a buggy (or malicious) server that sends a super-large payload length, we should drop the connection right then rather than assuming the layer on top will be careful. This becomes more important when we permit 64-bit lengths which are even more likely to have the potential for attempted denial of service abuse. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230608135653.2918540-8-eblake@redhat.com>
2023-07-19nbd: s/handle/cookie/ to match NBD specEric Blake
Externally, libnbd exposed the 64-bit opaque marker for each client NBD packet as the "cookie", because it was less confusing when contrasted with 'struct nbd_handle *' holding all libnbd state. It also avoids confusion between the noun 'handle' as a way to identify a packet and the verb 'handle' for reacting to things like signals. Upstream NBD changed their spec to favor the name "cookie" based on libnbd's recommendations[1], so we can do likewise. [1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-6-eblake@redhat.com> [eblake: typo fix] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd/server: Refactor to pass full request aroundEric Blake
Part of NBD's 64-bit headers extension involves passing the client's requested offset back as part of the reply header (one reason it stated for this change: converting absolute offsets stored in NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is easier if the absolute offset of the buffer is also available). This is a refactoring patch to pass the full request around the reply stack, rather than just the handle, so that later patches can then access request->from when extended headers are active. Meanwhile, this patch enables us to now assert that simple replies are only attempted when appropriate, and otherwise has no semantic change. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230608135653.2918540-5-eblake@redhat.com>
2023-07-19nbd/server: Prepare for alternate-size headersEric Blake
Upstream NBD now documents[1] an extension that supports 64-bit effect lengths in requests. As part of that extension, the size of the reply headers will change in order to permit a 64-bit length in the reply for symmetry[2]. Additionally, where the reply header is currently 16 bytes for simple reply, and 20 bytes for structured reply; with the extension enabled, there will only be one extended reply header, of 32 bytes, with both structured and extended modes sending identical payloads for chunked replies. Since we are already wired up to use iovecs, it is easiest to allow for this change in header size by splitting each structured reply across multiple iovecs, one for the header (which will become wider in a future patch according to client negotiation), and the other(s) for the chunk payload, and removing the header from the payload struct definitions. Rename the affected functions with s/structured/chunk/ to make it obvious that the code will be reused in extended mode. Interestingly, the client side code never utilized the packed types, so only the server code needs to be updated. [1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md as of NBD commit e6f3b94a934 [2] Note that on the surface, this is because some future server might permit a 4G+ NBD_CMD_READ and need to reply with that much data in one transaction. But even though the extended reply length is widened to 64 bits, for now the NBD spec is clear that servers will not reply with more than a maximum payload bounded by the 32-bit NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually agree to transactions larger than 4G would require yet another extension. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd/client: Use smarter assertEric Blake
Assigning strlen() to a uint32_t and then asserting that it isn't too large doesn't catch the case of an input string 4G in length. Thankfully, the incoming strings can never be that large: if the export name or query is reflecting a string the client got from the server, we already guarantee that we dropped the NBD connection if the server sent more than 32M in a single reply to our NBD_OPT_* request; if the export name is coming from qemu, nbd_receive_negotiate() asserted that strlen(info->name) <= NBD_MAX_STRING_SIZE; and similarly, a query string via x->dirty_bitmap coming from the user was bounds-checked in either qemu-nbd or by the limitations of QMP. Still, it doesn't hurt to be more explicit in how we write our assertions to not have to analyze whether inadvertent wraparound is possible. Fixes: 93676c88 ("nbd: Don't send oversize strings", v4.2.0) Reported-by: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-ID: <20230608135653.2918540-2-eblake@redhat.com>
2023-06-05bulk: Remove pointless QOM castsPhilippe Mathieu-Daudé
Mechanical change running Coccinelle spatch with content generated from the qom-cast-macro-clean-cocci-gen.py added in the previous commit. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230601093452.38972-3-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-05-19nbd/server: Fix drained_poll to wake coroutine in right AioContextKevin Wolf
nbd_drained_poll() generally runs in the main thread, not whatever iothread the NBD server coroutine is meant to run in, so it can't directly reenter the coroutines to wake them up. The code seems to have the right intention, it specifies the correct AioContext when it calls qemu_aio_coroutine_enter(). However, this functions doesn't schedule the coroutine to run in that AioContext, but it assumes it is already called in the home thread of the AioContext. To fix this, add a new thread-safe qio_channel_wake_read() that can be called in the main thread to wake up the coroutine in its AioContext, and use this in nbd_drained_poll(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230517152834.277483-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-25nbd: mark more coroutine_fns, do not use co_wrappersPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-20nbd: a BlockExport always has a BlockBackendPaolo Bonzini
exp->common.blk cannot be NULL, nbd_export_delete() is only called (through a bottom half) from blk_exp_unref() and in turn that can only happen after blk_exp_add() has asserted exp->blk != NULL. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-04nbd/server: Request TCP_NODELAYEric Blake
Nagle's algorithm adds latency in order to reduce network packet overhead on small packets. But when we are already using corking to merge smaller packets into transactional requests, the extra delay from TCP defaults just gets in the way (see recent commit bd2cd4a4). For reference, qemu as an NBD client already requests TCP_NODELAY (see nbd_connect() in nbd/client-connection.c); as does libnbd as a client [1], and nbdkit as a server [2]. Furthermore, the NBD spec recommends the use of TCP_NODELAY [3]. [1] https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39 [2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430 [3] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases CC: Florian Westphal <fw@strlen.de> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20230404004047.142086-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-03-27nbd/server: push pending frames after sending replyFlorian Westphal
qemu-nbd doesn't set TCP_NODELAY on the tcp socket. Kernel waits for more data and avoids transmission of small packets. Without TLS this is barely noticeable, but with TLS this really shows. Booting a VM via qemu-nbd on localhost (with tls) takes more than 2 minutes on my system. tcpdump shows frequent wait periods, where no packets get sent for a 40ms period. Add explicit (un)corking when processing (and responding to) requests. "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data. VM Boot time: main: no tls: 23s, with tls: 2m45s patched: no tls: 14s, with tls: 15s VM Boot time, qemu-nbd via network (same lan): main: no tls: 18s, with tls: 1m50s patched: no tls: 17s, with tls: 18s Future optimization: if we could detect if there is another pending request we could defer the uncork operation because more data would be appended. Signed-off-by: Florian Westphal <fw@strlen.de> Message-Id: <20230324104720.2498-1-fw@strlen.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-01-20include/block: Untangle inclusion loopsMarkus Armbruster
We have two inclusion loops: block/block.h -> block/block-global-state.h -> block/block-common.h -> block/blockjob.h -> block/block.h block/block.h -> block/block-io.h -> block/block-common.h -> block/blockjob.h -> block/block.h I believe these go back to Emanuele's reorganization of the block API, merged a few months ago in commit d7e2fe4aac8. Fortunately, breaking them is merely a matter of deleting unnecessary includes from headers, and adding them back in places where they are now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
2023-01-20coroutine: Split qemu/coroutine-core.h off qemu/coroutine.hMarkus Armbruster
qemu/coroutine.h and qemu/lockable.h include each other. They need each other only in macro expansions, so we could simply drop both inclusions to break the loop, and add suitable includes to files that expand the macros. Instead, move a part of qemu/coroutine.h to new qemu/coroutine-core.h so that qemu/coroutine-core.h doesn't need qemu/lockable.h, and qemu/lockable.h only needs qemu/coroutine-core.h. Result: qemu/coroutine.h includes qemu/lockable.h includes qemu/coroutine-core.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221131435.3851212-5-armbru@redhat.com> [Semantic rebase conflict with 7c10cb38cc "accel/tcg: Add debuginfo support" resolved]
2023-01-19coroutine: Clean up superfluous inclusion of qemu/coroutine.hMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20221221131435.3851212-2-armbru@redhat.com>
2022-12-15block-backend: replace bdrv_*_above with blk_*_aboveEmanuele Giuseppe Esposito
Avoid mixing bdrv_* functions with blk_*, so create blk_* counterparts for bdrv_block_status_above and bdrv_is_allocated_above. Note that since blk_co_block_status_above only calls the g_c_w function bdrv_common_block_status_above and is marked as coroutine_fn, call directly bdrv_co_common_block_status_above() to avoid using a g_c_w. Same applies to blk_co_is_allocated_above. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-5-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-12-15nbd/server.c: add coroutine_fn annotationsEmanuele Giuseppe Esposito
These functions end up calling bdrv_*() implemented as generated_co_wrapper functions. In addition, they also happen to be always called in coroutine context, meaning all callers are coroutine_fn. This means that the g_c_w function will enter the qemu_in_coroutine() case and eventually suspend (or in other words call qemu_coroutine_yield()). Therefore we can mark such functions coroutine_fn too. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20221128142337.657646-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-12-13nbd/server: Clean up abuse of BlockExportOptionsNbd member @argMarkus Armbruster
block-export-add argument @name defaults to the value of argument @node-name. nbd_export_create() implements this by copying @node_name to @name. It leaves @has_node_name false, violating the "has_node_name == !!node_name" invariant. Unclean. Falls apart when we elide @has_node_name (next commit): then QAPI frees the same value twice, once for @node_name and once @name. iotest 307 duly explodes. Goes back to commit c62d24e906 "blockdev-nbd: Boxed argument type for nbd-server-add" (v5.0.0). Got moved from qmp_nbd_server_add() to nbd_export_create() (commit 56ee86261e), then copied back (commit b6076afcab). Commit 8675cbd68b "nbd: Utilize QAPI_CLONE for type conversion" (v5.2.0) cleaned up the copy in qmp_nbd_server_add() noting Second, our assignment to arg->name is fishy: the generated QAPI code for qapi_free_NbdServerAddOptions does not visit arg->name if arg->has_name is false, but if it DID visit it, we would have introduced a double-free situation when arg is finally freed. Exactly. However, the copy in nbd_export_create() remained dirty. Clean it up. Since the value stored in member @name is not actually used outside this function, use a local variable instead of modifying the QAPI object. Signed-off-by: Markus Armbruster <armbru@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Cc: qemu-block@nongnu.org Message-Id: <20221104160712.3005652-10-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2022-07-12block: Change blk_{pread,pwrite}() param orderAlberto Faria
Swap 'buf' and 'bytes' around for consistency with blk_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression blk, offset, buf, bytes, flags; @@ - blk_pread(blk, offset, buf, bytes, flags) + blk_pread(blk, offset, bytes, buf, flags) @@ expression blk, offset, buf, bytes, flags; @@ - blk_pwrite(blk, offset, buf, bytes, flags) + blk_pwrite(blk, offset, bytes, buf, flags) It had no effect on hw/block/nand.c, presumably due to the #if, so that file was updated manually. Overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220705161527.1054072-4-afaria@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Add a 'flags' param to blk_pread()Alberto Faria
For consistency with other I/O functions, and in preparation to implement it using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression blk, offset, buf, bytes; @@ - blk_pread(blk, offset, buf, bytes) + blk_pread(blk, offset, buf, bytes, 0) It had no effect on hw/block/nand.c, presumably due to the #if, so that file was updated manually. Overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220705161527.1054072-3-afaria@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-06-29nbd: trace long NBD operationsDenis V. Lunev
At the moment there are 2 sources of lengthy operations if configured: * open connection, which could retry inside and * reconnect of already opened connection These operations could be quite lengthy and cumbersome to catch thus it would be quite natural to add trace points for them. This patch is based on the original downstream work made by Vladimir. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Eric Blake <eblake@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2022-05-12nbd/server: Allow MULTI_CONN for shared writable exportsEric Blake
According to the NBD spec, a server that advertises NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will not see any cache inconsistencies: when properly separated by a single flush, actions performed by one client will be visible to another client, regardless of which client did the flush. We always satisfy these conditions in qemu - even when we support multiple clients, ALL clients go through a single point of reference into the block layer, with no local caching. The effect of one client is instantly visible to the next client. Even if our backend were a network device, we argue that any multi-path caching effects that would cause inconsistencies in back-to-back actions not seeing the effect of previous actions would be a bug in that backend, and not the fault of caching in qemu. As such, it is safe to unconditionally advertise CAN_MULTI_CONN for any qemu NBD server situation that supports parallel clients. Note, however, that we don't want to advertise CAN_MULTI_CONN when we know that a second client cannot connect (for historical reasons, qemu-nbd defaults to a single connection while nbd-server-add and QMP commands default to unlimited connections; but we already have existing means to let either style of NBD server creation alter those defaults). This is visible by no longer advertising MULTI_CONN for 'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation. The harder part of this patch is setting up an iotest to demonstrate behavior of multiple NBD clients to a single server. It might be possible with parallel qemu-io processes, but I found it easier to do in python with the help of libnbd, and help from Nir and Vladimir in writing the test. Signed-off-by: Eric Blake <eblake@redhat.com> Suggested-by: Nir Soffer <nsoffer@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220512004924.417153-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-04-26qapi: nbd-export: allow select bitmaps by node/name pairVladimir Sementsov-Ogievskiy
Hi all! Current logic of relying on search through backing chain is not safe neither convenient. Sometimes it leads to necessity of extra bitmap copying. Also, we are going to add "snapshot-access" driver, to access some snapshot state through NBD. And this driver is not formally a filter, and of course it's not a COW format driver. So, searching through backing chain will not work. Instead of widening the workaround of bitmap searching, let's extend the interface so that user can select bitmap precisely. Note, that checking for bitmap active status is not copied to the new API, I don't see a reason for it, user should understand the risks. And anyway, bitmap from other node is unrelated to this export being read-only or read-write. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru> [eblake: Adjust S-o-b to Vladimir's new email, with permission] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-22Remove trailing ; after G_DEFINE_AUTO macroMarc-André Lureau
The macro doesn't need it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-22Replace GCC_FMT_ATTR with G_GNUC_PRINTFMarc-André Lureau
One less qemu-specific macro. It also helps to make some headers/units only depend on glib, and thus moved in standalone projects eventually. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>