aboutsummaryrefslogtreecommitdiff
path: root/block/qed.c
AgeCommit message (Collapse)Author
2012-08-29qed: refuse unaligned zero writes with a backing fileStefan Hajnoczi
Zero writes have cluster granularity in QED. Therefore they can only be used to zero entire clusters. If the zero write request leaves sectors untouched, zeroing the entire cluster would obscure the backing file. Instead return -ENOTSUP, which is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a regular write. The qemu-iotests 034 test cases covers this scenario. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-10block: add BLOCK_O_CHECK for qemu-img checkStefan Hajnoczi
Image formats with a dirty bit, like qed and qcow2, repair dirty image files upon open with BDRV_O_RDWR. Performing automatic repair when qemu-img check runs is not ideal because the bdrv_open() call repairs the image before the actual bdrv_check() call from qemu-img.c. Fix this "double repair" since it leads to confusing output from qemu-img check. Tell the block driver that this image is being opened just for bdrv_check(). This skips automatic repair and qemu-img.c can invoke it manually with bdrv_check(). Update the golden output for qemu-iotests 039 to reflect the new qemu-img check output. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-10qed: mark image clean after repair succeedsStefan Hajnoczi
The dirty bit is cleared after image repair succeeds in qed_open(). Move this into qed_check() so that all callers benefit from this behavior when fix=true. This is necessary so qemu-img check can call .bdrv_check() and mark the image clean. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09Merge remote-tracking branch 'mjt/mjt-iov2' into stagingAnthony Liguori
* mjt/mjt-iov2: rewrite iov_send_recv() and move it to iov.c cleanup qemu_co_sendv(), qemu_co_recvv() and friends export iov_send_recv() and use it in iov_send() and iov_recv() rename qemu_sendv to iov_send, change proto and move declarations to iov.h change qemu_iovec_to_buf() to match other to,from_buf functions consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent allow qemu_iovec_from_buffer() to specify offset from which to start copying consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset() rewrite iov_* functions change iov_* function prototypes to be more appropriate virtio-serial-bus: use correct lengths in control_out() message Conflicts: tests/Makefile Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-07-09blkdebug: remove sync i/o eventsPaolo Bonzini
These are unused, except (by mistake more or less) in QED. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15qemu-img check -r for repairing imagesKevin Wolf
The QED block driver already provides the functionality to not only detect inconsistencies in images, but also fix them. However, this functionality cannot be manually invoked with qemu-img, but the check happens only automatically during bdrv_open(). This adds a -r switch to qemu-img check that allows manual invocation of an image repair. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-11consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistentMichael Tokarev
qemu_iovec_concat() is currently a wrapper for qemu_iovec_copy(), use the former (with extra "0" arg) in a few places where it is used. Change skip argument of qemu_iovec_copy() from uint64_t to size_t, since size of qiov itself is size_t, so there's no way to skip larger sizes. Rename it to soffset, to make it clear that the offset is applied to src. Also change the only usage of uint64_t in hw/9pfs/virtio-9p.c, in v9fs_init_qiov_from_pdu() - all callers of it actually uses size_t too, not uint64_t. One added restriction: as for all other iovec-related functions, soffset must point inside src. Order of argumens is already good: qemu_iovec_memset(QEMUIOVector *qiov, size_t offset, int c, size_t bytes) vs: qemu_iovec_concat(QEMUIOVector *dst, QEMUIOVector *src, size_t soffset, size_t sbytes) (note soffset is after _src_ not dst, since it applies to src; for memset it applies to qiov). Note that in many places where this function is used, the previous call is qemu_iovec_reset(), which means many callers actually want copy (replacing dst content), not concat. So we may want to add a wrapper like qemu_iovec_copy() with the same arguments but which calls qemu_iovec_reset() before _concat(). Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11consolidate qemu_iovec_memset{,_skip}() into single function and use ↵Michael Tokarev
existing iov_memset() This patch combines two functions into one, and replaces the implementation with already existing iov_memset() from iov.c. The new prototype of qemu_iovec_memset(): size_t qemu_iovec_memset(qiov, size_t offset, int fillc, size_t bytes) It is different from former qemu_iovec_memset_skip(), and I want to make other functions to be consistent with it too: first how much to skip, second what, and 3rd how many of it. It also returns actual number of bytes filled in, which may be less than the requested `bytes' if qiov is smaller than offset+bytes, in the same way iov_memset() does. While at it, use utility function iov_memset() from iov.h in posix-aio-compat.c, where qiov was used. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-05-10block: fix snapshot on QEDPaolo Bonzini
QED's opaque data includes a pointer back to the BlockDriverState. This breaks when bdrv_append shuffles data between bs_new and bs_top. To avoid this, add a "rebind" function that tells the driver about the new relationship between the BlockDriverState and its opaque. The patch also adds rebind to VVFAT for completeness, even though it is not used with live snapshots. Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: remove incoming live migration blockerBenoît Canet
Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: honor BDRV_O_INCOMING for incoming live migrationBenoît Canet
From original commit with Patchwork-id: 31108 by Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> "The QED image format includes a file header bit to mark images dirty. QED normally checks dirty images on open and fixes inconsistent metadata. This is undesirable during live migration since the dirty bit may be set if the source host is modifying the image file. The check should be postponed until migration completes. Skip operations that modify the image file if the BDRV_O_INCOMING flag is set." Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: add bdrv_invalidate_cache to be called after incoming live migrationBenoît Canet
The QED image is reopened to flush metadata and check consistency. Signed-off-by: Benoit Canet <benoit.canet@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05qed: track dirty flag statusDong Xu Wang
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05block: push recursive flushing up from driversPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09qed: add .bdrv_co_write_zeroes() supportStefan Hajnoczi
Zero writes are a dedicated interface for writing regions of zeroes into the image file. If clusters are not yet allocated it is possible to use an efficient metadata representation which keeps the image file compact and does not store individual zero bytes. Implementing this for the QED image format is fairly straightforward. The only issue is that when a zero write touches an existing cluster we have to allocate a bounce buffer and perform a regular write. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09qed: replace is_write with flags fieldStefan Hajnoczi
Per-request attributes like read/write are currently implemented as bool fields in the QEDAIOCB struct. This becomes unwiedly as the number of attributes grows. For example, the qed_aio_setup() function would have to take multiple bool arguments and at call sites it would be hard to distinguish the meaning of each bool. Instead use a flags field with bitmask constants. This will be used when zero write support is added. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15block: bdrv_aio_* do not return NULLPaolo Bonzini
Initially done with the following semantic patch: @ rule1 @ expression E; statement S; @@ E = ( bdrv_aio_readv | bdrv_aio_writev | bdrv_aio_flush | bdrv_aio_discard | bdrv_aio_ioctl ) (...); ( - if (E == NULL) { ... } | - if (E) { <... S ...> } ) which however missed the occurrence in block/blkverify.c (as it should have done), and left behind some unused variables. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05qed: convert to .bdrv_co_is_allocated()Stefan Hajnoczi
The bdrv_qed_is_allocated() function is a synchronous wrapper around qed_find_cluster(), which performs the cluster lookup. In order to convert the synchronous function to a coroutine function we yield instead of using qemu_aio_wait(). Note that QED's cache is already safe for parallel requests so no locking is needed. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-21qed: add migration blocker (v2)Anthony Liguori
Now when you try to migrate with qed, you get: (qemu) migrate tcp:localhost:1025 Block format 'qed' used by device 'ide0-hd0' does not support feature 'live migration' (qemu) Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-10-31Merge remote-tracking branch 'stefanha/trivial-patches' into stagingAnthony Liguori
2011-10-26qed: remove unneeded variable assignmentPavel Borzenkov
'ret' is unconditionally overwitten by qed_read_l1_table_sync() Spotted by Clang Analyzer Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-10-26qed: don't pass NULL to memcpyPavel Borzenkov
Spotted by Clang Analyzer [Note this memcpy call has always been safe because the length will be 0 when the pointer is NULL] Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-10-21block: drop redundant bdrv_flush implementationStefan Hajnoczi
Block drivers now only need to provide either of .bdrv_co_flush, .bdrv_aio_flush() or for legacy drivers .bdrv_flush(). Remove the redundant .bdrv_flush() implementations. [Paolo Bonzini: change raw driver to bdrv_co_flush] Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-05qed: fix use-after-free during l2 cache commitStefan Hajnoczi
QED's metadata caching strategy allows two parallel requests to race for metadata lookup. The first one to complete will populate the metadata cache and the second one will drop the data it just read in favor of the cached data. There is a use-after-free in qed_read_l2_table_cb() and qed_commit_l2_update() where l2_table->offset was used after the l2_table may have been freed due to a metadata lookup race. Fix this by keeping the l2_offset in a local variable and not reaching into the possibly freed l2_table. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-20Use glib memory allocation and free functionsAnthony Liguori
qemu_malloc/qemu_free no longer exist after this commit. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-02async: Remove AsyncContextKevin Wolf
The purpose of AsyncContexts was to protect qcow and qcow2 against reentrancy during an emulated bdrv_read/write (which includes a qemu_aio_wait() call and can run AIO callbacks of different requests if it weren't for AsyncContexts). Now both qcow and qcow2 are protected by CoMutexes and AsyncContexts can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-08qemu-img create: Fix displayed default cluster sizeKevin Wolf
When not specifying a cluster size on the command line, qemu-img printed a cluster size of 0: Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=0 This patch adds the default cluster size to the QEMUOptionParameter list, so that it displays the default value that is used. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-18qed: support for growing imagesStefan Hajnoczi
The .bdrv_truncate() operation resizes images and growing is easy to implement in QED. Simply check that the new size is valid and then update the image_size header field to reflect the new size. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-18qed: Periodically flush and clear need check bitStefan Hajnoczi
One strategy to limit the startup delay of consistency check when opening image files is to ensure that the file is marked dirty for as little time as possible. QED currently marks the image dirty when the first allocating write request is issued and clears the dirty bit again when the image is cleanly closed. In practice that means the image is marked dirty for most of a guest's lifetime and prone to being in a dirty state upon crash or power failure. It is safe to clear the dirty bit after all allocating write requests have completed and a flush has been performed. This patch adds a timer after the last allocating write request completes. When the timer fires it will flush and then clear the dirty bit. The timer is set to 5 seconds and is cancelled upon arrival of a new allocating write request. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-13qed: Add support for zero clustersAnthony Liguori
Zero clusters are similar to unallocated clusters except instead of reading their value from a backing file when one is available, the cluster is always read as zero. This implements read support only. At this stage, QED will never write a zero cluster. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-10qed: Report error for unsupported featuresKevin Wolf
Instead of just returning -ENOTSUP, generate a more detailed error. Unfortunately we don't have a helpful text for features that we don't know yet, so just print the feature mask. It might be useful at least if someone asks for help. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-01-31qed: Images with backing file do not require QED_F_NEED_CHECKStefan Hajnoczi
The consistency check on open is necessary in order to fix inconsistent table offsets left as a result of a crash mid-operation. Images with a backing file actually flush before updating table offsets and are therefore guaranteed to be consistent. Do not mark these images dirty. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24qed: Refuse to create images on block devicesStefan Hajnoczi
QED relies on the underlying filesystem to extend the file and maintain its size. Check that images are not created on a block device. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17qed: Consistency check supportStefan Hajnoczi
This patch adds support for the qemu-img check command. It also introduces a dirty bit in the qed header to mark modified images as needing a check. This bit is cleared when the image file is closed cleanly. If an image file is opened and it has the dirty bit set, a consistency check will run and try to fix corrupted table offsets. These corruptions may occur if there is power loss while an allocating write is performed. Once the image is fixed it opens as normal again. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17qed: Read/write supportStefan Hajnoczi
This patch implements the read/write state machine. Operations are fully asynchronous and multiple operations may be active at any time. Allocating writes lock tables to ensure metadata updates do not interfere with each other. If two allocating writes need to update the same L2 table they will run sequentially. If two allocating writes need to update different L2 tables they will run in parallel. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17qed: Table, L2 cache, and cluster functionsStefan Hajnoczi
This patch adds code to look up data cluster offsets in the image via the L1/L2 tables. The L2 tables are writethrough cached in memory for performance (each read/write requires a lookup so it is essential to cache the tables). With cluster lookup code in place it is possible to implement bdrv_is_allocated() to query the number of contiguous allocated/unallocated clusters. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17qed: Add QEMU Enhanced Disk image formatStefan Hajnoczi
This patch introduces the qed on-disk layout and implements image creation. Later patches add read/write and other functionality. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>