aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-02-12target-arm: Add AArch32 guest support to KVM64el1_aarch32.v6Greg Bellows
Add 32-bit to/from 64-bit register synchronization on register gets and puts. Set EL1_32BIT feature flag passed to KVM Signed-off-by: Greg Bellows <greg.bellows@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> --- v7 -> v8 - Fix dynamic cast object v4 -> v5 - Fix target check v3 -> v4 - Add check that to make sure KVM64 is only being used on AArch64 family of machines. - Relocate register sync to follow register fetches. - Refresh env->aarch64 prior to use. v2 -> v3 - Conditionalize sync of 32-bit and 64-bit registers
2015-02-12target-arm: Add 32/64-bit register syncGreg Bellows
Add AArch32 to AArch64 register sychronization functions. Replace manual register synchronization with new functions in aarch64_cpu_do_interrupt() and HELPER(exception_return)(). Signed-off-by: Greg Bellows <greg.bellows@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> --- v6 -> v7 -Fix comment issues v5 -> v6 - Add r14 set in 32_to_64 - Reorder conditionals in 64_to_32 fomr reg order to xreg order v4 -> v5 - Rework sync routines a bit more. v3 -> v4 - Rework sync routines to cover various exception levels - Move sync routines to helper.c v2 -> v3 - Conditionalize interrupt handler update of aarch64.
2015-02-12target-arm: Add feature parsing to virtGreg Bellows
Added machvirt parsing of feature keywords added to the -cpu command line option. Parsing occurs during machine initialization. Signed-off-by: Greg Bellows <greg.bellows@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> --- v3 -> v4 - Fix misspelling v1 -> v2 - Fix multiple property handling
2015-02-12target-arm: Add CPU property to disable AArch64Greg Bellows
Adds registration and get/set functions for enabling/disabling the AArch64 execution state on AArch64 CPUs. By default AArch64 execution state is enabled on AArch64 CPUs, setting the property to off, will disable the execution state. The below QEMU invocation would have AArch64 execution state disabled. $ ./qemu-system-aarch64 -machine virt -cpu cortex-a57,aarch64=off Also adds stripping of features from CPU model string in acquiring the ARM CPU by name. Signed-off-by: Greg Bellows <greg.bellows@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> --- v4 -> v5 - Fix error message. v3 -> v4 - Switch from using strtok to g_strsplit - Add disablement of aarch64 option if KVM is not enabled. v1 -> v2 - Scrap the custom CPU feature parsing in favor of using the default CPU parsing. - Add registration of CPU AArch64 property to disable/enable the AArch64 feature.
2015-02-09qmp: unbreak build for non-vnc configurationLeon Yu
Signed-off-by: Leon Yu <chianglungyu@gmail.com> Message-id: 1422853731-5282-1-git-send-email-chianglungyu@gmail.com Fixes: df887684603a ("monitor: add query-vnc-servers command") Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-06Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block patches for 2.3 # gpg: Signature made Fri 06 Feb 2015 17:14:10 GMT using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (47 commits) block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices block: Eliminate silly QERR_ macros used for encryption keys block: New bdrv_add_key(), convert monitor to use it blockdev: Eliminate silly QERR_BLOCK_JOB_NOT_ACTIVE macro blockdev: Give find_block_job() an Error ** parameter qcow2: Rewrite qcow2_alloc_bytes() block: Give always priority to unused entries in the qcow2 L2 cache nbd: fix max_discard/max_transfer_length block: introduce BDRV_REQUEST_MAX_SECTORS nbd: Improve error messages iotests: Fix 104 for NBD iotests: Fix 100 for nbd iotests: Fix 083 block: fix off-by-one error in qcow and qcow2 qemu-iotests: add 116 invalid QED input file tests qed: check for header size overflow block/dmg: improve zeroes handling block/dmg: support bzip2 block entry types block/dmg: factor out block type check block/dmg: use SectorNumber from BLKX header ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-06block/raw-posix.c: Fix raw_getlength() on Mac OS X block devicesProgrammingkid
This patch replaces the dummy code in raw_getlength() for block devices on OS X, which always returned LLONG_MAX, with a real implementation that returns the actual block device size. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06Merge remote-tracking branch 'mreitz/block' into queue-blockKevin Wolf
* mreitz/block: block: Eliminate silly QERR_ macros used for encryption keys block: New bdrv_add_key(), convert monitor to use it blockdev: Eliminate silly QERR_BLOCK_JOB_NOT_ACTIVE macro blockdev: Give find_block_job() an Error ** parameter
2015-02-06block: Eliminate silly QERR_ macros used for encryption keysMarkus Armbruster
The QERR_ macros are leftovers from the days of "rich" error objects. They're used with error_set() and qerror_report(), and expand into the first *two* arguments. This trickiness has become pointless. Clean up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-5-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06block: New bdrv_add_key(), convert monitor to use itMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06blockdev: Eliminate silly QERR_BLOCK_JOB_NOT_ACTIVE macroMarkus Armbruster
The QERR_ macros are leftovers from the days of "rich" error objects. They're used with error_set() and qerror_report(), and expand into the first *two* arguments. This trickiness has become pointless. Clean this one up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-3-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06blockdev: Give find_block_job() an Error ** parameterMarkus Armbruster
When find_block_job() fails, all its callers build the same Error object. Build it in find_block_job() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1422524221-8566-2-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06qcow2: Rewrite qcow2_alloc_bytes()Max Reitz
qcow2_alloc_bytes() is a function with insufficient error handling and an unnecessary goto. This patch rewrites it. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: Give always priority to unused entries in the qcow2 L2 cacheAlberto Garcia
The current algorithm to replace entries from the L2 cache gives priority to newer hits by dividing the hit count of all existing entries by two everytime there is a cache miss. However, if there are several cache misses the hit count of the existing entries can easily go down to 0. This will result in those entries being replaced even when there are others that have never been used. This problem is more noticeable with larger disk images and cache sizes, since the chances of having several misses before the cache is full are higher. If we make sure that the hit count can never go down to 0 again, unused entries will always have priority. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06nbd: fix max_discard/max_transfer_lengthDenis V. Lunev
nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Lieven <pl@kamp.de> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: introduce BDRV_REQUEST_MAX_SECTORSPeter Lieven
we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX >> BDRV_SECTOR_BITS UINT_MAX >> BDRV_SECTOR_BITS SIZE_MAX >> BDRV_SECTOR_BITS This patches introdocues a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06nbd: Improve error messagesMax Reitz
This patch makes use of the Error object for nbd_receive_negotiate() so that errors during negotiation look nicer. Furthermore, this patch adds an additional error message if the received magic was wrong, but would be correct for the other protocol version, respectively: So if an export name was specified, but the NBD server magic corresponds to an old handshake, this condition is explicitly signaled to the user, and vice versa. As these messages are now part of the "Could not open image" error message, additional filtering has to be employed in iotest 083, which this patch does as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06iotests: Fix 104 for NBDMax Reitz
_make_test_img sets up an NBD server, _cleanup_test_img shuts it down; thus, _cleanup_test_img has to be called before _make_test_img is invoked another time. Furthermore, the pipe through _filter_test_img was unnecessary; _make_test_img already takes care of that. And finally, a filter is added to _filter_img_info to replace "nbd://127.0.0.1:10810" by "TEST_DIR/t.IMGFMT", since the former is the way to express the full image path (normally the latter) for NBD tests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06iotests: Fix 100 for nbdMax Reitz
In case of NBD, _make_test_img starts a new NBD server. Therefore, _cleanup_test_img (which shuts that server down) has to be invoked before the next _make_test_img call in order to make 100 work for NBD. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06iotests: Fix 083Max Reitz
As of 8f9e835fd2e687d2bfe936819c3494af4343614d, probing should be disabled in the qemu-iotests (at least when using qemu-io). This broke 083's reference output (which consisted mostly of "Could not read image for determining its format"). This patch fixes it. Note that one case which failed before is now successful: Disconnect after data. This is due to qemu having read twice before (once for probing, once for the qemu-io read command), but only once now (the qemu-io read command). Therefore, reading is successful (which is correct). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: fix off-by-one error in qcow and qcow2Jeff Cody
This fixes an off-by-one error introduced in 9a29e18. Both qcow and qcow2 need to make sure to leave room for string terminator '\0' for the backing file, so the max length of the non-terminated string is either 1023 or PATH_MAX - 1. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qemu-iotests: add 116 invalid QED input file testsStefan Hajnoczi
These tests exercise error code paths in the QED image format. The tests are very simple, they just prove that the error path exits cleanly. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1421065893-18875-3-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qed: check for header size overflowStefan Hajnoczi
Header size is denoted in clusters. The maximum cluster size is 64 MB but there is no limit on header size. Check for uint32_t overflow in case the header size field has a whacky value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1421065893-18875-2-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: improve zeroes handlingPeter Wu
Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB is seen in the real world). These blocks (type 2) do not need to be extracted into a temporary buffer, there is no need to allocate memory for these blocks nor to check its length. (For the test image, the maximum uncompressed size is 1054371 bytes, probably for a bzip2-compressed block.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-13-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: support bzip2 block entry typesPeter Wu
This patch adds support for bzip2-compressed block entries as introduced with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image). It was tested against a 5.2G "OS X Yosemite" installation image which stores the BLXX block in the XML property list (instead of resource forks) and has over 5k chunks. New configure entries are added (--enable-bzip2 / --disable-bzip2) to control inclusion of bzip2 functionality (which requires linking against libbz2). The help message suggests that this option is needed for DMG files, but the tests are generic enough that other parts of QEMU can use bzip2 if needed. The identifiers are based on http://newosxbook.com/DMG.html. The decompression routines are based on the zlib case, but as there is no way to reset the decompression state (unlike zlib), memory is allocated and deallocated for every decompression. This should not be problematic as the decompression takes most of the time and as blocks are typically about/over 1 MiB in size, only one allocation is done every 2000 sectors. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1420566495-13284-12-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: factor out block type checkPeter Wu
In preparation for adding bzip2 support, split the type check into a separate function. Make all offsets relative to the begin of a chunk such that it is easier to recognize the position without having to add up all offsets. Some comments are added to describe the fields. There is no functional change. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-11-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: use SectorNumber from BLKX headerPeter Wu
Previously the sector table parsing relied on the previous offset of the DMG file. Now it uses the sector number from the BLKX header (see http://newosxbook.com/DMG.html). The implementation of dmg2img (from vu1tur) does not base the output sector on the location of the terminator (0xffffffff) either so it should be safe to drop this dependency on the previous state. (It makes somehow makes sense, a terminator should halt further processing of a block and is perhaps used to preallocate some space.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-10-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: fix sector data offset calculationPeter Wu
This patch addresses two issues: - The data fork offset was not taken into account, resulting in failure to read an InstallESD.dmg file (5164763151 bytes) which had a non-zero DataForkOffset field. - The offset of the previous block ("partition") was unconditionally added to the current block because older files would start the input offset of a new block at zero. Newer files (including vlc-2.1.5.dmg, tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in reads because these files have chunk offsets, relative to the begin of a data fork. Now the data offset of the mish is taken into account. While we could check that the data_offset is within the data fork, let's not do that here as it would only result in parse failures on invalid files (rather than gracefully handling such bad files). dmg_read will error out if the offset is incorrect. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-9-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: set virtual size to a non-zero valuePeter Wu
Right now the virtual size is always reported as zero which makes it impossible to convert between formats. After this patch, the number of sectors will be read from the trailer ("koly" block). To verify the behavior, the output of `dmg2img foo.dmg foo.img` was compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The tests showed that the file contents are exactly the same, except that QEMU creates a slightly larger file (it matches the total sectors count). Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-8-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: process XML plistsPeter Wu
The format is simple enough to avoid using a full-blown XML parser. It assumes that all BLKX items begin with the "mish" magic word, therefore it is not a problem if other values get matched which are not a BLKX block. The offsets are based on the description at http://newosxbook.com/DMG.html For compatibility with glib 2.12, use g_base64_decode (which additionally requires an extra buffer allocation) instead of g_base64_decode_inplace (which is only available since glib 2.20). Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-7-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: validate chunk size to avoid overflowPeter Wu
Previously the chunk size was not checked, allowing for a large memory allocation. This patch checks whether the chunks size is within the resource fork length, and whether the resource fork is below the trailer of the dmg file. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-6-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: process a buffer instead of reading intsPeter Wu
As the decoded plist XML is not a pointer in the file, dmg_read_mish_block must be able to process a buffer instead of a file pointer. Since the full buffer must be processed, let's change the return value again to just a success flag. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-5-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: extract processing of resource forksPeter Wu
Besides the offset, also read the resource length. This length is now used in the extracted function to verify the end of the resource fork against "count" from the resource fork. Instead of relying on the value of offset to conclude whether the resource fork is available or not (info_begin==0), check the rsrc_fork_length instead. This would allow a dmg file to begin with a resource fork. This seemingly unnecessary restriction was found while trying to craft a DMG file by hand. Other changes: - Do not require resource data offset to be 0x100 (but check that it is within bounds though). - Further improve boundary checking (resource data must be within the resource fork). - Use correct value for resource data length (spotted by John Snow) - Consider the resource data offset when determining info_end. This fixes an EINVAL on the tuxpaint dmg example. The resource fork format is documented at https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151 Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-4-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: extract mish block decoding functionalityPeter Wu
Extract the mish block decoder such that this can be used for other formats in the future. A new DmgHeaderState struct is introduced to share state while decoding. The code is kept unchanged as much as possible, a "fail" label is added for example where a simple return would probably do. In dmg_open, the variable "tmp" is renamed to "rsrc_data_offset" for clarity and comments have been added explaining various data. Note that this patch has one subtle difference with the previous version which should not affect functionality. In the previous code, the end of a resource was inferred from the mish block (the offsets would be increased by the fields). In this patch, the resource length is used instead to avoid the need to rely on the previous offsets. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1420566495-13284-3-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: properly detect the UDIF trailerPeter Wu
DMG files have a variable length with a UDIF trailer at the end of a file. This UDIF trailer is essential as it describes the contents of the image. At the moment however, the start of this trailer is almost always incorrect as bdrv_getlength() returns a multiple of the block size (rounded up). This results in a failure to recognize DMG files, resulting in Invalid argument (EINVAL) errors. As there is no API to retrieve the real file size, look for the magic header in the last two sectors to find the start of this 512-byte UDIF trailer (the "koly" block). The resource fork offset ("info_begin") has its offset adjusted as the initial value of offset does not mean "end of file" anymore, but "begin of UDIF trailer". [Replaced error_set(errp, ERROR_CLASS_GENERIC_ERROR, ...) with error_setg(errp, ...) as discussed with Peter. --Stefan] Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1420566495-13284-2-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: add event when disk usage exceeds thresholdFrancesco Romani
Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. To let the guest run smoothly and be not unnecessarily paused, oVirt sets a disk usage threshold (so called 'high water mark') based on the occupation of the device, and automatically extends the image once the threshold is reached or exceeded. In order to detect the crossing of the threshold, oVirt has no choice but aggressively polling the QEMU monitor using the query-blockstats command. This lead to unnecessary system load, and is made even worse under scale: deployments with hundreds of VMs are no longer rare. To fix this, this patch adds: * A new monitor command `block-set-write-threshold', to set a mark for a given block device. * A new event `BLOCK_WRITE_THRESHOLD', to report if a block device usage exceeds the threshold. * A new `write_threshold' field into the `BlockDeviceInfo' structure, to report the configured threshold. This will allow the managing application to use smarter and more efficient monitoring, greatly reducing the need of polling. [Updated qemu-iotests 067 output to add the new 'write_threshold' property. --Stefan] [Changed g_assert_false() to !g_assert() to fix the build on older glib versions. --Kevin] Signed-off-by: Francesco Romani <fromani@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06iotests: Specify format for qemu-nbdMax Reitz
This patch is necessary to suppress the "probed raw" warning when running raw over nbd tests. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qemu-iotests: Fix supported_oses checkFam Zheng
There is a bug in the recently added sys.platform test, and we no longer run python tests, because "linux2" is the value to compare here. So do a prefix match. According to python doc [1], the way to use sys.platform is "unless you want to test for a specific system version, it is therefore recommended to use the following idiom": if sys.platform.startswith('freebsd'): # FreeBSD-specific code here... elif sys.platform.startswith('linux'): # Linux-specific code here... [1]: https://docs.python.org/2.7/library/sys.html#sys.platform Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06virtio-blk: add a knob to disable request mergingPeter Lieven
this adds a knob to disable request merging for debugging or benchmarks if dedired. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06virtio-blk: introduce multireadPeter Lieven
this patch finally introduces multiread support to virtio-blk. While multiwrite support was there for a long time, read support was missing. The complete merge logic is moved into virtio-blk.c which has been the only user of request merging ever since. This is required to be able to merge chunks of requests and immediately invoke callbacks for those requests. Secondly, this is required to switch to direct invocation of coroutines which is planned at a later stage. The following benchmarks show the performance of running fio with 4 worker threads on a local ram disk. The numbers show the average of 10 test runs after 1 run as warmup phase. | 4k | 64k | 4k MB/s | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand --------------+--------+---------+--------+---------+--------+-------- master | 1221 | 1187 | 4178 | 4114 | 1745 | 1213 multiread | 1829 | 1189 | 4639 | 4110 | 1894 | 1216 Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block-backend: expose bs->bl.max_transfer_lengthPeter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06hw/virtio-blk: add a constant for max number of merged requestsPeter Lieven
As it was not obvious (at least for me) where the 32 comes from; add a constant for it. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: add accounting for merged requestsPeter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qed: Really remove unused field QEDAIOCB.finishedFam Zheng
The commit 533ffb17a that removed qed_aiocb_info.cancel said to remove this but didn't do it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qemu-img: Add QEMU_PKGVERSION to QEMU_IMG_VERSIONDon Slutz
This is the same way vl.c handles this. Signed-off-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: change default for discard and write zeroes to INT_MAXPeter Lieven
do not trim requests if the driver does not supply a limit through BlockLimits. For write zeroes we still keep a limit for the unsupported path to avoid allocating a big bounce buffer. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroesDenis V. Lunev
This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported. Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more mature. The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due to the following reasons: - FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes sparse. In order to retain original functionality we must allocate disk space afterwards. This is done using fallocate(0) call - fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing if called above already allocated areas of the file, i.e. the content will not be zeroed This should increase the performance a bit for not-so-modern kernels. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/raw-posix: call plain fallocate in handle_aiocb_write_zeroesDenis V. Lunev
There is a possibility that we are extending our image and thus writing zeroes beyond the end of the file. In this case we do not need to care about the hole to make sure that there is no data in the file under this offset (pre-condition to fallocate(0) to work). We could simply call fallocate(0). This improves the performance of writing zeroes even on really old platforms which do not have even FALLOC_FL_PUNCH_HOLE. Before the patch do_fallocate was used when either CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined. Now the story is different. CONFIG_FALLOCATE is defined when Linux fallocate is defined, posix_fallocate is completely different story (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE thus we are on the safe side. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroesDenis V. Lunev
This efficiently writes zeroes on Linux if the kernel is capable enough. FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not including file expansion. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/raw-posix: refactor handle_aiocb_write_zeroes a bitDenis V. Lunev
move code dealing with a block device to a separate function. This will allow to implement additional processing for ordinary files. Please note, that xfs_code has been moved before checking for s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside. This makes code a bit more consistent. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>