summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)Author
2015-08-10Merge tag 'v3.18.20' into linux-linaro-lsk-v3.18Kevin Hilman
Linux 3.18.20 * tag 'v3.18.20': (52 commits) Linux 3.18.20 blk-mq: fix CPU hotplug handling SCSI: add 1024 max sectors black list flag Fix firmware loader uevent buffer NULL pointer dereference arm64: Don't report clear pmds and puds as huge 9p: don't leave a half-initialized inode sitting around 9p: forgetting to cancel request on interrupted zero-copy RPC NFS: Fix size of NFSACL SETACL operations watchdog: omap: assert the counter being stopped before reprogramming of: return NUMA_NO_NODE from fallback of_node_to_nid() USB: usbfs: allow URBs to be reaped after disconnection dell-laptop: Fix allocating & freeing SMI buffer page mac80211: prevent possible crypto tx tailroom corruption security_syslog() should be called once only __bitmap_parselist: fix bug in empty string handling mmc: card: Fixup request missing in mmc_blk_issue_rw_rq iser-target: Fix possible deadlock in RDMA_CM connection error iscsi-target: Convert iscsi_thread_set usage to kthread.h ACPICA: Tables: Fix an issue that FACS initialization is performed twice Btrfs: fix memory leak in the extent_same ioctl ...
2015-08-04Merge tag 'v3.18.19' of ↵Kevin Hilman
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into linux-linaro-lsk-v3.18 Linux 3.18.19 * tag 'v3.18.19' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (237 commits) Linux 3.18.19 x86/xen: allow privcmd hypercalls to be preempted rtnl: restore notifications for deleted interfaces Input: synaptics - add min/max quirk for Lenovo S540 Revert "Input: synaptics - add min/max quirk for Lenovo S540" Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation" fs/ufs: restore s_lock mutex_init() ufs: Fix possible deadlock when looking up directories ufs: Fix warning from unlock_new_inode() vfs: Ignore unlocked mounts in fs_fully_visible KVM: x86: properly restore LVT0 KVM: s390: virtio-ccw: don't overwrite config space values selinux: fix setting of security labels on NFS usb: gadget: f_fs: add extra check before unregister_gadget_item perf/x86: Honor the architectural performance monitoring version perf: Fix ring_buffer_attach() RCU sync, again x86/boot: Fix overflow warning with 32-bit binutils vfs: Remove incorrect debugging WARN in prepend_path KVM: x86: make vapics_in_nmi_mode atomic arm: KVM: force execution of HCPTR access on VM exit ...
2015-08-04md: fix a build warningFiro Yang
[ Upstream commit 4e023612325a9034a542bfab79f78b1fe5ebb841 ] Warning like this: drivers/md/md.c: In function "update_array_info": drivers/md/md.c:6394:26: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses] !mddev->persistent != info->not_persistent|| Fix it as Neil Brown said: mddev->persistent != !info->not_persistent || Signed-off-by: Firo Yang <firogm@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-08-04dm btree: silence lockdep lock inversion in dm_btree_del()Joe Thornber
[ Upstream commit 1c7518794a3647eb345d59ee52844e8a40405198 ] Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del() can be called via an ioctl and we don't want to recurse into the FS or block layer. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-08-04dm btree remove: fix bug in redistribute3Dennis Yang
[ Upstream commit 4c7e309340ff85072e96f529582d159002c36734 ] redistribute3() shares entries out across 3 nodes. Some entries were being moved the wrong way, breaking the ordering. This manifested as a BUG() in dm-btree-remove.c:shift() when entries were removed from the btree. For additional context see: https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-03dm stats: fix divide by zero if 'number_of_areas' arg is zeroMikulas Patocka
[ Upstream commit dd4c1b7d0c95be1c9245118a3accc41a16f1db67 ] If the number_of_areas argument was zero the kernel would crash on div-by-zero. Add better input validation. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-03dm space map metadata: fix occasional leak of a metadata block on resizeJoe Thornber
[ Upstream commit 6096d91af0b65a3967139b32d5adbb3647858a26 ] The metadata space map has a simplified 'bootstrap' mode that is operational when extending the space maps. Whilst in this mode it's possible for some refcount decrement operations to become queued (eg, as a result of shadowing one of the bitmap indexes). These decrements were not being applied when switching out of bootstrap mode. The effect of this bug was the leaking of a 4k metadata block. This is detected by the latest version of thin_check as a non fatal error. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-29dm crypt: fix missing error code return from crypt_ctr error pathWei Yongjun
Fix to return a negative error code from crypt_ctr()'s optional parameter processing error path. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 44c144f9c8e8fbd73ede2848da8253b3aae42ec2) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: leverage immutable biovecs when decrypting on readMike Snitzer
Commit 003b5c571 ("block: Convert drivers to immutable biovecs") stopped short of changing dm-crypt to leverage the fact that the biovec array of a bio will no longer be modified. Switch to using bio_clone_fast() when cloning bios for decryption after read. Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 5977907937afa2b5584a874d44ba6c0f56aeaa9c) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: update URLs to new cryptsetup project pageMilan Broz
Cryptsetup home page moved to GitLab. Also remove link to abandonded Truecrypt page. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit e44f23b32dc7916b2bc12817e2f723fefa21ba41) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: sort writesMikulas Patocka
Write requests are sorted in a red-black tree structure and are submitted in the sorted order. In theory the sorting should be performed by the underlying disk scheduler, however, in practice the disk scheduler only accepts and sorts a finite number of requests. To allow the sorting of all requests, dm-crypt needs to implement its own sorting. The overhead associated with rbtree-based sorting is considered negligible so it is not used conditionally. Even on SSD sorting can be beneficial since in-order request dispatch promotes lower latency IO completion to the upper layers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit b3c5fd3052492f1b8d060799d4f18be5a5438add) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: add 'submit_from_crypt_cpus' optionMikulas Patocka
Make it possible to disable offloading writes by setting the optional 'submit_from_crypt_cpus' table argument. There are some situations where offloading write bios from the encryption threads to a single thread degrades performance significantly. The default is to offload write bios to the same thread because it benefits CFQ to have writes submitted using the same IO context. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: offload writes to threadMikulas Patocka
Submitting write bios directly in the encryption thread caused serious performance degradation. On a multiprocessor machine, encryption requests finish in a different order than they were submitted. Consequently, write requests would be submitted in a different order and it could cause severe performance degradation. Move the submission of write requests to a separate thread so that the requests can be sorted before submitting. But this commit improves dm-crypt performance even without having dm-crypt perform request sorting (in particular it enables IO schedulers like CFQ to sort more effectively). Note: it is required that a previous commit ("dm crypt: don't allocate pages for a partial request") be applied before applying this patch. Otherwise, this commit could introduce a crash. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit dc2676210c425ee8e5cb1bec5bc84d004ddf4179) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: remove unused io_pool and _crypt_io_poolMikulas Patocka
The previous commit ("dm crypt: don't allocate pages for a partial request") stopped using the io_pool slab mempool and backing _crypt_io_pool kmem cache. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 94f5e0243c48aa01441c987743dc468e2d6eaca2) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: avoid deadlock in mempoolsMikulas Patocka
Fix a theoretical deadlock introduced in the previous commit ("dm crypt: don't allocate pages for a partial request"). The function crypt_alloc_buffer may be called concurrently. If we allocate from the mempool concurrently, there is a possibility of deadlock. For example, if we have mempool of 256 pages, two processes, each wanting 256, pages allocate from the mempool concurrently, it may deadlock in a situation where both processes have allocated 128 pages and the mempool is exhausted. To avoid such a scenario we allocate the pages under a mutex. In order to not degrade performance with excessive locking, we try non-blocking allocations without a mutex first and if that fails, we fallback to a blocking allocations with a mutex. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 7145c241a1bf2841952c3e297c4080b357b3e52d) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: don't allocate pages for a partial requestMikulas Patocka
Change crypt_alloc_buffer so that it only ever allocates pages for a full request. This is a prerequisite for the commit "dm crypt: offload writes to thread". This change simplifies the dm-crypt code at the expense of reduced throughput in low memory conditions (where allocation for a partial request is most useful). Note: the next commit ("dm crypt: avoid deadlock in mempools") is needed to fix a theoretical deadlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit cf2f1abfbd0dba701f7f16ef619e4d2485de3366) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-29dm crypt: use unbound workqueue for request processingMikulas Patocka
Use unbound workqueue by default so that work is automatically balanced between available CPUs. The original behavior of encrypting using the same cpu that IO was submitted on can still be enabled by setting the optional 'same_cpu_crypt' table argument. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit f3396c58fd8442850e759843457d78b6ec3a9589) Signed-off-by: Alex Shi <alex.shi@linaro.org>
2015-06-14md/raid0: fix restore to sector variable in raid0_make_requestEric Work
[ Upstream commit a81157768a00e8cf8a7b43b5ea5cac931262374f ] The variable "sector" in "raid0_make_request()" was improperly updated by a call to "sector_div()" which modifies its first argument in place. Commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd restored this variable after the call for later re-use. Unfortunetly the restore was done after the referenced variable "bio" was advanced. This lead to the original value and the restored value being different. Here we move this line to the proper place. One observed side effect of this bug was discarding a file though unlinking would cause an unrelated file's contents to be discarded. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: 47d68979cc96 ("md/raid0: fix bug with chunksize not a power of 2.") Cc: stable@vger.kernel.org (any that received above backport) URL: https://bugzilla.kernel.org/show_bug.cgi?id=98501 Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10md/raid5: don't record new size if resize_stripes fails.NeilBrown
[ Upstream commit 6e9eac2dcee5e19f125967dd2be3e36558c42fff ] If any memory allocation in resize_stripes fails we will return -ENOMEM, but in some cases we update conf->pool_size anyway. This means that if we try again, the allocations will be assumed to be larger than they are, and badness results. So only update pool_size if there is no error. This bug was introduced in 2.6.17 and the patch is suitable for -stable. Fixes: ad01c9e3752f ("[PATCH] md: Allow stripes to be expanded in preparation for expanding an array") Cc: stable@vger.kernel.org (v2.6.17+) Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-23Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"Rabin Vincent
[ Upstream commit c0403ec0bb5a8c5b267fb7e16021bec0b17e4964 ] This reverts Linux 4.1-rc1 commit 0618764cb25f6fa9fb31152995de42a8a0496475. The problem which that commit attempts to fix actually lies in the Freescale CAAM crypto driver not dm-crypt. dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto driver should internally backlog requests which arrive when the queue is full and process them later. Until the crypto hw's queue becomes full, the driver returns -EINPROGRESS. When the crypto hw's queue if full, the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is expected to backlog the request and process it when the hardware has queue space. At the point when the driver takes the request from the backlog and starts processing it, it calls the completion function with a status of -EINPROGRESS. The completion function is called (for a second time, in the case of backlogged requests) with a status/err of 0 when a request is done. Crypto drivers for hardware without hardware queueing use the helpers, crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request() and crypto_get_backlog() helpers to implement this behaviour correctly, while others implement this behaviour without these helpers (ccp, for example). dm-crypt (before the patch that needs reverting) uses this API correctly. It queues up as many requests as the hw queues will allow (i.e. as long as it gets back -EINPROGRESS from the request function). Then, when it sees at least one backlogged request (gets -EBUSY), it waits till that backlogged request is handled (completion gets called with -EINPROGRESS), and then continues. The references to af_alg_wait_for_completion() and af_alg_complete() in that commit's commit message are irrelevant because those functions only handle one request at a time, unlink dm-crypt. The problem is that the Freescale CAAM driver, which that commit describes as having being tested with, fails to implement the backlogging behaviour correctly. In cam_jr_enqueue(), if the hardware queue is full, it simply returns -EBUSY without backlogging the request. What the observed deadlock was is not described in the commit message but it is obviously the wait_for_completion() in crypto_convert() where dm-crypto would wait for the completion being called with -EINPROGRESS in the case of backlogged requests. This completion will never be completed due to the bug in the CAAM driver. Commit 0618764cb25 incorrectly made dm-crypt wait for every request, even when the driver/hardware queues are not full, which means that dm-crypt will never see -EBUSY. This means that that commit will cause a performance regression on all crypto drivers which implement the API correctly. Revert it. Correct backlog handling should be implemented in the CAAM driver instead. Cc'ing stable purely because commit 0618764cb25 did. If for some reason a stable@ kernel did pick up commit 0618764cb25 it should get reverted. Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Reviewed-by: Horia Geanta <horia.geanta@freescale.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"Ben Collins
[ Upstream commit c0403ec0bb5a8c5b267fb7e16021bec0b17e4964 ] This reverts Linux 4.1-rc1 commit 0618764cb25f6fa9fb31152995de42a8a0496475. The problem which that commit attempts to fix actually lies in the Freescale CAAM crypto driver not dm-crypt. dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto driver should internally backlog requests which arrive when the queue is full and process them later. Until the crypto hw's queue becomes full, the driver returns -EINPROGRESS. When the crypto hw's queue if full, the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is expected to backlog the request and process it when the hardware has queue space. At the point when the driver takes the request from the backlog and starts processing it, it calls the completion function with a status of -EINPROGRESS. The completion function is called (for a second time, in the case of backlogged requests) with a status/err of 0 when a request is done. Crypto drivers for hardware without hardware queueing use the helpers, crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request() and crypto_get_backlog() helpers to implement this behaviour correctly, while others implement this behaviour without these helpers (ccp, for example). dm-crypt (before the patch that needs reverting) uses this API correctly. It queues up as many requests as the hw queues will allow (i.e. as long as it gets back -EINPROGRESS from the request function). Then, when it sees at least one backlogged request (gets -EBUSY), it waits till that backlogged request is handled (completion gets called with -EINPROGRESS), and then continues. The references to af_alg_wait_for_completion() and af_alg_complete() in that commit's commit message are irrelevant because those functions only handle one request at a time, unlink dm-crypt. The problem is that the Freescale CAAM driver, which that commit describes as having being tested with, fails to implement the backlogging behaviour correctly. In cam_jr_enqueue(), if the hardware queue is full, it simply returns -EBUSY without backlogging the request. What the observed deadlock was is not described in the commit message but it is obviously the wait_for_completion() in crypto_convert() where dm-crypto would wait for the completion being called with -EINPROGRESS in the case of backlogged requests. This completion will never be completed due to the bug in the CAAM driver. Commit 0618764cb25 incorrectly made dm-crypt wait for every request, even when the driver/hardware queues are not full, which means that dm-crypt will never see -EBUSY. This means that that commit will cause a performance regression on all crypto drivers which implement the API correctly. Revert it. Correct backlog handling should be implemented in the CAAM driver instead. Cc'ing stable purely because commit 0618764cb25 did. If for some reason a stable@ kernel did pick up commit 0618764cb25 it should get reverted. Signed-off-by: Rabin Vincent <rabin.vincent@axis.com> Reviewed-by: Horia Geanta <horia.geanta@freescale.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17md/raid0: fix bug with chunksize not a power of 2.NeilBrown
[ Upstream commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd ] Since commit 20d0189b1012a37d2533a87fb451f7852f2418d1 in v3.14-rc1 RAID0 has performed incorrect calculations when the chunksize is not a power of 2. This happens because "sector_div()" modifies its first argument, but this wasn't taken into account in the patch. So restore that first arg before re-using the variable. Reported-by: Joe Landman <joe.landman@gmail.com> Reported-by: Dave Chinner <david@fromorbit.com> Fixes: 20d0189b1012a37d2533a87fb451f7852f2418d1 Cc: stable@vger.kernel.org (3.14 and later). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm snapshot: suspend merging snapshot when doing exception handoverMikulas Patocka
[ Upstream commit 09ee96b21456883e108c3b00597bb37ec512151b ] The "dm snapshot: suspend origin when doing exception handover" commit fixed a exception store handover bug associated with pending exceptions to the "snapshot-origin" target. However, a similar problem exists in snapshot merging. When snapshot merging is in progress, we use the target "snapshot-merge" instead of "snapshot-origin". Consequently, during exception store handover, we must find the snapshot-merge target and suspend its associated mapped_device. To avoid lockdep warnings, the target must be suspended and resumed without holding _origins_lock. Introduce a dm_hold() function that grabs a reference on a mapped_device, but unlike dm_get(), it doesn't crash if the device has the DMF_FREEING flag set, it returns an error in this case. In snapshot_resume() we grab the reference to the origin device using dm_hold() while holding _origins_lock (_origins_lock guarantees that the device won't disappear). Then we release _origins_lock, suspend the device and grab _origins_lock again. NOTE to stable@ people: When backporting to kernels 3.18 and older, use dm_internal_suspend and dm_internal_resume instead of dm_internal_suspend_fast and dm_internal_resume_fast. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm snapshot: suspend origin when doing exception handoverMikulas Patocka
[ Upstream commit b735fede8d957d9d255e9c5cf3964cfa59799637 ] In the function snapshot_resume we perform exception store handover. If there is another active snapshot target, the exception store is moved from this target to the target that is being resumed. The problem is that if there is some pending exception, it will point to an incorrect exception store after that handover, causing a crash due to dm-snap-persistent.c:get_exception()'s BUG_ON. This bug can be triggered by repeatedly changing snapshot permissions with "lvchange -p r" and "lvchange -p rw" while there are writes on the associated origin device. To fix this bug, we must suspend the origin device when doing the exception store handover to make sure that there are no pending exceptions: - introduce _origin_hash that keeps track of dm_origin structures. - introduce functions __lookup_dm_origin, __insert_dm_origin and __remove_dm_origin that manipulate the origin hash. - modify snapshot_resume so that it calls dm_internal_suspend_fast() and dm_internal_resume_fast() on the origin device. NOTE to stable@ people: When backporting to kernels 3.12-3.18, use dm_internal_suspend and dm_internal_resume instead of dm_internal_suspend_fast and dm_internal_resume_fast. When backporting to kernels older than 3.12, you need to pick functions dm_internal_suspend and dm_internal_resume from the commit fd2ed4d252701d3bbed4cd3e3d267ad469bb832a. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm thin: fix to consistently zero-fill reads to unprovisioned blocksSasha Levin
[ Upstream commit 5f027a3bf184d1d36e68745f7cd3718a8b879cc0 ] It was always intended that a read to an unprovisioned block will return zeroes regardless of whether the pool is in read-only or read-write mode. thin_bio_map() was inconsistent with its handling of such reads when the pool is in read-only mode, it now properly zero-fills the bios it returns in response to unprovisioned block reads. Eliminate thin_bio_map()'s special read-only mode handling of -ENODATA and just allow the IO to be deferred to the worker which will result in pool->process_bio() handling the IO (which already properly zero-fills reads to unprovisioned blocks). Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm io: deal with wandering queue limits when handling REQ_DISCARD and ↵Darrick J. Wong
REQ_WRITE_SAME [ Upstream commit e5db29806b99ce2b2640d2e4d4fcb983cea115c5 ] Since it's possible for the discard and write same queue limits to change while the upper level command is being sliced and diced, fix up both of them (a) to reject IO if the special command is unsupported at the start of the function and (b) read the limits once and let the commands error out on their own if the status happens to change. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm: hold suspend_lock while suspending device during device deletionMikulas Patocka
[ Upstream commit ab7c7bb6f4ab95dbca96fcfc4463cd69843e3e24 ] __dm_destroy() must take the suspend_lock so that its presuspend and postsuspend calls do not race with an internal suspend. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-23dm snapshot: fix a possible invalid memory access on unloadMikulas Patocka
commit 22aa66a3ee5b61e0f4a0bfeabcaa567861109ec3 upstream. When the snapshot target is unloaded, snapshot_dtr() waits until pending_exceptions_count drops to zero. Then, it destroys the snapshot. Therefore, the function that decrements pending_exceptions_count should not touch the snapshot structure after the decrement. pending_complete() calls free_pending_exception(), which decrements pending_exceptions_count, and then it performs up_write(&s->lock) and it calls retry_origin_bios() which dereferences s->origin. These two memory accesses to the fields of the snapshot may touch the dm_snapshot struture after it is freed. This patch moves the call to free_pending_exception() to the end of pending_complete(), so that the snapshot will not be destroyed while pending_complete() is in progress. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23dm: fix a race condition in dm_get_mdMikulas Patocka
commit 2bec1f4a8832e74ebbe859f176d8a9cb20dd97f4 upstream. The function dm_get_md finds a device mapper device with a given dev_t, increases the reference count and returns the pointer. dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the device, tests that the device doesn't have DMF_DELETING or DMF_FREEING flag, drops _minor_lock and returns pointer to the device. dm_get_md then calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag, otherwise it increments the reference count. There is a possible race condition - after dm_find_md exits and before dm_get is called, there are no locks held, so the device may disappear or DMF_FREEING flag may be set, which results in BUG. To fix this bug, we need to call dm_get while we hold _minor_lock. This patch renames dm_find_md to dm_get_md and changes it so that it calls dm_get while holding the lock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23dm io: reject unsupported DISCARD requests with EOPNOTSUPPDarrick J. Wong
commit 37527b869207ad4c208b1e13967d69b8bba1fbf9 upstream. I created a dm-raid1 device backed by a device that supports DISCARD and another device that does NOT support DISCARD with the following dm configuration: # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo # lsblk -D NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO sda 0 4K 1G 0 `-moo (dm-0) 0 4K 1G 0 sdb 0 0B 0B 0 `-moo (dm-0) 0 4K 1G 0 Notice that the mirror device /dev/mapper/moo advertises DISCARD support even though one of the mirror halves doesn't. If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite loop in do_region() when it tries to issue a DISCARD request to sdb. The problem is that when we call do_region() against sdb, num_sectors is set to zero because q->limits.max_discard_sectors is zero. Therefore, "remaining" never decreases and the loop never terminates. To fix this: before entering the loop, check for the combination of REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up the mirror device. This bug was found by the unfortunate coincidence of pvmove and a discard operation in the RHEL 6.5 kernel; upstream is also affected. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23dm mirror: do not degrade the mirror on discard errorMikulas Patocka
commit f2ed51ac64611d717d1917820a01930174c2f236 upstream. It may be possible that a device claims discard support but it rejects discards with -EOPNOTSUPP. It happens when using loopback on ext2/ext3 filesystem driven by the ext4 driver. It may also happen if the underlying devices are moved from one disk on another. If discard error happens, we reject the bio with -EOPNOTSUPP, but we do not degrade the array. This patch fixes failed test shell/lvconvert-repair-transient.sh in the lvm2 testsuite if the testsuite is extracted on an ext2 or ext3 filesystem and it is being driven by the ext4 driver. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06md/raid1: fix read balance when a drive is write-mostly.Tomáš Hodek
commit d1901ef099c38afd11add4cfb3312c02ef21ec4a upstream. When a drive is marked write-mostly it should only be the target of reads if there is no other option. This behaviour was broken by commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc md/raid1: read balance chooses idlest disk for SSD which causes a write-mostly device to be *preferred* is some cases. Restore correct behaviour by checking and setting best_dist_disk and best_pending_disk rather than best_disk. We only need to test one of these as they are both changed from -1 or >=0 at the same time. As we leave min_pending and best_dist unchanged, any non-write-mostly device will appear better than the write-mostly device. Reported-by: Tomáš Hodek <tomas.hodek@volny.cz> Reported-by: Dark Penguin <darkpenguin@yandex.ru> Signed-off-by: NeilBrown <neilb@suse.de> Link: http://marc.info/?l=linux-raid&m=135982797322422 Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06md/raid5: Fix livelock when array is both resyncing and degraded.NeilBrown
commit 26ac107378c4742978216be1005b7291b799c7b2 upstream. Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f: md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. Causes an RCW cycle to be forced even when the array is degraded. A degraded array cannot support RCW as that requires reading all data blocks, and one may be missing. Forcing an RCW when it is not possible causes a live-lock and the code spins, repeatedly deciding to do something that cannot succeed. So change the condition to only force RCW on non-degraded arrays. Reported-by: Manibalan P <pmanibalan@amiindia.co.in> Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com> Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11md/raid5: fix another livelock caused by non-aligned writes.NeilBrown
commit b1b02fe97f75b12ab34b2303bfd4e3526d903a58 upstream. If a non-page-aligned write is destined for a device which is missing/faulty, we can deadlock. As the target device is missing, a read-modify-write cycle is not possible. As the write is not for a full-page, a recontruct-write cycle is not possible. This should be handled by logic in fetch_block() which notices there is a non-R5_OVERWRITE write to a missing device, and so loads all blocks. However since commit 67f455486d2ea2, that code requires STRIPE_PREREAD_ACTIVE before it will active, and those circumstances never set STRIPE_PREREAD_ACTIVE. So: in handle_stripe_dirtying, if neither rmw or rcw was possible, set STRIPE_DELAYED, which will cause STRIPE_PREREAD_ACTIVE be set after a suitable delay. Fixes: 67f455486d2ea20b2d94d6adf5b9b783d079e321 Reported-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05dm cache: fix missing ERR_PTR returns and handlingJoe Thornber
commit 766a78882ddf79b162243649d7dfdbac1fb6fb88 upstream. Commit 9b1cc9f251 ("dm cache: share cache-metadata object across inactive and active DM tables") mistakenly ignored the use of ERR_PTR returns. Restore missing IS_ERR checks and ERR_PTR returns where appropriate. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05dm thin: don't allow messages to be sent to a pool target in READ_ONLY or ↵Joe Thornber
FAIL mode commit 2a7eaea02b99b6e267b1e89c79acc6e9a51cee3b upstream. You can't modify the metadata in these modes. It's better to fail these messages immediately than let the block-manager deny write locks on metadata blocks. Otherwise these failed metadata changes will trigger 'needs_check' to get set in the metadata superblock -- requiring repair using the thin_check utility. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29dm cache: fix problematic dual use of a single migration count variableJoe Thornber
commit a59db67656021fa212e9b95a583f13c34eb67cd9 upstream. Introduce a new variable to count the number of allocated migration structures. The existing variable cache->nr_migrations became overloaded. It was used to: i) track of the number of migrations in flight for the purposes of quiescing during suspend. ii) to estimate the amount of background IO occuring. Recent discard changes meant that REQ_DISCARD bios are processed with a migration. Discards are not background IO so nr_migrations was not incremented. However this could cause quiescing to complete early. (i) is now handled with a new variable cache->nr_allocated_migrations. cache->nr_migrations has been renamed cache->nr_io_migrations. cleanup_migration() is now called free_io_migration(), since it decrements that variable. Also, remove the unused cache->next_migration variable that got replaced with with prealloc_structs a while ago. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29dm cache: share cache-metadata object across inactive and active DM tablesJoe Thornber
commit 9b1cc9f251affdd27f29fe46d0989ba76c33faf6 upstream. If a DM table is reloaded with an inactive table when the device is not suspended (normal procedure for LVM2), then there will be two dm-bufio objects that can diverge. This can lead to a situation where the inactive table uses bufio to read metadata at the same time the active table writes metadata -- resulting in the inactive table having stale metadata buffers once it is promoted to the active table slot. Fix this by using reference counting and a global list of cache metadata objects to ensure there is only one metadata object per metadata device. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-27dm: fix missed error code if .end_io isn't implemented by target_typezhendong chen
commit 5164bece1673cdf04782f8ed3fba70743700f5da upstream. In bio-based DM's clone_endio(), when target_type doesn't implement .end_io (e.g. linear) r will be always be initialized 0. So if a WRITE SAME bio fails WRITE SAME will not be disabled as intended. Fix this by initializing r to error, rather than 0, in clone_endio(). Signed-off-by: Alex Chen <alex.chen@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: 7eee4ae2db ("dm: disable WRITE SAME if it fails") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.NeilBrown
commit 108cef3aa41669610e1836fe638812dd067d72de upstream. It is critical that fetch_block() and handle_stripe_dirtying() are consistent in their analysis of what needs to be loaded. Otherwise raid5 can wait forever for a block that won't be loaded. Currently when writing to a RAID5 that is resyncing, to a location beyond the resync offset, handle_stripe_dirtying chooses a reconstruct-write cycle, but fetch_block() assumes a read-modify-write, and a lockup can happen. So treat that case just like RAID6, just as we do in handle_stripe_dirtying. RAID6 always does reconstruct-write. This bug was introduced when the behaviour of handle_stripe_dirtying was changed in 3.7, so the patch is suitable for any kernel since, though it will need careful merging for some versions. Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f Reported-by: Henry Cai <henryplusplus@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm thin: fix a race in thin_dtrMikulas Patocka
commit 17181fb7a0c3a279196c0eeb2caba65a1519614b upstream. As long as struct thin_c is in the list, anyone can grab a reference of it. Consequently, we must wait for the reference count to drop to zero *after* we remove the structure from the list, not before. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm thin: fix missing out-of-data-space to write mode transition if blocks ↵Joe Thornber
are released commit 2c43fd26e46734430122b8d2ad3024bb532df3ef upstream. Discard bios and thin device deletion have the potential to release data blocks. If the thin-pool is in out-of-data-space mode, and blocks were released, transition the thin-pool back to full write mode. The correct time to do this is just after the thin-pool metadata commit. It cannot be done before the commit because the space maps will not allow immediate reuse of the data blocks in case there's a rollback following power failure. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm thin: fix inability to discard blocks when in out-of-data-space modeJoe Thornber
commit 45ec9bd0fd7abf8705e7cf12205ff69fe9d51181 upstream. When the pool was in PM_OUT_OF_SPACE mode its process_prepared_discard function pointer was incorrectly being set to process_prepared_discard_passdown rather than process_prepared_discard. This incorrect function pointer meant the discard was being passed down, but not effecting the mapping. As such any discard that was issued, in an attempt to reclaim blocks, would not successfully free data space. Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm space map metadata: fix sm_bootstrap_get_nr_blocks()Dan Carpenter
commit c1c6156fe4d4577444b769d7edd5dd503e57bbc9 upstream. This function isn't right and it causes a static checker warning: drivers/md/dm-thin.c:3016 maybe_resize_data_dev() error: potentially using uninitialized 'sb_data_size'. It should set "*count" and return zero on success the same as the sm_metadata_get_nr_blocks() function does earlier. Fixes: 3241b1d3e0aa ('dm: add persistent data library') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm cache: fix spurious cell_defer when dealing with partial block at end of ↵Joe Thornber
device commit f824a2af3dfbbb766c02e19df21f985bceadf0ee upstream. We never bother caching a partial block that is at the back end of the origin device. No cell ever gets locked, but the calling code was assuming it was and trying to release it. Now the code only releases if the cell has been set to a non NULL value. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm cache: dirty flag was mistakenly being cleared when promoting via overwriteJoe Thornber
commit 1e32134a5a404e80bfb47fad8a94e9bbfcbdacc5 upstream. If the incoming bio is a WRITE and completely covers a block then we don't bother to do any copying for a promotion operation. Once this is done the cache block and origin block will be different, so we need to set it to 'dirty'. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm cache: only use overwrite optimisation for promotion when in writeback modeJoe Thornber
commit f29a3147e251d7ae20d3194ff67f109d71e501b4 upstream. Overwrite causes the cache block and origin blocks to diverge, which is only allowed in writeback mode. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm crypt: use memzero_explicit for on-stack bufferMilan Broz
commit 1a71d6ffe18c0d0f03fc8531949cc8ed41d702ee upstream. Use memzero_explicit to cleanup sensitive data allocated on stack to prevent the compiler from optimizing and removing memset() calls. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08dm bufio: fix memleak when using a dm_buffer's inline bioDarrick J. Wong
commit 445559cdcb98a141f5de415b94fd6eaccab87e6d upstream. When dm-bufio sets out to use the bio built into a struct dm_buffer to issue an IO, it needs to call bio_reset after it's done with the bio so that we can free things attached to the bio such as the integrity payload. Therefore, inject our own endio callback to take care of the bio_reset after calling submit_io's end_io callback. Test case: 1. modprobe scsi_debug delay=0 dif=1 dix=199 ato=1 dev_size_mb=300 2. Set up a dm-bufio client, e.g. dm-verity, on the scsi_debug device 3. Repeatedly read metadata and watch kmalloc-192 leak! Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-16Merge tag 'md/3.18-fix' of git://neil.brown.name/mdLinus Torvalds
Pull md bugfix from Neil Brown: "One fix for md for 3.18. This fixes a regression introduced in 3.13" * tag 'md/3.18-fix' of git://neil.brown.name/md: md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN