aboutsummaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2019-03-29drivers/block/zram/zram_drv.c: fix idle/writeback string compareMinchan Kim
Makoto report a below KASAN error: zram does out-of-bounds read. Because strscpy copies from source up to count bytes unconditionally. It could cause out-of-bounds read on next object in slab. To prevent it, use strlcpy which checks source's length automatically. BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154 Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314 .. Call trace: strscpy+0x68/0x154 idle_store+0xc4/0x34c dev_attr_store+0x50/0x6c sysfs_kf_write+0x98/0xb4 kernfs_fop_write+0x198/0x260 __vfs_write+0x10c/0x338 vfs_write+0x114/0x238 SyS_write+0xc8/0x168 __sys_trace_return+0x0/0x4 Allocated by task 1314: __kmalloc+0x280/0x318 kernfs_fop_write+0xac/0x260 __vfs_write+0x10c/0x338 vfs_write+0x114/0x238 SyS_write+0xc8/0x168 __sys_trace_return+0x0/0x4 Freed by task 2855: kfree+0x138/0x630 kernfs_put_open_node+0x10c/0x124 kernfs_fop_release+0xd8/0x114 __fput+0x130/0x2a4 ____fput+0x1c/0x28 task_work_run+0x16c/0x1c8 do_notify_resume+0x2bc/0x107c work_pending+0x8/0x10 The buggy address belongs to the object at ffffffc0c3495a00 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes inside of 128-byte region [ffffffc0c3495a00, ffffffc0c3495a80) The buggy address belongs to the page: page:ffffffbf030d2500 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x4000000000010200(slab|head) page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Link: http://lkml.kernel.org/r/20190319231911.145968-1-minchan@kernel.org Cc: <stable@vger.kernel.org> [5.0] Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Makoto Wu <makotowu@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-23Merge tag 'for-linus-20190323' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A set of fixes/changes that should go into this series. This contains: - Kernel doc / comment updates (Bart, Shenghui) - Un-export of core-only used function (Bart) - Fix race on loop file access (Dongli) - pf/pcd queue cleanup fixes (me) - Use appropriate helper for RESTART bit set (Yufen) - Use named identifier for classic poll (Yufen)" * tag 'for-linus-20190323' of git://git.kernel.dk/linux-block: sbitmap: trivial - update comment for sbitmap_deferred_clear_bit blkcg: Fix kernel-doc warnings blk-iolatency: #include "blk.h" block: Unexport blk_mq_add_to_requeue_list() block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx loop: access lo_backing_file only when the loop device is Lo_bound blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART paride/pcd: cleanup queues when detection fails paride/pf: cleanup queues when detection fails
2019-03-20rbd: drop wait_for_latest_osdmap()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-18rbd: set io_min, io_opt and discard_granularity to alloc_sizeIlya Dryomov
Now that we have alloc_size that controls our discard behavior, it doesn't make sense to have these set to object (set) size. alloc_size defaults to 64k, but because discard_granularity is likely 4M, only ranges that are equal to or bigger than 4M can be considered during fstrim. A smaller io_min is also more likely to be met, resulting in fewer deferred writes on bluestore OSDs. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-18loop: access lo_backing_file only when the loop device is Lo_boundDongli Zhang
Commit 758a58d0bc67 ("loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()") separates "lo->lo_backing_file = NULL" and "lo->lo_state = Lo_unbound" into different critical regions protected by loop_ctl_mutex. However, there is below race that the NULL lo->lo_backing_file would be accessed when the backend of a loop is another loop device, e.g., loop0's backend is a file, while loop1's backend is loop0. loop0's backend is file loop1's backend is loop0 __loop_clr_fd() mutex_lock(&loop_ctl_mutex); lo->lo_backing_file = NULL; --> set to NULL mutex_unlock(&loop_ctl_mutex); loop_set_fd() mutex_lock_killable(&loop_ctl_mutex); loop_validate_file() f = l->lo_backing_file; --> NULL access if loop0 is not Lo_unbound mutex_lock(&loop_ctl_mutex); lo->lo_state = Lo_unbound; mutex_unlock(&loop_ctl_mutex); lo->lo_backing_file should be accessed only when the loop device is Lo_bound. In fact, the problem has been introduced already in commit 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()") after which loop_validate_file() could see devices in Lo_rundown state with which it did not count. It was harmless at that point but still. Fixes: 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()") Reported-by: syzbot+9bdc1adc1c55e7fe765b@syzkaller.appspotmail.com Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-18paride/pcd: cleanup queues when detection failsJens Axboe
The driver allocates queues for all the units it potentially supports. But if we fail to detect any drives, then we fail loading the module without cleaning up those queues. This is now evident with the switch to blk-mq, though the bug has been there forever as far as I can tell. Also fix cleanup through regular module exit. Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-18paride/pf: cleanup queues when detection failsJens Axboe
The driver allocates queues for all the units it potentially supports. But if we fail to detect any drives, then we fail loading the module without cleaning up those queues. This is now evident with the switch to blk-mq, though the bug has been there forever as far as I can tell. Also fix cleanup through regular module exit. Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-16Merge tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block layer changes from Jens Axboe: "This is a collection of both stragglers, and fixes that came in after I finalized the initial pull. This contains: - An MD pull request from Song, with a few minor fixes - Set of NVMe patches via Christoph - Pull request from Konrad, with a few fixes for xen/blkback - pblk fix IO calculation fix (Javier) - Segment calculation fix for pass-through (Ming) - Fallthrough annotation for blkcg (Mathieu)" * tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits) blkcg: annotate implicit fall through nvme-tcp: support C2HData with SUCCESS flag nvmet: ignore EOPNOTSUPP for discard nvme: add proper write zeroes setup for the multipath device nvme: add proper discard setup for the multipath device nvme: remove nvme_ns_config_oncs nvme: disable Write Zeroes for qemu controllers nvmet-fc: bring Disconnect into compliance with FC-NVME spec nvmet-fc: fix issues with targetport assoc_list list walking nvme-fc: reject reconnect if io queue count is reduced to zero nvme-fc: fix numa_node when dev is null nvme-fc: use nr_phys_segments to determine existence of sgl nvme-loop: init nvmet_ctrl fatal_err_work when allocate nvme: update comment to make the code easier to read nvme: put ns_head ref if namespace fails allocation nvme-trace: fix cdw10 buffer overrun nvme: don't warn on block content change effects nvme: add get-feature to admin cmds tracer md: Fix failed allocation of md_register_thread It's wrong to add len to sector_nr in raid10 reshape twice ...
2019-03-14zram: default to lzo-rle instead of lzoDave Rodgman
lzo-rle gives higher performance and similar compression ratios to lzo. Link: http://lkml.kernel.org/r/20190205155944.16007-4-dave.rodgman@arm.com Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12Merge tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The highlights are: - rbd will now ignore discards that aren't aligned and big enough to actually free up some space (myself). This is controlled by the new alloc_size map option and can be disabled if needed. - support for rbd deep-flatten feature (myself). Deep-flatten allows "rbd flatten" to fully disconnect the clone image and its snapshots from the parent and make the parent snapshot removable. - a new round of cap handling improvements (Zheng Yan). The kernel client should now be much more prompt about releasing its caps and it is possible to put a limit on the number of caps held. - support for getting ceph.dir.pin extended attribute (Zheng Yan)" * tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client: (26 commits) Documentation: modern versions of ceph are not backed by btrfs rbd: advertise support for RBD_FEATURE_DEEP_FLATTEN rbd: whole-object write and zeroout should copyup when snapshots exist rbd: copyup with an empty snapshot context (aka deep-copyup) rbd: introduce rbd_obj_issue_copyup_ops() rbd: stop copying num_osd_ops in rbd_obj_issue_copyup() rbd: factor out __rbd_osd_req_create() rbd: clear ->xferred on error from rbd_obj_issue_copyup() rbd: remove experimental designation from kernel layering ceph: add mount option to limit caps count ceph: periodically trim stale dentries ceph: delete stale dentry when last reference is dropped ceph: remove dentry_lru file from debugfs ceph: touch existing cap when handling reply ceph: pass inclusive lend parameter to filemap_write_and_wait_range() rbd: round off and ignore discards that are too small rbd: handle DISCARD and WRITE_ZEROES separately rbd: get rid of obj_req->obj_request_count libceph: use struct_size() for kmalloc() in crush_decode() ceph: send cap releases more aggressively ...
2019-03-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several fixes, most notably fix for virtio on swiotlb systems" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: silence an unused-variable warning virtio: hint if callbacks surprisingly might sleep virtio-ccw: wire up ->bus_name callback s390/virtio: handle find on invalid queue gracefully virtio-ccw: diag 500 may return a negative cookie virtio_balloon: remove the unnecessary 0-initialization virtio-balloon: improve update_balloon_size_func virtio-blk: Consider virtio_max_dma_size() for maximum segment size virtio: Introduce virtio_max_dma_size() dma: Introduce dma_max_mapping_size() swiotlb: Add is_swiotlb_active() function swiotlb: Introduce swiotlb_max_mapping_size()
2019-03-08Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "Not a huge amount of changes in this round, the biggest one is that we finally have Mings multi-page bvec support merged. Apart from that, this pull request contains: - Small series that avoids quiescing the queue for sysfs changes that match what we currently have (Aleksei) - Series of bcache fixes (via Coly) - Series of lightnvm fixes (via Mathias) - NVMe pull request from Christoph. Nothing major, just SPDX/license cleanups, RR mp policy (Hannes), and little fixes (Bart, Chaitanya). - BFQ series (Paolo) - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection for the fast path (Jianchao) - fops->iopoll() added for async IO polling, this is a feature that the upcoming io_uring interface will use (Christoph, me) - Partition scan loop fixes (Dongli) - mtip32xx conversion from managed resource API (Christoph) - cdrom registration race fix (Guenter) - MD pull from Song, two minor fixes. - Various documentation fixes (Marcos) - Multi-page bvec feature. This brings a lot of nice improvements with it, like more efficient splitting, larger IOs can be supported without growing the bvec table size, and so on. (Ming) - Various little fixes to core and drivers" * tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits) block: fix updating bio's front segment size block: Replace function name in string with __func__ nbd: propagate genlmsg_reply return code floppy: remove set but not used variable 'q' null_blk: fix checking for REQ_FUA block: fix NULL pointer dereference in register_disk fs: fix guard_bio_eod to check for real EOD errors blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map block: optimize bvec iteration in bvec_iter_advance block: introduce mp_bvec_for_each_page() for iterating over page block: optimize blk_bio_segment_split for single-page bvec block: optimize __blk_segment_map_sg() for single-page bvec block: introduce bvec_nth_page() iomap: wire up the iopoll method block: add bio_set_polled() helper block: wire up block device iopoll method fs: add an iopoll method to struct file_operations loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() loop: do not print warn message if partition scan is successful block: bounce: make sure that bvec table is updated ...
2019-03-07lib/lzo: separate lzo-rle from lzoDave Rodgman
To prevent any issues with persistent data, separate lzo-rle from lzo so that it is treated as a separate algorithm, and lzo is still available. Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.com Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com> Cc: Matt Sealey <matt.sealey@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <nitingupta910@gmail.com> Cc: Richard Purdie <rpurdie@openedhand.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Sonny Rao <sonnyrao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-06Merge tag 'driver-core-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
2019-03-06Merge branch 'stable/for-jens-5.1' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-5.1/block-post Pull two xen blkback fixes from Konrad. * 'stable/for-jens-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront xen/blkback: add stack variable 'blkif' in connect_ring()
2019-03-06virtio-blk: Consider virtio_max_dma_size() for maximum segment sizeJoerg Roedel
Segments can't be larger than the maximum DMA mapping size supported on the platform. Take that into account when setting the maximum segment size for a block device. Cc: stable@vger.kernel.org Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05rbd: advertise support for RBD_FEATURE_DEEP_FLATTENIlya Dryomov
All copyups perform deep-copyup regardless of whether deep-flatten feature is enabled. The feature bit is used to ensure that image is written to only by new-enough clients that always perform deep-copyup. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: whole-object write and zeroout should copyup when snapshots existIlya Dryomov
Otherwise, once the parent snapshot is removed, the clone's snapshot wouldn't reflect the state of the clone prior to whole-object write or zeroout because a deep-copyup was never done ("rbd flatten" wouldn't do it because the modified object would exist in HEAD). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: copyup with an empty snapshot context (aka deep-copyup)Ilya Dryomov
This is the core of deep-flatten feature: sending a copyup request (i.e. a guarded write of the data read from the parent) with an empty snapshot context (snaps = [], seq = 0) causes the OSD to reflect the write in all existing snapshots. This allows "rbd flatten" to fully disconnect the clone image and its snapshots from the parent and make the parent snapshot removable. The actual modification request is sent only after deep-copyup request is completed. Waiting for deep-copyup reply is unnecessary, this will be improved in the future. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: introduce rbd_obj_issue_copyup_ops()Ilya Dryomov
In preparation for deep-flatten feature, split rbd_obj_issue_copyup() into two functions and add a new write state to make the state machine slightly more clear. Make the copyup op optional and start using that for when the overlap goes to 0. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: stop copying num_osd_ops in rbd_obj_issue_copyup()Ilya Dryomov
In preparation for deep-flatten feature, stop copying num_osd_ops from the original request in rbd_obj_issue_copyup(). Split the calculation into count_{write,zeroout}_ops() respectively and determine whether the assert_exists guard is needed with the new rbd_obj_copyup_enabled(). As a nice side effect, we no longer guard in the writefull case as the copyup'ed object is always fully overwritten. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: factor out __rbd_osd_req_create()Ilya Dryomov
Allow passing a custom snapshot context: NULL for read and an empty snapshot context for deep-copyup. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: clear ->xferred on error from rbd_obj_issue_copyup()Ilya Dryomov
Otherwise the assert in rbd_obj_end_request() is triggered. Fixes: 3da691bf4366 ("rbd: new request handling code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-03-05rbd: remove experimental designation from kernel layeringIlya Dryomov
Support for kernel layering hasn't been considered experimental for a few years now. All the issues that I'm aware of were shaken out in 2014 and early 2015. Moreover, most of that code was rewritten with the addition of support for fancy striping. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05rbd: round off and ignore discards that are too smallIlya Dryomov
If, after rounding off, the discard request is smaller than alloc_size, drop it on the floor in __rbd_img_fill_request(). Default alloc_size to 64k. This should cover both HDD and SSD based bluestore OSDs and somewhat improve things for filestore. For OSDs on filestore with filestore_punch_hole = false, alloc_size is best set to object size in order to allow deletes and truncates and disallow zero op. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05rbd: handle DISCARD and WRITE_ZEROES separatelyIlya Dryomov
With discard_zeroes_data gone in commit 48920ff2a5a9 ("block: remove the discard_zeroes_data flag"), continuing to provide this guarantee is pointless: applications can't query it and discards can only be used for deallocating. Add OBJ_OP_ZEROOUT and move the existing logic under it. As the first step to divorcing OBJ_OP_DISCARD, stop worrying about copyups but keep special casing whole-object layered discards. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-03-05rbd: get rid of obj_req->obj_request_countIlya Dryomov
It is effectively unused. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-02-28nbd: propagate genlmsg_reply return codeLi RongQing
genlmsg_reply can fail, so propagate its return code Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28floppy: remove set but not used variable 'q'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/block/floppy.c: In function 'request_done': drivers/block/floppy.c:2233:24: warning: variable 'q' set but not used [-Wunused-but-set-variable] It's never used and can be removed. Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28null_blk: fix checking for REQ_FUAHeinz Mauelshagen
null_handle_bio() erroneously uses the bio_op macro which masks respective request flag bits including REQ_FUA out thus failing the check. Fix by checking bio->bi_opf directly. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-24xen/blkback: rework connect_ring() to avoid inconsistent xenstore ↵Dongli Zhang
'ring-page-order' set by malicious blkfront The xenstore 'ring-page-order' is used globally for each blkback queue and therefore should be read from xenstore only once. However, it is obtained in read_per_ring_refs() which might be called multiple times during the initialization of each blkback queue. If the blkfront is malicious and the 'ring-page-order' is set in different value by blkfront every time before blkback reads it, this may end up at the "WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" in xen_blkif_disconnect() when frontend is destroyed. This patch reworks connect_ring() to read xenstore 'ring-page-order' only once. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2019-02-22loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()Dongli Zhang
Commit 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") moves blkdev_reread_part() out of the loop_ctl_mutex. However, GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result, __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and will not rescan the loop device to delete all partitions. Below are steps to reproduce the issue: step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100 step2 # losetup -P /dev/loop0 tmp.raw step3 # parted /dev/loop0 mklabel gpt step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1 step5 # losetup -d /dev/loop0 Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and there is below kernel warning message: [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22) This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part(). Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-22loop: do not print warn message if partition scan is successfulDongli Zhang
Do not print warn message when the partition scan returns 0. Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15block: kill BLK_MQ_F_SG_MERGEMing Lei
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15block: loop: pass multi-page bvec to iov_iterMing Lei
iov_iter is implemented on bvec itererator helpers, so it is safe to pass multi-page bvec to it, and this way is much more efficient than passing one page in each bvec. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-12floppy: check_events callback should not return a negative numberYufen Yu
floppy_check_events() is supposed to return bit flags to say which events occured. We should return zero to say that no event flags are set. Only BIT(0) and BIT(1) are used in the caller. And .check_events interface also expect to return an unsigned int value. However, after commit a0c80efe5956, it may return -EINTR (-4u). Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't affect runtime, but it obviously is still worth fixing. Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-11Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman
We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-09block: kill QUEUE_FLAG_FLUSH_NQJens Axboe
We have various helpers for setting/clearing this flag, and also a helper to check if the queue supports queueable flushes or not. But nobody uses them anymore, kill it with fire. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-31mtip32xx: ѕtop abusing the managed resource APIsChristoph Hellwig
The mtip32xx driver uses managed resources for DMA coherent memory and irqs, but then always pairs them with free calls anyway, making the resource tracking rather pointless. Given some DMA allocations are transient anyway, the irq freeing seems to require ordering vs other hardware access the best solution seems to be to stop using the managed resource API entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22block: rbd: convert to use BUS_ATTR_WO and ROGreg Kroah-Hartman
We are trying to get rid of BUS_ATTR() and the usage of that in rbd.c can be trivially converted to use BUS_ATTR_WO and RO, so use those macros instead. Cc: Sage Weil <sage@redhat.com> Cc: Alex Elder <elder@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-20Merge tag 'for-linus-20190118' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - block size setting fixes for loop/nbd (Jan Kara) - md bio_alloc_mddev() cleanup (Marcos) - Ensure we don't lose the REQ_INTEGRITY flag (Ming) - Two NVMe fixes by way of Christoph: - Fix NVMe IRQ calculation (Ming) - Uninitialized variable in nvmet-tcp (Sagi) - BFQ comment fix (Paolo) - License cleanup for recently added blk-mq-debugfs-zoned (Thomas) * tag 'for-linus-20190118' of git://git.kernel.dk/linux-block: block: Cleanup license notice nvme-pci: fix nvme_setup_irqs() nvmet-tcp: fix uninitialized variable access block: don't lose track of REQ_INTEGRITY flag blockdev: Fix livelocks on loop device nbd: Use set_blocksize() to set device blocksize md: Make bio_alloc_mddev use bio_alloc_bioset block, bfq: fix comments on __bfq_deactivate_entity
2019-01-17xen/blkback: add stack variable 'blkif' in connect_ring()Dongli Zhang
As 'be->blkif' is used for many times in connect_ring(), the stack variable 'blkif' is added to substitute 'be-blkif'. Suggested-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2019-01-15nbd: Use set_blocksize() to set device blocksizeJan Kara
NBD can update block device block size implicitely through bd_set_size(). Make it explicitely set blocksize with set_blocksize() as this behavior of bd_set_size() is going away. CC: Josef Bacik <jbacik@fb.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-12Merge tag 'for-linus-20190112' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph, with little fixes all over the map - Loop caching fix for offset/bs change (Jaegeuk Kim) - Block documentation tweaks (Jeff, Jon, Weiping, John) - null_blk zoned tweak (John) - ahch mvebu suspend/resume support. Should have gone into the merge window, but there was some confusion on which tree had it. (Miquel) * tag 'for-linus-20190112' of git://git.kernel.dk/linux-block: (22 commits) ata: ahci: mvebu: request PHY suspend/resume for Armada 3700 ata: ahci: mvebu: add Armada 3700 initialization needed for S2RAM ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs ata: ahci: mvebu: remove stale comment ata: libahci_platform: comply to PHY framework loop: drop caches if offset or block_size are changed block: fix kerneldoc comment for blk_attempt_plug_merge() nvme: don't initlialize ctrl->cntlid twice nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN nvme: pad fake subsys NQN vid and ssvid with zeros nvme-multipath: zero out ANA log buffer nvme-fabrics: unset write/poll queues for discovery controllers nvme-tcp: don't ask if controller is fabrics nvme-tcp: remove dead code nvme-pci: fix out of bounds access in nvme_cqe_pending nvme-pci: rerun irq setup on IO queue init errors nvme-pci: use the same attributes when freeing host_mem_desc_bufs. nvme-pci: fix the wrong setting of nr_maps block: doc: add slice_idle_us to bfq documentation block: clarify documentation for blk_{start|finish}_plug ...
2019-01-12Merge tag 'remove-dma_zalloc_coherent-5.0' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma_zalloc_coherent() removal from Christoph Hellwig: "We've always had a weird situation around dma_zalloc_coherent. To safely support mapping the allocations to userspace major architectures like x86 and arm have always zeroed allocations from dma_alloc_coherent, but a couple other architectures were missing that zeroing either always or in corner cases. Then later we grew anothe dma_zalloc_coherent interface to explicitly request zeroing, but that just added __GFP_ZERO to the allocation flags, which for some allocators that didn't end up using the page allocator ended up being a no-op and still not zeroing the allocations. So for this merge window I fixed up all remaining architectures to zero the memory in dma_alloc_coherent, and made dma_zalloc_coherent a no-op wrapper around dma_alloc_coherent, which fixes all of the above issues. dma_zalloc_coherent is now pointless and can go away, and Luis helped me writing a cocchinelle script and patch series to kill it, which I think we should apply now just after -rc1 to finally settle these issue" * tag 'remove-dma_zalloc_coherent-5.0' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: remove dma_zalloc_coherent() cross-tree: phase out dma_zalloc_coherent() on headers cross-tree: phase out dma_zalloc_coherent()
2019-01-11Merge tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "A patch to allow setting abort_on_full and a fix for an old "rbd unmap" edge case, marked for stable" * tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client: rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set ceph: use vmf_error() in ceph_filemap_fault() libceph: allow setting abort_on_full for rbd
2019-01-10rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is setIlya Dryomov
There is a window between when RBD_DEV_FLAG_REMOVING is set and when the device is removed from rbd_dev_list. During this window, we set "already" and return 0. Returning 0 from write(2) can confuse userspace tools because 0 indicates that nothing was written. In particular, "rbd unmap" will retry the write multiple times a second: 10:28:05.463299 write(4, "0", 1) = 0 10:28:05.463509 write(4, "0", 1) = 0 10:28:05.463720 write(4, "0", 1) = 0 10:28:05.463942 write(4, "0", 1) = 0 10:28:05.464155 write(4, "0", 1) = 0 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-01-09loop: drop caches if offset or block_size are changedJaegeuk Kim
If we don't drop caches used in old offset or block_size, we can get old data from new offset/block_size, which gives unexpected data to user. For example, Martijn found a loopback bug in the below scenario. 1) LOOP_SET_FD loads first two pages on loop file 2) LOOP_SET_STATUS64 changes the offset on the loop file 3) mount is failed due to the cached pages having wrong superblock Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Reported-by: Martijn Coenen <maco@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-08zram: idle writeback fixes and cleanupMinchan Kim
This patch includes some fixes and cleanup for idle-page writeback. 1. writeback_limit interface Now writeback_limit interface is rather conusing. For example, once writeback limit budget is exausted, admin can see 0 from /sys/block/zramX/writeback_limit which is same semantic with disable writeback_limit at this moment. IOW, admin cannot tell that zero came from disable writeback limit or exausted writeback limit. To make the interface clear, let's sepatate enable of writeback limit to another knob - /sys/block/zram0/writeback_limit_enable * before: while true : # to re-enable writeback limit once previous one is used up echo 0 > /sys/block/zram0/writeback_limit echo $((200<<20)) > /sys/block/zram0/writeback_limit .. .. # used up the writeback limit budget * new # To enable writeback limit, from the beginning, admin should # enable it. echo $((200<<20)) > /sys/block/zram0/writeback_limit echo 1 > /sys/block/zram/0/writeback_limit_enable while true : echo $((200<<20)) > /sys/block/zram0/writeback_limit .. .. # used up the writeback limit budget It's much strightforward. 2. fix condition check idle/huge writeback mode check The mode in writeback_store is not bit opeartion any more so no need to use bit operations. Furthermore, current condition check is broken in that it does writeback every pages regardless of huge/idle. 3. clean up idle_store No need to use goto. [minchan@kernel.org: missed spin_lock_init] Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: John Dias <joaodias@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: John Dias <joaodias@google.com> Cc: Srinivas Paladugu <srnvs@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>