aboutsummaryrefslogtreecommitdiff
path: root/drivers/md/dm.c
AgeCommit message (Collapse)Author
2022-02-21dm: remove legacy code only needed before submit_bio recursionMike Snitzer
Commit 8615cb65bd63 ("dm: remove useless loop in __split_and_process_bio") showcased that we no longer loop. Remove the bio_advance() in __split_and_process_bio() that was only needed when looping was possible. Similarly there is no need to advance the bio, using ci->sector cursor, in __send_duplicate_bios(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: remove unused mapped_device argument from free_tioMike Snitzer
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: remove impossible BUG_ON in __send_empty_flushMike Snitzer
The flush_bio in question was just initialized to be empty, so there is no way bio_has_data() will return true. So remove stale BUG_ON(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: reduce code duplication in __map_bioMike Snitzer
Error path code (for handling DM_MAPIO_REQUEUE and DM_MAPIO_KILL) is effectively identical. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: refactor dm_split_and_process_bio a bitMike Snitzer
Remove needless branching and indentation. Leaves code to catch malformed op_is_zone_mgmt bios (they shouldn't have a payload). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: fold __clone_and_map_data_bio into __split_and_process_bioMike Snitzer
Fold __clone_and_map_data_bio into its only caller. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: rename split functionsMike Snitzer
Rename __split_and_process_bio to dm_split_and_process_bio. Rename __split_and_process_non_flush to __split_and_process_bio. Also fix a stale comment and whitespace. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: eliminate copying of dm_io fields in dm_io_dec_pendingMike Snitzer
There is no need for dm_io_dec_pending() to copy dm_io fields anymore now that DM provides its own pending_io counters again. The race documented in commit d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO") no longer exists now that block core's in_flight counters aren't used to signal all dm_io is complete. Also, rename {start,end}_io_acct to dm_{start,end}_io_acct. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm stats: fix too short end duration_ns when using precise_timestampsMike Snitzer
dm_stats_account_io()'s STAT_PRECISE_TIMESTAMPS support doesn't handle the fact that with commit b879f915bc48 ("dm: properly fix redundant bio-based IO accounting") io->start_time _may_ be in the past (meaning the start_io_acct() was deferred until later). Add a new dm_stats_recalc_precise_timestamps() helper that will set/clear a new 'precise_timestamps' flag in the dm_stats struct based on whether any configured stats enable STAT_PRECISE_TIMESTAMPS. And update DM core's alloc_io() to use dm_stats_record_start() to set stats_aux.duration_ns if stats->precise_timestamps is true. Also, remove unused 'last_sector' and 'last_rw' members from the dm_stats struct. Fixes: b879f915bc48 ("dm: properly fix redundant bio-based IO accounting") Cc: stable@vger.kernel.org Co-developed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: fix double accounting of flush with dataMike Snitzer
DM handles a flush with data by first issuing an empty flush and then once it completes the REQ_PREFLUSH flag is removed and the payload is issued. The problem fixed by this commit is that both the empty flush bio and the data payload will account the full extent of the data payload. Fix this by factoring out dm_io_acct() and having it wrap all IO accounting to set the size of bio with REQ_PREFLUSH to 0, account the IO, and then restore the original size. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21dm: interlock pending dm_io and dm_wait_for_bios_completionMike Snitzer
Commit d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO") didn't go far enough. When bio_end_io_acct ends the count of in-flight I/Os may reach zero and the DM device may be suspended. There is a possibility that the suspend races with dm_stats_account_io. Fix this by adding percpu "pending_io" counters to track outstanding dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also, rename md_in_flight_bios() to dm_in_flight_bios() and update it to iterate all pending_io counters. Fixes: d208b89401e0 ("dm: fix mempool NULL pointer race when completing IO") Cc: stable@vger.kernel.org Co-developed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-17block: fix surprise removal for drivers calling blk_set_queue_dyingChristoph Hellwig
Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit 8e141f9eb803 that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04block: pass a block_device to bio_clone_fastChristoph Hellwig
Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: use bio_clone_fast in alloc_io/alloc_tioChristoph Hellwig
Replace open coded bio_clone_fast implementations with the actual helper. Note that the bio allocated as part of the dm_io structure in alloc_io will only actually be used later in alloc_tio, making this earlier cloning of the information safe. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04block: clone crypto and integrity data in __bio_clone_fastChristoph Hellwig
__bio_clone_fast should also clone integrity and crypto data, as a clone without those is incomplete. Right now the only caller that can actually support crypto and integrity data (dm) does it manually for the one callchain that supports these, but we better do it properly in the core. Note that all callers except for the above mentioned one also don't need to handle failure at all, given that the integrity and crypto clones are based on mempool allocations that won't fail for sleeping allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: simplify the single bio fast path in __send_duplicate_biosChristoph Hellwig
Most targets just need a single flush bio. Open code that case in __send_duplicate_bios without the need to add the bio to a list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: retun the clone bio from alloc_tioChristoph Hellwig
Return the clone bio embedded into the tio as that is what the callers actually want. Similar for the free side. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: pass the bio instead of tio to __map_bioChristoph Hellwig
This simplifies the callers a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: move cloning the bio into alloc_tioChristoph Hellwig
Move the call to __bio_clone_fast and the assignment of ->len_ptr from the callers into alloc_tio to prepare for changes to the bio clone API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: fold __send_duplicate_bios into __clone_and_map_simple_bioChristoph Hellwig
Fold __send_duplicate_bios into its only caller to prepare for refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: fold clone_bio into __clone_and_map_data_bioChristoph Hellwig
Fold clone_bio into its only caller to prepare for refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04dm: add a clone_to_tio helperChristoph Hellwig
Add a helper to stop open coding the container_of operations to get from the clone bio to the tio structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: pass a block_device and opf to bio_initChristoph Hellwig
Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: pass a block_device and opf to bio_alloc_biosetChristoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02dm: bio_alloc can't fail if it is allowed to sleepChristoph Hellwig
Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28dm: properly fix redundant bio-based IO accountingMike Snitzer
Record the start_time for a bio but defer the starting block core's IO accounting until after IO is submitted using bio_start_io_acct_time(). This approach avoids the need to mess around with any of the individual IO stats in response to a bio_split() that follows bio submission. Reported-by: Bud Brown <bubrown@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Depends-on: e45c47d1f94e ("block: add bio_start_io_acct_time() to control start_time") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-28dm: revert partial fix for redundant bio-based IO accountingMike Snitzer
Reverts a1e1cb72d9649 ("dm: fix redundant IO accounting for bios that need splitting") because it was too narrow in scope (only addressed redundant 'sectors[]' accounting and not ios, nsecs[], etc). Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-12Merge tag 'libnvdimm-for-5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax and libnvdimm updates from Dan Williams: "The bulk of this is a rework of the dax_operations API after discovering the obstacles it posed to the work-in-progress DAX+reflink support for XFS and other copy-on-write filesystem mechanics. Primarily the need to plumb a block_device through the API to handle partition offsets was a sticking point and Christoph untangled that dependency in addition to other cleanups to make landing the DAX+reflink support easier. The DAX_PMEM_COMPAT option has been around for 4 years and not only are distributions shipping userspace that understand the current configuration API, but some are not even bothering to turn this option on anymore, so it seems a good time to remove it per the deprecation schedule. Recall that this was added after the device-dax subsystem moved from /sys/class/dax to /sys/bus/dax for its sysfs organization. All recent functionality depends on /sys/bus/dax. Some other miscellaneous cleanups and reflink prep patches are included as well. Summary: - Simplify the dax_operations API: - Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining and applying a partition offset to all its DAX iomap operations. - Remove wrappers and device-mapper stacked callbacks for ->copy_from_iter() and ->copy_to_iter() in favor of moving block_device relative offset responsibility to the dax_direct_access() caller. - Remove the need for an @bdev in filesystem-DAX infrastructure - Remove unused uio helpers copy_from_iter_flushcache() and copy_mc_to_iter() as only the non-check_copy_size() versions are used for DAX. - Prepare XFS for the pending (next merge window) DAX+reflink support - Remove deprecated DEV_DAX_PMEM_COMPAT support - Cleanup a straggling misuse of the GUID api" * tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits) iomap: Fix error handling in iomap_zero_iter() ACPI: NFIT: Import GUID before use dax: remove the copy_from_iter and copy_to_iter methods dax: remove the DAXDEV_F_SYNC flag dax: simplify dax_synchronous and set_dax_synchronous uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() iomap: turn the byte variable in iomap_zero_iter into a ssize_t memremap: remove support for external pgmap refcounts fsdax: don't require CONFIG_BLOCK iomap: build the block based code conditionally dax: fix up some of the block device related ifdefs fsdax: shift partition offset handling into the file systems dax: return the partition offset from fs_dax_get_by_bdev iomap: add a IOMAP_DAX flag xfs: pass the mapping flags to xfs_bmbt_to_iomap xfs: use xfs_direct_write_iomap_ops for DAX zeroing xfs: move dax device handling into xfs_{alloc,free}_buftarg ext4: cleanup the dax handling in ext4_fill_super ext2: cleanup the dax handling in ext2_fill_super fsdax: decouple zeroing from the iomap buffered I/O code ...
2021-12-18dax: remove the copy_from_iter and copy_to_iter methodsChristoph Hellwig
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: remove the DAXDEV_F_SYNC flagChristoph Hellwig
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: return the partition offset from fs_dax_get_by_bdevChristoph Hellwig
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: remove dax_capableChristoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: simplify the dax_device <-> gendisk associationChristoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicitly calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm: make the DAX support depend on CONFIG_FS_DAXChristoph Hellwig
The device mapper DAX support is all hanging off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm: fix alloc_dax error handling in alloc_devChristoph Hellwig
Make sure ->dax_dev is NULL on error so that the cleanup path doesn't trip over an ERR_PTR. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-29block: remove GENHD_FL_EXT_DEVTChristoph Hellwig
All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-09Merge tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Set of fixes for the batched tag allocation (Ming, me) - add_disk() error handling fix (Luis) - Nested queue quiesce fixes (Ming) - Shared tags init error handling fix (Ye) - Misc cleanups (Jean, Ming, me) * tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block: nvme: wait until quiesce is done scsi: make sure that request queue queiesce and unquiesce balanced scsi: avoid to quiesce sdev->request_queue two times blk-mq: add one API for waiting until quiesce is done blk-mq: don't free tags if the tag_set is used by other device in queue initialztion block: fix device_add_disk() kobject_create_and_add() error handling block: ensure cached plug request matches the current queue block: move queue enter logic into blk_mq_submit_bio() block: make bio_queue_enter() fast-path available inline block: split request allocation components into helpers block: have plug stored requests hold references to the queue blk-mq: update hctx->nr_active in blk_mq_end_request_batch() blk-mq: add RQF_ELV debug entry blk-mq: only try to run plug merge if request has same queue with incoming bio block: move RQF_ELV setting into allocators dm: don't stop request queue after the dm device is suspended block: replace always false argument with 'false' block: assign correct tag before doing prefetch of request blk-mq: fix redundant check of !e expression
2021-11-09Merge tag 'for-5.16/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM core support for emitting audit events through the audit subsystem. Also enhance both the integrity and crypt targets to emit events to via dm-audit. - Various other simple code improvements and cleanups. * tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: log table creation error code dm: make workqueue names device-specific dm writecache: Make use of the helper macro kthread_run() dm crypt: Make use of the helper macro kthread_run() dm verity: use bvec_kmap_local in verity_for_bv_block dm log writes: use memcpy_from_bvec in log_writes_map dm integrity: use bvec_kmap_local in __journal_read_write dm integrity: use bvec_kmap_local in integrity_metadata dm: add add_disk() error handling dm: Remove redundant flush_workqueue() calls dm crypt: log aead integrity violations to audit subsystem dm integrity: log audit events for dm-integrity target dm: introduce audit event module for device mapper
2021-11-02dm: don't stop request queue after the dm device is suspendedMing Lei
For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two call to be balanced. __bind() is only called from dm_swap_table() in which dm device has been suspended already, so not necessary to stop queue again. With this way, request queue quiesce and unquiesce can be balanced. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/r/20211021145918.2691762-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-01dm: make workqueue names device-specificMichał Mirosław
Add device number to kdmflush workqueue name to help debugging CPU usage. Resulting `ps axfu` snippet: root 3791 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:7] root 3792 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd_io/253:7] root 3793 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd/253:7] root 3794 0.0 0.0 0 0 ? S paź19 0:00 \_ [dmcrypt_write/253:7] root 3814 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:8] root 3815 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:9] root 3816 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:10] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm: add add_disk() error handlingLuis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. There are two calls to dm_setup_md_queue() which can fail then, one on dm_early_create() and we can easily see that the error path there calls dm_destroy in the error path. The other use case is on the ioctl table_load case. If that fails userspace needs to call the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other failure. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-10-21blk-crypto: rename blk_keyslot_manager to blk_crypto_profileEric Biggers
blk_keyslot_manager is misnamed because it doesn't necessarily manage keyslots. It actually does several different things: - Contains the crypto capabilities of the device. - Provides functions to control the inline encryption hardware. Originally these were just for programming/evicting keyslots; however, new functionality (hardware-wrapped keys) will require new functions here which are unrelated to keyslots. Moreover, device-mapper devices already (ab)use "keyslot_evict" to pass key eviction requests to their underlying devices even though device-mapper devices don't have any keyslots themselves (so it really should be "evict_key", not "keyslot_evict"). - Sometimes (but not always!) it manages keyslots. Originally it always did, but device-mapper devices don't have keyslots themselves, so they use a "passthrough keyslot manager" which doesn't actually manage keyslots. This hack works, but the terminology is unnatural. Also, some hardware doesn't have keyslots and thus also uses a "passthrough keyslot manager" (support for such hardware is yet to be upstreamed, but it will happen eventually). Let's stop having keyslot managers which don't actually manage keyslots. Instead, rename blk_keyslot_manager to blk_crypto_profile. This is a fairly big change, since for consistency it also has to update keyslot manager-related function names, variable names, and comments -- not just the actual struct name. However it's still a fairly straightforward change, as it doesn't change any actual functionality. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21blk-crypto: rename keyslot-manager files to blk-crypto-profileEric Biggers
In preparation for renaming struct blk_keyslot_manager to struct blk_crypto_profile, rename the keyslot-manager.h and keyslot-manager.c source files. Renaming these files separately before making a lot of changes to their contents makes it easier for git to understand that they were renamed. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-3-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21dm: add add_disk() error handlingLuis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. There are two calls to dm_setup_md_queue() which can fail then, one on dm_early_create() and we can easily see that the error path there calls dm_destroy in the error path. The other use case is on the ioctl table_load case. If that fails userspace needs to call the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other failure. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-12dm: fix mempool NULL pointer race when completing IOJiazi Li
dm_io_dec_pending() calls end_io_acct() first and will then dec md in-flight pending count. But if a task is swapping DM table at same time this can result in a crash due to mempool->elements being NULL: task1 task2 do_resume ->do_suspend ->dm_wait_for_completion bio_endio ->clone_endio ->dm_io_dec_pending ->end_io_acct ->wakeup task1 ->dm_swap_table ->__bind ->__bind_mempools ->bioset_exit ->mempool_exit ->free_io [ 67.330330] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ...... [ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO) [ 67.330510] pc : mempool_free+0x70/0xa0 [ 67.330515] lr : mempool_free+0x4c/0xa0 [ 67.330520] sp : ffffff8008013b20 [ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004 [ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8 [ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800 [ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800 [ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80 [ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c [ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd [ 67.330563] x15: 000000000093b41e x14: 0000000000000010 [ 67.330569] x13: 0000000000007f7a x12: 0000000034155555 [ 67.330574] x11: 0000000000000001 x10: 0000000000000001 [ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000 [ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a [ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001 [ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8 [ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970 [ 67.330609] Call trace: [ 67.330616] mempool_free+0x70/0xa0 [ 67.330627] bio_put+0xf8/0x110 [ 67.330638] dec_pending+0x13c/0x230 [ 67.330644] clone_endio+0x90/0x180 [ 67.330649] bio_endio+0x198/0x1b8 [ 67.330655] dec_pending+0x190/0x230 [ 67.330660] clone_endio+0x90/0x180 [ 67.330665] bio_endio+0x198/0x1b8 [ 67.330673] blk_update_request+0x214/0x428 [ 67.330683] scsi_end_request+0x2c/0x300 [ 67.330688] scsi_io_completion+0xa0/0x710 [ 67.330695] scsi_finish_command+0xd8/0x110 [ 67.330700] scsi_softirq_done+0x114/0x148 [ 67.330708] blk_done_softirq+0x74/0xd0 [ 67.330716] __do_softirq+0x18c/0x374 [ 67.330724] irq_exit+0xb4/0xb8 [ 67.330732] __handle_domain_irq+0x84/0xc0 [ 67.330737] gic_handle_irq+0x148/0x1b0 [ 67.330744] el1_irq+0xe8/0x190 [ 67.330753] lpm_cpuidle_enter+0x4f8/0x538 [ 67.330759] cpuidle_enter_state+0x1fc/0x398 [ 67.330764] cpuidle_enter+0x18/0x20 [ 67.330772] do_idle+0x1b4/0x290 [ 67.330778] cpu_startup_entry+0x20/0x28 [ 67.330786] secondary_start_kernel+0x160/0x170 Fix this by: 1) Establishing pointers to 'struct dm_io' members in dm_io_dec_pending() so that they may be passed into end_io_acct() _after_ free_io() is called. 2) Moving end_io_acct() after free_io(). Cc: stable@vger.kernel.org Signed-off-by: Jiazi Li <lijiazi@xiaomi.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-09-09Merge tag 'libnvdimm-for-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Fix a race condition in the teardown path of raw mode pmem namespaces. - Cleanup the code that filesystems use to detect filesystem-dax capabilities of their underlying block device. * tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: remove bdev_dax_supported xfs: factor out a xfs_buftarg_is_dax helper dax: stub out dax_supported for !CONFIG_FS_DAX dax: remove __generic_fsdax_supported dax: move the dax_read_lock() locking into dax_supported dax: mark dax_get_by_host static dm: use fs_dax_get_by_bdev instead of dax_get_by_host dax: stop using bdevname fsdax: improve the FS_DAX Kconfig description and help text libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
2021-08-26dm: use fs_dax_get_by_bdev instead of dax_get_by_hostChristoph Hellwig
There is no point in trying to finding the dax device if the DAX flag is not set on the queue as none of the users of the device mapper exported block devices could make use of the DAX capability. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210826135510.6293-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-20dm ima: add a warning in dm_init if duplicate ima events are not measuredTushar Sugandhi
The end-users of DM devices/targets may remove and re-create the same device multiple times. IMA does not measure such duplicate events if the configuration CONFIG_IMA_DISABLE_HTABLE is set to 'n'. To avoid confusion, the end-users need some indication on the client if that configuration option is disabled. Add a one-time warning during dm_init() if CONFIG_IMA_DISABLE_HTABLE is set to 'n', to notify the end-users that duplicate events will not be measured in the ima log. Also cleanup some whitespace in dm_init(). Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>