aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--Documentation/errseq.rst149
-rw-r--r--arch/powerpc/platforms/cell/spufs/file.c2
-rw-r--r--drivers/staging/lustre/lustre/llite/file.c2
-rw-r--r--drivers/video/fbdev/core/fb_defio.c2
-rw-r--r--fs/9p/vfs_file.c4
-rw-r--r--fs/affs/file.c2
-rw-r--r--fs/afs/write.c2
-rw-r--r--fs/cifs/file.c4
-rw-r--r--fs/ecryptfs/file.c2
-rw-r--r--fs/exofs/file.c2
-rw-r--r--fs/f2fs/file.c2
-rw-r--r--fs/fuse/file.c6
-rw-r--r--fs/gfs2/file.c6
-rw-r--r--fs/hfs/inode.c2
-rw-r--r--fs/hfsplus/inode.c2
-rw-r--r--fs/hostfs/hostfs_kern.c2
-rw-r--r--fs/hpfs/file.c2
-rw-r--r--fs/jffs2/file.c2
-rw-r--r--fs/jfs/file.c2
-rw-r--r--fs/ncpfs/file.c2
-rw-r--r--fs/ntfs/dir.c2
-rw-r--r--fs/ntfs/file.c2
-rw-r--r--fs/ocfs2/file.c2
-rw-r--r--fs/reiserfs/dir.c2
-rw-r--r--fs/reiserfs/file.c2
-rw-r--r--fs/sync.c4
-rw-r--r--fs/ubifs/file.c2
-rw-r--r--include/linux/errseq.h14
-rw-r--r--include/linux/fs.h20
-rw-r--r--lib/errseq.c17
-rw-r--r--mm/filemap.c64
31 files changed, 241 insertions, 89 deletions
diff --git a/Documentation/errseq.rst b/Documentation/errseq.rst
new file mode 100644
index 000000000000..4c29bd5afbc5
--- /dev/null
+++ b/Documentation/errseq.rst
@@ -0,0 +1,149 @@
+The errseq_t datatype
+=====================
+An errseq_t is a way of recording errors in one place, and allowing any
+number of "subscribers" to tell whether it has changed since a previous
+point where it was sampled.
+
+The initial use case for this is tracking errors for file
+synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
+but it may be usable in other situations.
+
+It's implemented as an unsigned 32-bit value. The low order bits are
+designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
+are used as a counter. This is done with atomics instead of locking so that
+these functions can be called from any context.
+
+Note that there is a risk of collisions if new errors are being recorded
+frequently, since we have so few bits to use as a counter.
+
+To mitigate this, the bit between the error value and counter is used as
+a flag to tell whether the value has been sampled since a new value was
+recorded. That allows us to avoid bumping the counter if no one has
+sampled it since the last time an error was recorded.
+
+Thus we end up with a value that looks something like this::
+
+ bit: 31..13 12 11..0
+ +-----------------+----+----------------+
+ | counter | SF | errno |
+ +-----------------+----+----------------+
+
+The general idea is for "watchers" to sample an errseq_t value and keep
+it as a running cursor. That value can later be used to tell whether
+any new errors have occurred since that sampling was done, and atomically
+record the state at the time that it was checked. This allows us to
+record errors in one place, and then have a number of "watchers" that
+can tell whether the value has changed since they last checked it.
+
+A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
+is the special (but common) case where there has never been an error. An all
+zero value thus serves as the "epoch" if one wishes to know whether there
+has ever been an error set since it was first initialized.
+
+API usage
+=========
+Let me tell you a story about a worker drone. Now, he's a good worker
+overall, but the company is a little...management heavy. He has to
+report to 77 supervisors today, and tomorrow the "big boss" is coming in
+from out of town and he's sure to test the poor fellow too.
+
+They're all handing him work to do -- so much he can't keep track of who
+handed him what, but that's not really a big problem. The supervisors
+just want to know when he's finished all of the work they've handed him so
+far and whether he made any mistakes since they last asked.
+
+He might have made the mistake on work they didn't actually hand him,
+but he can't keep track of things at that level of detail, all he can
+remember is the most recent mistake that he made.
+
+Here's our worker_drone representation::
+
+ struct worker_drone {
+ errseq_t wd_err; /* for recording errors */
+ };
+
+Every day, the worker_drone starts out with a blank slate::
+
+ struct worker_drone wd;
+
+ wd.wd_err = (errseq_t)0;
+
+The supervisors come in and get an initial read for the day. They
+don't care about anything that happened before their watch begins::
+
+ struct supervisor {
+ errseq_t s_wd_err; /* private "cursor" for wd_err */
+ spinlock_t s_wd_err_lock; /* protects s_wd_err */
+ }
+
+ struct supervisor su;
+
+ su.s_wd_err = errseq_sample(&wd.wd_err);
+ spin_lock_init(&su.s_wd_err_lock);
+
+Now they start handing him tasks to do. Every few minutes they ask him to
+finish up all of the work they've handed him so far. Then they ask him
+whether he made any mistakes on any of it::
+
+ spin_lock(&su.su_wd_err_lock);
+ err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+ spin_unlock(&su.su_wd_err_lock);
+
+Up to this point, that just keeps returning 0.
+
+Now, the owners of this company are quite miserly and have given him
+substandard equipment with which to do his job. Occasionally it
+glitches and he makes a mistake. He sighs a heavy sigh, and marks it
+down::
+
+ errseq_set(&wd.wd_err, -EIO);
+
+...and then gets back to work. The supervisors eventually poll again
+and they each get the error when they next check. Subsequent calls will
+return 0, until another error is recorded, at which point it's reported
+to each of them once.
+
+Note that the supervisors can't tell how many mistakes he made, only
+whether one was made since they last checked, and the latest value
+recorded.
+
+Occasionally the big boss comes in for a spot check and asks the worker
+to do a one-off job for him. He's not really watching the worker
+full-time like the supervisors, but he does need to know whether a
+mistake occurred while his job was processing.
+
+He can just sample the current errseq_t in the worker, and then use that
+to tell whether an error has occurred later::
+
+ errseq_t since = errseq_sample(&wd.wd_err);
+ /* submit some work and wait for it to complete */
+ err = errseq_check(&wd.wd_err, since);
+
+Since he's just going to discard "since" after that point, he doesn't
+need to advance it here. He also doesn't need any locking since it's
+not usable by anyone else.
+
+Serializing errseq_t cursor updates
+===================================
+Note that the errseq_t API does not protect the errseq_t cursor during a
+check_and_advance_operation. Only the canonical error code is handled
+atomically. In a situation where more than one task might be using the
+same errseq_t cursor at the same time, it's important to serialize
+updates to that cursor.
+
+If that's not done, then it's possible for the cursor to go backward
+in which case the same error could be reported more than once.
+
+Because of this, it's often advantageous to first do an errseq_check to
+see if anything has changed, and only later do an
+errseq_check_and_advance after taking the lock. e.g.::
+
+ if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err)) {
+ /* su.s_wd_err is protected by s_wd_err_lock */
+ spin_lock(&su.s_wd_err_lock);
+ err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+ spin_unlock(&su.s_wd_err_lock);
+ }
+
+That avoids the spinlock in the common case where nothing has changed
+since the last time it was checked.
diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index ae2f740a82f1..5ffcdeb1eb17 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -1749,7 +1749,7 @@ out:
static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
struct inode *inode = file_inode(file);
- int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ int err = file_write_and_wait_range(file, start, end);
if (!err) {
inode_lock(inode);
err = spufs_mfc_flush(file, NULL);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 2c30e422a47e..be665454f407 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
PFID(ll_inode2fid(inode)), inode);
ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1);
- rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ rc = file_write_and_wait_range(file, start, end);
inode_lock(inode);
/* catch async errors that were recorded back when async writeback
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 37f69c061210..487d5e336e1b 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy
{
struct fb_info *info = file->private_data;
struct inode *inode = file_inode(file);
- int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ int err = file_write_and_wait_range(file, start, end);
if (err)
return err;
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 43c242e17132..03c9e325bfbc 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t start, loff_t end,
struct p9_wstat wstat;
int retval;
- retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ retval = file_write_and_wait_range(filp, start, end);
if (retval)
return retval;
@@ -468,7 +468,7 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
struct inode *inode = filp->f_mapping->host;
int retval;
- retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ retval = file_write_and_wait_range(filp, start, end);
if (retval)
return retval;
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 196ee7f6fdc4..00331810f690 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -954,7 +954,7 @@ int affs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
struct inode *inode = filp->f_mapping->host;
int ret, err;
- err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ err = file_write_and_wait_range(filp, start, end);
if (err)
return err;
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 2d2fccd5044b..106e43db1115 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -714,7 +714,7 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
vnode->fid.vid, vnode->fid.vnode, file,
datasync);
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ ret = file_write_and_wait_range(file, start, end);
if (ret)
return ret;
inode_lock(inode);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index bc09df6b473a..0786f19d288f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2329,7 +2329,7 @@ int cifs_strict_fsync(struct file *file, loff_t start, loff_t end,
struct inode *inode = file_inode(file);
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
- rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ rc = file_write_and_wait_range(file, start, end);
if (rc)
return rc;
inode_lock(inode);
@@ -2371,7 +2371,7 @@ int cifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file);
struct inode *inode = file->f_mapping->host;
- rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ rc = file_write_and_wait_range(file, start, end);
if (rc)
return rc;
inode_lock(inode);
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index ca4e83750214..c74ed3ca3372 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -328,7 +328,7 @@ ecryptfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
int rc;
- rc = filemap_write_and_wait(file->f_mapping);
+ rc = file_write_and_wait(file);
if (rc)
return rc;
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index 28645f0640f7..a94594ea2aa3 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -48,7 +48,7 @@ static int exofs_file_fsync(struct file *filp, loff_t start, loff_t end,
struct inode *inode = filp->f_mapping->host;
int ret;
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ ret = file_write_and_wait_range(filp, start, end);
if (ret)
return ret;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2706130c261b..843a0d99f7ea 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -206,7 +206,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
/* if fdatasync is triggered, let's do in-place-update */
if (datasync || get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
set_inode_flag(inode, FI_NEED_IPU);
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ ret = file_write_and_wait_range(file, start, end);
clear_inode_flag(inode, FI_NEED_IPU);
if (ret) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0273029b1220..d66789804287 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -457,7 +457,7 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
* wait for all outstanding writes, before sending the FSYNC
* request.
*/
- err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ err = file_write_and_wait_range(file, start, end);
if (err)
goto out;
@@ -465,10 +465,10 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
/*
* Due to implementation of fuse writeback
- * filemap_write_and_wait_range() does not catch errors.
+ * file_write_and_wait_range() does not catch errors.
* We have to do this directly after fuse_sync_writes()
*/
- err = filemap_check_errors(file->f_mapping);
+ err = file_check_and_advance_wb_err(file);
if (err)
goto out;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index bb48074be019..33a0cb5701a3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
if (ret)
return ret;
if (gfs2_is_jdata(ip))
- filemap_write_and_wait(mapping);
+ ret = file_write_and_wait(file);
+ if (ret)
+ return ret;
gfs2_ail_flush(ip->i_gl, 1);
}
if (mapping->nrpages)
- ret = filemap_fdatawait_range(mapping, start, end);
+ ret = file_fdatawait_range(file, start, end);
return ret ? ret : ret1;
}
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index bfbba799430f..2538b49cc349 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -656,7 +656,7 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
struct super_block * sb;
int ret, err;
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ ret = file_write_and_wait_range(filp, start, end);
if (ret)
return ret;
inode_lock(inode);
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index e8638d528195..4f26b6877130 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -283,7 +283,7 @@ int hfsplus_file_fsync(struct file *file, loff_t start, loff_t end,
struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
int error = 0, error2;
- error = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ error = file_write_and_wait_range(file, start, end);
if (error)
return error;
inode_lock(inode);
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index e61261a7417e..c148e7f4f451 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -374,7 +374,7 @@ static int hostfs_fsync(struct file *file, loff_t start, loff_t end,
struct inode *inode = file->f_mapping->host;
int ret;
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ ret = file_write_and_wait_range(file, start, end);
if (ret)
return ret;
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index b3be1b5a62e2..f26138425b16 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -24,7 +24,7 @@ int hpfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
struct inode *inode = file->f_mapping->host;
int ret;
- ret = filemap_write_and_wait_range(file->f_mapping, start, end);
+ ret = file_write_and_wait_range(file, start, end);
if (ret)
return ret;
return sync_blockdev(inode->i_sb->s_bdev);
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index c12476e309c6..bd0428bebe9b 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -35,7 +35,7 @@ int jffs2_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
int ret;
- ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ ret = file_write_and_wait_range(filp, start, end);
if (ret)
return ret;
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index 739492c7a3fd..36665fd37095 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -34,7 +34,7 @@ int jfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
struct inode *inode = file->f_mapping->host;
int rc = 0;
- rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ rc = file_write_and_wait_range(file, start, end);
if (rc)
return rc;
diff --git a/fs/ncpfs/file.c b/fs/ncpfs/file.c
index 76965e772264..a06c07619ee6 100644
--- a/fs/ncpfs/file.c
+++ b/fs/ncpfs/file.c
@@ -23,7 +23,7 @@
static int ncp_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
- return filemap_write_and_wait_range(file->f_mapping, start, end);
+ return file_write_and_wait_range(file, start, end);
}
/*
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
index 0ee19ecc982d..1a24be9e8405 100644
--- a/fs/ntfs/dir.c
+++ b/fs/ntfs/dir.c
@@ -1506,7 +1506,7 @@ static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
- err = filemap_write_and_wait_range(vi->i_mapping, start, end);
+ err = file_write_and_wait_range(filp, start, end);
if (err)
return err;
inode_lock(vi);
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index c4f68c338735..331910fa8442 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1989,7 +1989,7 @@ static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end,
ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
- err = filemap_write_and_wait_range(vi->i_mapping, start, end);
+ err = file_write_and_wait_range(filp, start, end);
if (err)
return err;
inode_lock(vi);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index bfeb647459d9..66e59d3163ea 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -196,7 +196,7 @@ static int ocfs2_sync_file(struct file *file, loff_t start, loff_t end,
if (ocfs2_is_hard_readonly(osb) || ocfs2_is_soft_readonly(osb))
return -EROFS;
- err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ err = file_write_and_wait_range(file, start, end);
if (err)
return err;
diff --git a/fs/reiserfs/dir.c b/fs/reiserfs/dir.c
index 45aa05e2232f..5b50689d8539 100644
--- a/fs/reiserfs/dir.c
+++ b/fs/reiserfs/dir.c
@@ -34,7 +34,7 @@ static int reiserfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
struct inode *inode = filp->f_mapping->host;
int err;
- err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ err = file_write_and_wait_range(filp, start, end);
if (err)
return err;
diff --git a/fs/reiserfs/file.c b/fs/reiserfs/file.c
index b396eb09f288..843aadcc123c 100644
--- a/fs/reiserfs/file.c
+++ b/fs/reiserfs/file.c
@@ -154,7 +154,7 @@ static int reiserfs_sync_file(struct file *filp, loff_t start, loff_t end,
int err;
int barrier_done;
- err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ err = file_write_and_wait_range(filp, start, end);
if (err)
return err;
diff --git a/fs/sync.c b/fs/sync.c
index 2a54c1f22035..27d6b8bbcb6a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -342,7 +342,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
ret = 0;
if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
- ret = filemap_fdatawait_range(mapping, offset, endbyte);
+ ret = file_fdatawait_range(f.file, offset, endbyte);
if (ret < 0)
goto out_put;
}
@@ -355,7 +355,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
}
if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
- ret = filemap_fdatawait_range(mapping, offset, endbyte);
+ ret = file_fdatawait_range(f.file, offset, endbyte);
out_put:
fdput(f);
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 8cad0b19b404..f90a466ea5db 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1337,7 +1337,7 @@ int ubifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
*/
return 0;
- err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ err = file_write_and_wait_range(file, start, end);
if (err)
return err;
inode_lock(inode);
diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index 9e0d444ac88d..f746bd8fe4d0 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -1,18 +1,12 @@
+/*
+ * See Documentation/errseq.rst and lib/errseq.c
+ */
#ifndef _LINUX_ERRSEQ_H
#define _LINUX_ERRSEQ_H
-/* See lib/errseq.c for more info */
-
typedef u32 errseq_t;
-errseq_t __errseq_set(errseq_t *eseq, int err);
-static inline void errseq_set(errseq_t *eseq, int err)
-{
- /* Optimize for the common case of no error */
- if (unlikely(err))
- __errseq_set(eseq, err);
-}
-
+errseq_t errseq_set(errseq_t *eseq, int err);
errseq_t errseq_sample(errseq_t *eseq);
int errseq_check(errseq_t *eseq, errseq_t since);
int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0a51a8b197f7..5b744a3456c5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2544,12 +2544,19 @@ extern int invalidate_inode_pages2_range(struct address_space *mapping,
extern int write_inode_now(struct inode *, int);
extern int filemap_fdatawrite(struct address_space *);
extern int filemap_flush(struct address_space *);
-extern int filemap_fdatawait(struct address_space *);
extern int filemap_fdatawait_keep_errors(struct address_space *mapping);
extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
loff_t lend);
+
+static inline int filemap_fdatawait(struct address_space *mapping)
+{
+ return filemap_fdatawait_range(mapping, 0, LLONG_MAX);
+}
+
extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+ loff_t lend);
extern int filemap_write_and_wait(struct address_space *mapping);
extern int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
@@ -2558,12 +2565,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
extern int filemap_fdatawrite_range(struct address_space *mapping,
loff_t start, loff_t end);
extern int filemap_check_errors(struct address_space *mapping);
-
extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+ loff_t lend);
extern int __must_check file_check_and_advance_wb_err(struct file *file);
extern int __must_check file_write_and_wait_range(struct file *file,
loff_t start, loff_t end);
+static inline int file_write_and_wait(struct file *file)
+{
+ return file_write_and_wait_range(file, 0, LLONG_MAX);
+}
+
/**
* filemap_set_wb_err - set a writeback error on an address_space
* @mapping: mapping in which to set writeback error
@@ -2577,8 +2591,6 @@ extern int __must_check file_write_and_wait_range(struct file *file,
* When a writeback error occurs, most filesystems will want to call
* filemap_set_wb_err to record the error in the mapping so that it will be
* automatically reported whenever fsync is called on the file.
- *
- * FIXME: mention FS_* flag here?
*/
static inline void filemap_set_wb_err(struct address_space *mapping, int err)
{
diff --git a/lib/errseq.c b/lib/errseq.c
index 841fa24e6e00..7b900c2a277a 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -41,23 +41,20 @@
#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1))
/**
- * __errseq_set - set a errseq_t for later reporting
+ * errseq_set - set a errseq_t for later reporting
* @eseq: errseq_t field that should be set
- * @err: error to set
+ * @err: error to set (must be between -1 and -MAX_ERRNO)
*
* This function sets the error in *eseq, and increments the sequence counter
* if the last sequence was sampled at some point in the past.
*
* Any error set will always overwrite an existing error.
*
- * Most callers will want to use the errseq_set inline wrapper to efficiently
- * handle the common case where err is 0.
- *
- * We do return an errseq_t here, primarily for debugging purposes. The return
- * value should not be used as a previously sampled value in later calls as it
- * will not have the SEEN flag set.
+ * We do return the latest value here, primarily for debugging purposes. The
+ * return value should not be used as a previously sampled value in later calls
+ * as it will not have the SEEN flag set.
*/
-errseq_t __errseq_set(errseq_t *eseq, int err)
+errseq_t errseq_set(errseq_t *eseq, int err)
{
errseq_t cur, old;
@@ -107,7 +104,7 @@ errseq_t __errseq_set(errseq_t *eseq, int err)
}
return cur;
}
-EXPORT_SYMBOL(__errseq_set);
+EXPORT_SYMBOL(errseq_set);
/**
* errseq_sample - grab current errseq_t value
diff --git a/mm/filemap.c b/mm/filemap.c
index 65b4b6e7f7bd..1e01cb6e5173 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
EXPORT_SYMBOL(filemap_fdatawait_range);
/**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file: file pointing to address space structure to wait for
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them. Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+ struct address_space *mapping = file->f_mapping;
+
+ __filemap_fdatawait_range(mapping, start_byte, end_byte);
+ return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
* filemap_fdatawait_keep_errors - wait for writeback without clearing errors
* @mapping: address space structure to wait for
*
@@ -489,45 +512,22 @@ EXPORT_SYMBOL(filemap_fdatawait_range);
*/
int filemap_fdatawait_keep_errors(struct address_space *mapping)
{
- loff_t i_size = i_size_read(mapping->host);
-
- if (i_size == 0)
- return 0;
-
- __filemap_fdatawait_range(mapping, 0, i_size - 1);
+ __filemap_fdatawait_range(mapping, 0, LLONG_MAX);
return filemap_check_and_keep_errors(mapping);
}
EXPORT_SYMBOL(filemap_fdatawait_keep_errors);
-/**
- * filemap_fdatawait - wait for all under-writeback pages to complete
- * @mapping: address space structure to wait for
- *
- * Walk the list of under-writeback pages of the given address space
- * and wait for all of them. Check error status of the address space
- * and return it.
- *
- * Since the error status of the address space is cleared by this function,
- * callers are responsible for checking the return value and handling and/or
- * reporting the error.
- */
-int filemap_fdatawait(struct address_space *mapping)
+static bool mapping_needs_writeback(struct address_space *mapping)
{
- loff_t i_size = i_size_read(mapping->host);
-
- if (i_size == 0)
- return 0;
-
- return filemap_fdatawait_range(mapping, 0, i_size - 1);
+ return (!dax_mapping(mapping) && mapping->nrpages) ||
+ (dax_mapping(mapping) && mapping->nrexceptional);
}
-EXPORT_SYMBOL(filemap_fdatawait);
int filemap_write_and_wait(struct address_space *mapping)
{
int err = 0;
- if ((!dax_mapping(mapping) && mapping->nrpages) ||
- (dax_mapping(mapping) && mapping->nrexceptional)) {
+ if (mapping_needs_writeback(mapping)) {
err = filemap_fdatawrite(mapping);
/*
* Even if the above returned error, the pages may be
@@ -566,8 +566,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
{
int err = 0;
- if ((!dax_mapping(mapping) && mapping->nrpages) ||
- (dax_mapping(mapping) && mapping->nrexceptional)) {
+ if (mapping_needs_writeback(mapping)) {
err = __filemap_fdatawrite_range(mapping, lstart, lend,
WB_SYNC_ALL);
/* See comment of filemap_write_and_wait() */
@@ -589,7 +588,7 @@ EXPORT_SYMBOL(filemap_write_and_wait_range);
void __filemap_set_wb_err(struct address_space *mapping, int err)
{
- errseq_t eseq = __errseq_set(&mapping->wb_err, err);
+ errseq_t eseq = errseq_set(&mapping->wb_err, err);
trace_filemap_set_wb_err(mapping, eseq);
}
@@ -656,8 +655,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
int err = 0, err2;
struct address_space *mapping = file->f_mapping;
- if ((!dax_mapping(mapping) && mapping->nrpages) ||
- (dax_mapping(mapping) && mapping->nrexceptional)) {
+ if (mapping_needs_writeback(mapping)) {
err = __filemap_fdatawrite_range(mapping, lstart, lend,
WB_SYNC_ALL);
/* See comment of filemap_write_and_wait() */