2023-01-11 15:16:55

by David Howells

Subject: [PATCH v5 0/9] iov_iter: Add extraction helpers


Hi Al,

Here are some patches to clean up the use of READ/WRITE and ITER_SOURCE/DEST,
patches to provide support for extracting pages from an iov_iter and a
patch to use the primary extraction function in the block layer bio code,
if you could take a look?

[!] NOTE that I've switched the functions to be exported GPL-only at
Christoph's request[1]. They are, however, intended as a replacement
for iov_iter_get_pages*(), which is not marked _GPL - so that
functionality will probably become unavailable to non-GPL 3rd party
modules in future.

The first three patches deal with ITER_SOURCE/DEST:

(1) Switch ITER_SOURCE/DEST to an enum and add a couple of helper
functions to query if an iterator represents a source or a destination
buffer. Using an enum may allow extra consistency warnings from the
compiler.

(2) Use the ITER_SOURCE/DEST values in the iov_iter core functions rather
than READ/WRITE.

(3) Get rid of most of the callers of iov_iter_rw(), using IOCB_WRITE
and IOMAP_WRITE instead where available. This leaves only two places
looking at this value: a consistency check in cifs and a single
place in the block layer.

The next patch adds a replacement for iov_iter_get_pages*(), including
Logan's new version:

(4) Add a function to list, get or pin pages from an iterator as a
future replacement for iov_iter_get_pages*(). Pointers to the pages
are placed into an array (which will be allocated if not provided)
and, depending on the iterator type and direction, the pages will have
a ref taken or a pin put on them or be left untouched (on the assumption
that the caller manages their lifetime).

The determination is:

UBUF/IOVEC + DEST -> pin
UBUF/IOVEC + SOURCE -> get
PIPE + DEST -> list-only
BVEC/XARRAY -> list-only
Anything else -> EFAULT

The function also returns an indication of which of "list only", "get"
or "pin" the extraction function did, to aid in cleaning up (returning
0, FOLL_GET or FOLL_PIN as appropriate).
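
The determination table and the returned cleanup indication could be
sketched as follows (the ITYPE_* names and the FOLL_* values here are
illustrative stand-ins, not the kernel's own constants):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in iterator types for the table above. */
enum itype { ITYPE_UBUF, ITYPE_IOVEC, ITYPE_KVEC,
	     ITYPE_BVEC, ITYPE_XARRAY, ITYPE_PIPE };

/* Stand-in cleanup-mode values; 0 means "list only, nothing to undo". */
#define FOLL_GET 0x01
#define FOLL_PIN 0x02

/* Map (iterator type, direction) to the cleanup mode that the
 * extraction function would report back, or -EFAULT if the
 * combination is not supported. */
static int extraction_mode(enum itype type, bool is_source)
{
	switch (type) {
	case ITYPE_UBUF:
	case ITYPE_IOVEC:
		return is_source ? FOLL_GET : FOLL_PIN;
	case ITYPE_PIPE:
		return is_source ? -EFAULT : 0;	/* list-only for DEST */
	case ITYPE_BVEC:
	case ITYPE_XARRAY:
		return 0;			/* list-only */
	default:
		return -EFAULT;			/* anything else */
	}
}
```

The caller then keys its cleanup on the returned value: put refs for
FOLL_GET, unpin for FOLL_PIN, do nothing for 0.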

Then there are a couple of patches that add stuff to netfslib that I want
to use there as well as in cifs:

(5) Add a netfslib function to use (4) to extract pages from an ITER_IOVEC
or ITER_UBUF iterator into an ITER_BVEC iterator. This will get or
pin the pages as appropriate.

(6) Add a netfslib function to extract pages from an iterator that's of
type ITER_UBUF/IOVEC/BVEC/KVEC/XARRAY and add them to a scatterlist.

The function in (4) is used for UBUF and IOVEC iterators, so those
need cleaning up afterwards. BVEC and XARRAY iterators are neither got
nor pinned and may be rendered into elements that span multiple pages,
for example if large folios are present.

Finally, there are a set of three patches to make the block layer's BIO
code use iov_iter_extract_pages():

(7) Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED (opposite logic).

(8) Make the block layer's BIO code pin pages or leave pages unaltered,
rather than getting a ref on the pages, when the circumstances warrant,
noting in the bio struct which cleanups should be performed so that
the bio cleanup code then does the right thing.
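
As a rough userspace sketch of that bookkeeping (put_page() and
unpin_user_page() are replaced by counting stubs, and the flag names
follow patch (7); nothing here is the actual bio code):

```c
#include <assert.h>

/* Stand-in bio flags recording what cleanup the pages need. */
#define BIO_PAGE_REFFED 0x1	/* pages have a ref; drop it on completion */
#define BIO_PAGE_PINNED 0x2	/* pages are pinned; unpin on completion */

/* Counting stubs standing in for put_page()/unpin_user_page(). */
static int n_puts, n_unpins;
static void put_page_stub(void)        { n_puts++; }
static void unpin_user_page_stub(void) { n_unpins++; }

/* On bio completion, pick the undo matching how the page was taken.
 * If neither flag is set, the caller manages the page's lifetime and
 * the bio leaves it untouched. */
static void bio_release_page_sketch(unsigned int bi_flags)
{
	if (bi_flags & BIO_PAGE_REFFED)
		put_page_stub();
	else if (bi_flags & BIO_PAGE_PINNED)
		unpin_user_page_stub();
}
```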

(9) Remove an unnecessary check against 0 in bio_flagged() (it returns
bool) thus allowing the gcc optimiser to combine multiple instances of
the bitwise-AND on the same flags value.
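
The point of (9) can be sketched like this (struct bio is reduced to
just a flags word; the "!= 0" noted in the comment is what the patch
removes):

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for struct bio, keeping only the flags word. */
struct bio_sketch { unsigned short bi_flags; };

/* Test a flag bit.  With the function returning bool, the plain
 * bitwise-AND is enough; it used to be "(... & ...) != 0". */
static inline bool bio_flagged(const struct bio_sketch *bio, unsigned int bit)
{
	return bio->bi_flags & (1U << bit);
}

/* Two back-to-back tests on the same flags word; with the redundant
 * comparison gone, gcc is free to merge these into a single AND
 * against a combined mask. */
static bool either_flag_set(const struct bio_sketch *bio)
{
	return bio_flagged(bio, 0) || bio_flagged(bio, 1);
}
```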

Changes:
========
ver #5)
- Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED and split into own patch.
- Transcribe FOLL_GET/PIN into BIO_PAGE_REFFED/PINNED flags.
- Add patch to allow bio_flagged() to be combined by gcc.

ver #4)
- Drop the patch to move the FOLL_* flags to linux/mm_types.h as they're
no longer referenced by linux/uio.h.
- Add ITER_SOURCE/DEST cleanup patches.
- Make iov_iter/netfslib iter extraction patches use ITER_SOURCE/DEST.
- Allow additional gup_flags to be passed into iov_iter_extract_pages().
- Add struct bio patch.

ver #3)
- Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
to get/pin_user_pages_fast()[1].

ver #2)
- Rolled the extraction cleanup mode query function into the extraction
function, returning the indication through the argument list.
- Fixed patch 4 (extract to scatterlist) to actually use the new
extraction API.

I've pushed the patches here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-extract

David

Link: https://lore.kernel.org/r/Y3zFzdWnWlEJ8X8/@infradead.org/ [1]
Link: https://lore.kernel.org/r/166697254399.61150.1256557652599252121.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166722777223.2555743.162508599131141451.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732024173.3186319.18204305072070871546.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166869687556.3723671.10061142538708346995.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166920902005.1461876.2786264600108839814.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166997419665.9475.15014699817597102032.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/167305160937.1521586.133299343565358971.stgit@warthog.procyon.org.uk/ # v4

---
David Howells (9):
iov_iter: Change the direction macros into an enum
iov_iter: Use the direction in the iterator functions
iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction
iov_iter: Add a function to extract a page list from an iterator
netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
netfs: Add a function to extract an iterator into a scatterlist
bio: Rename BIO_NO_PAGE_REF to BIO_PAGE_REFFED and invert the meaning
iov_iter, block: Make bio structs pin pages rather than ref'ing if appropriate
bio: Fix bio_flagged() so that it can be combined


block/bio.c | 59 ++++--
block/fops.c | 8 +-
fs/9p/vfs_addr.c | 2 +-
fs/affs/file.c | 4 +-
fs/ceph/file.c | 2 +-
fs/dax.c | 6 +-
fs/direct-io.c | 22 +--
fs/exfat/inode.c | 6 +-
fs/ext2/inode.c | 2 +-
fs/f2fs/file.c | 10 +-
fs/fat/inode.c | 4 +-
fs/fuse/dax.c | 2 +-
fs/fuse/file.c | 8 +-
fs/hfs/inode.c | 2 +-
fs/hfsplus/inode.c | 2 +-
fs/iomap/direct-io.c | 6 +-
fs/jfs/inode.c | 2 +-
fs/netfs/Makefile | 1 +
fs/netfs/iterator.c | 367 ++++++++++++++++++++++++++++++++++
fs/nfs/direct.c | 2 +-
fs/nilfs2/inode.c | 2 +-
fs/ntfs3/inode.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/orangefs/inode.c | 2 +-
fs/reiserfs/inode.c | 2 +-
fs/udf/inode.c | 2 +-
include/linux/bio.h | 5 +-
include/linux/blk_types.h | 3 +-
include/linux/netfs.h | 7 +
include/linux/uio.h | 59 ++++--
lib/iov_iter.c | 407 +++++++++++++++++++++++++++++++++++---
mm/vmalloc.c | 1 +
32 files changed, 903 insertions(+), 108 deletions(-)
create mode 100644 fs/netfs/iterator.c



2023-01-11 15:20:39

by David Howells

Subject: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

If a kiocb or iomap_iter is available, then use the IOCB_WRITE flag or the
IOMAP_WRITE flag to determine whether we're writing rather than the
iterator direction flag.

This allows all but three of the users of iov_iter_rw() to be got rid of: a
consistency check and a warning statement in cifs and one user in the block
layer that has neither available.

Signed-off-by: David Howells <[email protected]>
cc: Al Viro <[email protected]>
Link: https://lore.kernel.org/r/167305163159.1521586.9460968250704377087.stgit@warthog.procyon.org.uk/ # v4
---

block/fops.c | 8 ++++----
fs/9p/vfs_addr.c | 2 +-
fs/affs/file.c | 4 ++--
fs/ceph/file.c | 2 +-
fs/dax.c | 6 +++---
fs/direct-io.c | 22 +++++++++++-----------
fs/exfat/inode.c | 6 +++---
fs/ext2/inode.c | 2 +-
fs/f2fs/file.c | 10 +++++-----
fs/fat/inode.c | 4 ++--
fs/fuse/dax.c | 2 +-
fs/fuse/file.c | 8 ++++----
fs/hfs/inode.c | 2 +-
fs/hfsplus/inode.c | 2 +-
fs/iomap/direct-io.c | 6 +++---
fs/jfs/inode.c | 2 +-
fs/nfs/direct.c | 2 +-
fs/nilfs2/inode.c | 2 +-
fs/ntfs3/inode.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/orangefs/inode.c | 2 +-
fs/reiserfs/inode.c | 2 +-
fs/udf/inode.c | 2 +-
23 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 50d245e8c913..29c6de67c39e 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,7 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
return -ENOMEM;
}

- if (iov_iter_rw(iter) == READ) {
+ if (!(iocb->ki_flags & IOCB_WRITE)) {
bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
if (user_backed_iter(iter))
should_dirty = true;
@@ -88,7 +88,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
goto out;
ret = bio.bi_iter.bi_size;

- if (iov_iter_rw(iter) == WRITE)
+ if (iocb->ki_flags & IOCB_WRITE)
task_io_account_write(ret);

if (iocb->ki_flags & IOCB_NOWAIT)
@@ -174,7 +174,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
struct blk_plug plug;
struct blkdev_dio *dio;
struct bio *bio;
- bool is_read = (iov_iter_rw(iter) == READ), is_sync;
+ bool is_read = !(iocb->ki_flags & IOCB_WRITE), is_sync;
blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
loff_t pos = iocb->ki_pos;
int ret = 0;
@@ -296,7 +296,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
unsigned int nr_pages)
{
struct block_device *bdev = iocb->ki_filp->private_data;
- bool is_read = iov_iter_rw(iter) == READ;
+ bool is_read = !(iocb->ki_flags & IOCB_WRITE);
blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
struct blkdev_dio *dio;
struct bio *bio;
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 97599edbc300..383d62fc3e18 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -254,7 +254,7 @@ v9fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
ssize_t n;
int err = 0;

- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb->ki_flags & IOCB_WRITE) {
n = p9_client_write(file->private_data, pos, iter, &err);
if (n) {
struct inode *inode = file_inode(file);
diff --git a/fs/affs/file.c b/fs/affs/file.c
index cefa222f7881..1c0e80a8aab9 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -400,7 +400,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
ssize_t ret;

- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb->ki_flags & IOCB_WRITE) {
loff_t size = offset + count;

if (AFFS_I(inode)->mmu_private < size)
@@ -408,7 +408,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
}

ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block);
- if (ret < 0 && iov_iter_rw(iter) == WRITE)
+ if (ret < 0 && (iocb->ki_flags & IOCB_WRITE))
affs_write_failed(mapping, offset + count);
return ret;
}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 764598e1efd9..8bdc5b52c271 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1284,7 +1284,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(iter);
loff_t pos = iocb->ki_pos;
- bool write = iov_iter_rw(iter) == WRITE;
+ bool write = iocb->ki_flags & IOCB_WRITE;
bool should_dirty = !write && user_backed_iter(iter);

if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP)
diff --git a/fs/dax.c b/fs/dax.c
index c48a3a93ab29..7f4c3789907b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1405,7 +1405,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
loff_t pos = iomi->pos;
struct dax_device *dax_dev = iomap->dax_dev;
loff_t end = pos + length, done = 0;
- bool write = iov_iter_rw(iter) == WRITE;
+ bool write = iomi->flags & IOMAP_WRITE;
bool cow = write && iomap->flags & IOMAP_F_SHARED;
ssize_t ret = 0;
size_t xfer;
@@ -1455,7 +1455,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,

map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
DAX_ACCESS, &kaddr, NULL);
- if (map_len == -EIO && iov_iter_rw(iter) == WRITE) {
+ if (map_len == -EIO && write) {
map_len = dax_direct_access(dax_dev, pgoff,
PHYS_PFN(size), DAX_RECOVERY_WRITE,
&kaddr, NULL);
@@ -1530,7 +1530,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!iomi.len)
return 0;

- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb->ki_flags & IOCB_WRITE) {
lockdep_assert_held_write(&iomi.inode->i_rwsem);
iomi.flags |= IOMAP_WRITE;
} else {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 03d381377ae1..e2d5c757a27a 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1143,7 +1143,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
*/

/* watch out for a 0 len io from a tricksy fs */
- if (iov_iter_rw(iter) == READ && !count)
+ if (!(iocb->ki_flags & IOCB_WRITE) && !count)
return 0;

dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
@@ -1157,14 +1157,14 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
memset(dio, 0, offsetof(struct dio, pages));

dio->flags = flags;
- if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+ if (dio->flags & DIO_LOCKING && !(iocb->ki_flags & IOCB_WRITE)) {
/* will be released by direct_io_worker */
inode_lock(inode);
}

/* Once we sampled i_size check for reads beyond EOF */
dio->i_size = i_size_read(inode);
- if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
+ if (!(iocb->ki_flags & IOCB_WRITE) && offset >= dio->i_size) {
retval = 0;
goto fail_dio;
}
@@ -1177,7 +1177,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
goto fail_dio;
}

- if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+ if (dio->flags & DIO_LOCKING && !(iocb->ki_flags & IOCB_WRITE)) {
struct address_space *mapping = iocb->ki_filp->f_mapping;

retval = filemap_write_and_wait_range(mapping, offset, end - 1);
@@ -1193,13 +1193,13 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
*/
if (is_sync_kiocb(iocb))
dio->is_async = false;
- else if (iov_iter_rw(iter) == WRITE && end > i_size_read(inode))
+ else if ((iocb->ki_flags & IOCB_WRITE) && end > i_size_read(inode))
dio->is_async = false;
else
dio->is_async = true;

dio->inode = inode;
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb->ki_flags & IOCB_WRITE) {
dio->opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
if (iocb->ki_flags & IOCB_NOWAIT)
dio->opf |= REQ_NOWAIT;
@@ -1211,7 +1211,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
* For AIO O_(D)SYNC writes we need to defer completions to a workqueue
* so that we can call ->fsync.
*/
- if (dio->is_async && iov_iter_rw(iter) == WRITE) {
+ if (dio->is_async && (iocb->ki_flags & IOCB_WRITE)) {
retval = 0;
if (iocb_is_dsync(iocb))
retval = dio_set_defer_completion(dio);
@@ -1248,7 +1248,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
spin_lock_init(&dio->bio_lock);
dio->refcount = 1;

- dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ;
+ dio->should_dirty = user_backed_iter(iter) && !(iocb->ki_flags & IOCB_WRITE);
sdio.iter = iter;
sdio.final_block_in_request = end >> blkbits;

@@ -1305,7 +1305,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
* we can let i_mutex go now that its achieved its purpose
* of protecting us from looking up uninitialized blocks.
*/
- if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
+ if (!(iocb->ki_flags & IOCB_WRITE) && (dio->flags & DIO_LOCKING))
inode_unlock(dio->inode);

/*
@@ -1317,7 +1317,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
*/
BUG_ON(retval == -EIOCBQUEUED);
if (dio->is_async && retval == 0 && dio->result &&
- (iov_iter_rw(iter) == READ || dio->result == count))
+ (!(iocb->ki_flags & IOCB_WRITE) || dio->result == count))
retval = -EIOCBQUEUED;
else
dio_await_completion(dio);
@@ -1330,7 +1330,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
return retval;

fail_dio:
- if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
+ if (dio->flags & DIO_LOCKING && !(iocb->ki_flags & IOCB_WRITE))
inode_unlock(inode);

kmem_cache_free(dio_cache, dio);
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 5b644cb057fa..26c2cff71878 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -412,10 +412,10 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
loff_t size = iocb->ki_pos + iov_iter_count(iter);
- int rw = iov_iter_rw(iter);
+ bool writing = iocb->ki_flags & IOCB_WRITE;
ssize_t ret;

- if (rw == WRITE) {
+ if (writing) {
/*
* FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
* so we need to update the ->i_size_aligned to block boundary.
@@ -434,7 +434,7 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* condition of exfat_get_block() and ->truncate().
*/
ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block);
- if (ret < 0 && (rw & WRITE))
+ if (ret < 0 && writing)
exfat_write_failed(mapping, size);
return ret;
}
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 69aed9e2359e..9ed588d70722 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -919,7 +919,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
ssize_t ret;

ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
- if (ret < 0 && iov_iter_rw(iter) == WRITE)
+ if (ret < 0 && (iocb->ki_flags & IOCB_WRITE))
ext2_write_failed(mapping, offset + count);
return ret;
}
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ecbc8c135b49..7a7cfa39b327 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -809,7 +809,7 @@ int f2fs_truncate(struct inode *inode)
return 0;
}

-static bool f2fs_force_buffered_io(struct inode *inode, int rw)
+static bool f2fs_force_buffered_io(struct inode *inode, bool writing)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);

@@ -827,9 +827,9 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
* for blkzoned device, fallback direct IO to buffered IO, so
* all IOs can be serialized by log-structured write.
*/
- if (f2fs_sb_has_blkzoned(sbi) && (rw == WRITE))
+ if (f2fs_sb_has_blkzoned(sbi) && writing)
return true;
- if (f2fs_lfs_mode(sbi) && rw == WRITE && F2FS_IO_ALIGNED(sbi))
+ if (f2fs_lfs_mode(sbi) && writing && F2FS_IO_ALIGNED(sbi))
return true;
if (is_sbi_flag_set(sbi, SBI_CP_DISABLED))
return true;
@@ -865,7 +865,7 @@ int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
unsigned int bsize = i_blocksize(inode);

stat->result_mask |= STATX_DIOALIGN;
- if (!f2fs_force_buffered_io(inode, WRITE)) {
+ if (!f2fs_force_buffered_io(inode, true)) {
stat->dio_mem_align = bsize;
stat->dio_offset_align = bsize;
}
@@ -4254,7 +4254,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
if (!(iocb->ki_flags & IOCB_DIRECT))
return false;

- if (f2fs_force_buffered_io(inode, iov_iter_rw(iter)))
+ if (f2fs_force_buffered_io(inode, iocb->ki_flags & IOCB_WRITE))
return false;

/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d99b8549ec8f..d7ffc30ce0e5 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -261,7 +261,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
ssize_t ret;

- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb->ki_flags & IOCB_WRITE) {
/*
* FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
* so we need to update the ->mmu_private to block boundary.
@@ -281,7 +281,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* condition of fat_get_block() and ->truncate().
*/
ret = blockdev_direct_IO(iocb, inode, iter, fat_get_block);
- if (ret < 0 && iov_iter_rw(iter) == WRITE)
+ if (ret < 0 && (iocb->ki_flags & IOCB_WRITE))
fat_write_failed(mapping, offset + count);

return ret;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index e23e802a8013..fddd0b8de27e 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -720,7 +720,7 @@ static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);

- return (iov_iter_rw(from) == WRITE &&
+ return ((iocb->ki_flags & IOCB_WRITE) &&
((iocb->ki_pos) >= i_size_read(inode) ||
(iocb->ki_pos + iov_iter_count(from) > i_size_read(inode))));
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 875314ee6f59..9575c4ca0667 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2897,7 +2897,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
inode = file->f_mapping->host;
i_size = i_size_read(inode);

- if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
+ if (!(iocb->ki_flags & IOCB_WRITE) && (offset >= i_size))
return 0;

io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
@@ -2909,7 +2909,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
io->bytes = -1;
io->size = 0;
io->offset = offset;
- io->write = (iov_iter_rw(iter) == WRITE);
+ io->write = (iocb->ki_flags & IOCB_WRITE);
io->err = 0;
/*
* By default, we want to optimize all I/Os with async request
@@ -2942,7 +2942,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
io->done = &wait;
}

- if (iov_iter_rw(iter) == WRITE) {
+ if ((iocb->ki_flags & IOCB_WRITE)) {
ret = fuse_direct_io(io, iter, &pos, FUSE_DIO_WRITE);
fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
} else {
@@ -2965,7 +2965,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)

kref_put(&io->refcnt, fuse_io_release);

- if (iov_iter_rw(iter) == WRITE) {
+ if ((iocb->ki_flags & IOCB_WRITE)) {
fuse_write_update_attr(inode, pos, ret);
/* For extending writes we already hold exclusive lock */
if (ret < 0 && offset + count > i_size)
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 9c329a365e75..638c87afd96f 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -141,7 +141,7 @@ static ssize_t hfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely((iocb->ki_flags & IOCB_WRITE) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;

diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 840577a0c1e7..843e6f1ced25 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -138,7 +138,7 @@ static ssize_t hfsplus_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely((iocb->ki_flags & IOCB_WRITE) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9804714b1751..d045b0c54d04 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -519,7 +519,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;

- if (iov_iter_rw(iter) == READ) {
+ if (!(iocb->ki_flags & IOCB_WRITE)) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;

@@ -573,7 +573,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret)
goto out_free_dio;

- if (iov_iter_rw(iter) == WRITE) {
+ if (iomi.flags & IOMAP_WRITE) {
/*
* Try to invalidate cache pages for the range we are writing.
* If this invalidation fails, let the caller fall back to
@@ -613,7 +613,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
* Revert iter to a state corresponding to that as some callers (such
* as the splice code) rely on it.
*/
- if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size)
+ if (!(iomi.flags & IOMAP_WRITE) && iomi.pos >= dio->i_size)
iov_iter_revert(iter, iomi.pos - dio->i_size);

if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) {
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 8ac10e396050..f403d2f2bfe6 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -334,7 +334,7 @@ static ssize_t jfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely((iocb->ki_flags & IOCB_WRITE) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1707f46b1335..e94ce42f93a8 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -133,7 +133,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)

VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);

- if (iov_iter_rw(iter) == READ)
+ if (!(iocb->ki_flags & IOCB_WRITE))
ret = nfs_file_direct_read(iocb, iter, true);
else
ret = nfs_file_direct_write(iocb, iter, true);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index 232dd7b6cca1..59df5707d30f 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -289,7 +289,7 @@ nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct inode *inode = file_inode(iocb->ki_filp);

- if (iov_iter_rw(iter) == WRITE)
+ if (iocb->ki_flags & IOCB_WRITE)
return 0;

/* Needs synchronization with the cleaner */
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 20b953871574..f0881d0522f8 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -761,7 +761,7 @@ static ssize_t ntfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
struct ntfs_inode *ni = ntfs_i(inode);
loff_t vbo = iocb->ki_pos;
loff_t end;
- int wr = iov_iter_rw(iter) & WRITE;
+ bool wr = iocb->ki_flags & IOCB_WRITE;
size_t iter_count = iov_iter_count(iter);
loff_t valid;
ssize_t ret;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 1d65f6ef00ca..3f41c6b403c2 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2441,7 +2441,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
!ocfs2_supports_append_dio(osb))
return 0;

- if (iov_iter_rw(iter) == READ)
+ if (!(iocb->ki_flags & IOCB_WRITE))
get_block = ocfs2_lock_get_block;
else
get_block = ocfs2_dio_wr_get_block;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 4df560894386..fbecca379e91 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -521,7 +521,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
*/
struct file *file = iocb->ki_filp;
loff_t pos = iocb->ki_pos;
- enum ORANGEFS_io_type type = iov_iter_rw(iter) == WRITE ?
+ enum ORANGEFS_io_type type = (iocb->ki_flags & IOCB_WRITE) ?
ORANGEFS_IO_WRITE : ORANGEFS_IO_READ;
loff_t *offset = &pos;
struct inode *inode = file->f_mapping->host;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index c7d1fa526dea..1fc94fd5c371 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3249,7 +3249,7 @@ static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely((iocb->ki_flags & IOCB_WRITE) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;

diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 1d7c2a812fc1..6d2ce0e512f4 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -219,7 +219,7 @@ static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
ssize_t ret;

ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block);
- if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE))
+ if (unlikely(ret < 0 && (iocb->ki_flags & IOCB_WRITE)))
udf_write_failed(mapping, iocb->ki_pos + count);
return ret;
}


2023-01-12 08:24:26

by Christoph Hellwig

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Wed, Jan 11, 2023 at 02:27:58PM +0000, David Howells wrote:
> This allows all but three of the users of iov_iter_rw() to be got rid of: a
> consistency check and a warning statement in cifs

Let's just drop these two.

> and one user in the block
> layer that has neither available.

And use the information in the request for this one (see patch below),
and then move this patch first in the series, add an explicit direction
parameter in the gup_flags to the get/pin helper and drop iov_iter_rw
and the whole confusing source/dest information in the iov_iter entirely,
which is a really nice big tree-wide cleanup that removes redundant
information.

---
diff --git a/block/blk-map.c b/block/blk-map.c
index 19940c978c73bb..08cbb7ff3b196d 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -203,7 +203,7 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
/*
* success
*/
- if ((iov_iter_rw(iter) == WRITE &&
+ if ((op_is_write(rq->cmd_flags) &&
(!map_data || !map_data->null_mapped)) ||
(map_data && map_data->from_user)) {
ret = bio_copy_from_iter(bio, iter);

2023-01-12 11:35:54

by David Howells

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

Christoph Hellwig <[email protected]> wrote:

> > This allows all but three of the users of iov_iter_rw() to be got rid of: a
> > consistency check and a warning statement in cifs
>
> Let's just drop these two.
>
> > and one user in the block
> > layer that has neither available.
>
> And use the information in the request for this one (see patch below),
> and then move this patch first in the series, add an explicit direction
> parameter in the gup_flags to the get/pin helper and drop iov_iter_rw
> and the whole confusing source/dest information in the iov_iter entirely,
> which is a really nice big tree wide cleanup that remove redundant
> information.

Fine by me, but Al might object as I think he wanted the internal checks. Al?

David

2023-01-12 15:06:51

by Christoph Hellwig

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Thu, Jan 12, 2023 at 10:31:01AM +0000, David Howells wrote:
> > And use the information in the request for this one (see patch below),
> > and then move this patch first in the series, add an explicit direction
> > parameter in the gup_flags to the get/pin helper and drop iov_iter_rw
> > and the whole confusing source/dest information in the iov_iter entirely,
> > which is a really nice big tree wide cleanup that remove redundant
> > information.
>
> Fine by me, but Al might object as I think he wanted the internal checks. Al?

I'm happy to have another discussion, but the fact that the information in
the iov_iter is 98% redundant and various callers got it wrong
anyway is a pretty good sign that we should drop this information. It
also nicely simplifies the API.

2023-01-12 18:30:19

by Al Viro

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Thu, Jan 12, 2023 at 06:08:14AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 12, 2023 at 10:31:01AM +0000, David Howells wrote:
> > > And use the information in the request for this one (see patch below),
> > > and then move this patch first in the series, add an explicit direction
> > > parameter in the gup_flags to the get/pin helper and drop iov_iter_rw
> > > and the whole confusing source/dest information in the iov_iter entirely,
> > > which is a really nice big tree wide cleanup that remove redundant
> > > information.
> >
> > Fine by me, but Al might object as I think he wanted the internal checks. Al?
>
> I'm happy to have another discussion, but the fact the information in
> the iov_iter is 98% redundant and various callers got it wrong and
> away is a pretty good sign that we should drop this information. It
> also nicely simplified the API.

I have no problem with getting rid of iov_iter_rw(), but I would really like to
keep ->data_source. If nothing else, any place getting direction wrong is
trouble waiting to happen - something that is currently dealing only with
iovec and bvec might be given e.g. a pipe.

Speaking of which, I would really like to get rid of the kludge /dev/sg is
pulling - right now from-device requests there do the following:
* copy the entire destination in (and better hope that nothing is mapped
write-only, etc.)
* form a request + bio, attach the pages with the destination copy to it
* submit
* copy the damn thing back to destination after the completion.
The reason for that is (quoted in commit ecb554a846f8)

====
The semantics of SG_DXFER_TO_FROM_DEV were:
- copy user space buffer to kernel (LLD) buffer
- do SCSI command which is assumed to be of the DATA_IN
(data from device) variety. This would overwrite
some or all of the kernel buffer
- copy kernel (LLD) buffer back to the user space.

The idea was to detect short reads by filling the original
user space buffer with some marker bytes ("0xec" it would
seem in this report). The "resid" value is a better way
of detecting short reads but that was only added this century
and requires co-operation from the LLD.
====

IOW, we can't tell how much we actually want to copy out, unless the SCSI driver
in question is recent enough. Note that the above had been written in 2009, so
it might not be an issue these days.

Do we still have SCSI drivers that would not set the residual on bypass requests
completion? Because I would obviously very much prefer to get rid of that
copy in-overwrite-copy out thing there - given the accurate information about
the transfer length it would be easy to do.

2023-01-12 22:45:37

by Bart Van Assche

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On 1/12/23 09:37, Al Viro wrote:
> On Thu, Jan 12, 2023 at 06:08:14AM -0800, Christoph Hellwig wrote:
>> On Thu, Jan 12, 2023 at 10:31:01AM +0000, David Howells wrote:
>>>> And use the information in the request for this one (see patch below),
>>>> and then move this patch first in the series, add an explicit direction
>>>> parameter in the gup_flags to the get/pin helper and drop iov_iter_rw
>>>> and the whole confusing source/dest information in the iov_iter entirely,
>>>> which is a really nice big tree wide cleanup that removes redundant
>>>> information.
>>>
>>> Fine by me, but Al might object as I think he wanted the internal checks. Al?
>>
>> I'm happy to have another discussion, but the fact the information in
>> the iov_iter is 98% redundant and various callers got it wrong anyway
>> is a pretty good sign that we should drop this information. It
>> also nicely simplified the API.
>
> I have no problem with getting rid of iov_iter_rw(), but I would really like to
> keep ->data_source. If nothing else, any place getting direction wrong is
> a trouble waiting to happen - something that is currently dealing only with
> iovec and bvec might be given e.g. a pipe.
>
> Speaking of which, I would really like to get rid of the kludge /dev/sg is
> pulling - right now from-device requests there do the following:
> * copy the entire destination in (and better hope that nothing is mapped
> write-only, etc.)
> * form a request + bio, attach the pages with the destination copy to it
> * submit
> * copy the damn thing back to destination after the completion.
> The reason for that is (quoted in commit ecb554a846f8)
>
> ====
> The semantics of SG_DXFER_TO_FROM_DEV were:
> - copy user space buffer to kernel (LLD) buffer
> - do SCSI command which is assumed to be of the DATA_IN
> (data from device) variety. This would overwrite
> some or all of the kernel buffer
> - copy kernel (LLD) buffer back to the user space.
>
> The idea was to detect short reads by filling the original
> user space buffer with some marker bytes ("0xec" it would
> seem in this report). The "resid" value is a better way
> of detecting short reads but that was only added this century
> and requires co-operation from the LLD.
> ====
>
> IOW, we can't tell how much we actually want to copy out, unless the SCSI driver
> in question is recent enough. Note that the above had been written in 2009, so
> it might not be an issue these days.
>
> Do we still have SCSI drivers that would not set the residual on bypass requests
> completion? Because I would obviously very much prefer to get rid of that
> copy in-overwrite-copy out thing there - given the accurate information about
> the transfer length it would be easy to do.

(+Martin and Doug)

I'm not sure that we still need the double copy in the sg driver. It
seems obscure to me that there is user space software that relies on
finding "0xec" in bytes not originating from a SCSI device.
Additionally, SCSI drivers that do not support residuals should be
something from the past.

Others may be better qualified to comment on this topic.

Bart.

2023-01-12 23:03:56

by Al Viro

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Wed, Jan 11, 2023 at 02:27:58PM +0000, David Howells wrote:
> If a kiocb or iomap_iter is available, then use the IOCB_WRITE flag or the
> IOMAP_WRITE flag to determine whether we're writing rather than the
> iterator direction flag.
>
> This allows all but three of the users of iov_iter_rw() to be got rid of: a
> consistency check and a warning statement in cifs and one user in the block
> layer that has neither available.
>
> Signed-off-by: David Howells <[email protected]>
> cc: Al Viro <[email protected]>
> Link: https://lore.kernel.org/r/167305163159.1521586.9460968250704377087.stgit@warthog.procyon.org.uk/ # v4

Incidentally, I'd suggest iocb_is_write(iocb) - just look at the amount of
places where you end up with clumsy parentheses...

2023-01-13 05:59:31

by Christoph Hellwig

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Thu, Jan 12, 2023 at 01:49:14PM -0800, Bart Van Assche wrote:
> I'm not sure that we still need the double copy in the sg driver. It seems
> obscure to me that there is user space software that relies on finding
> "0xec" in bytes not originating from a SCSI device. Additionally, SCSI
> drivers that do not support residuals should be something from the past.
>
> Others may be better qualified to comment on this topic.

Yeah. And that weird (ab)use of blk_rq_map_user_iov into preallocated
pages in sg has been a constant source of pain. I'd be very happy to
make it match the generic SG_IO implementation and either do
get_user_pages or the normal bounce buffering implemented in the
common code.

2023-01-13 06:01:37

by Christoph Hellwig

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Thu, Jan 12, 2023 at 09:56:25PM +0000, Al Viro wrote:
> Incidentally, I'd suggest iocb_is_write(iocb) - just look at the amount of
> places where you end up with clumsy parentheses...

Agreed, a little helper won't hurt.

2023-01-13 06:07:28

by Christoph Hellwig

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Thu, Jan 12, 2023 at 05:37:26PM +0000, Al Viro wrote:
> I have no problem with getting rid of iov_iter_rw(), but I would really like to
> keep ->data_source. If nothing else, any place getting direction wrong is
> a trouble waiting to happen - something that is currently dealing only with
> iovec and bvec might be given e.g. a pipe.

But the calling code knows the direction, in fact it is generally
encoded in the actual operation we do on the iov_iter. The only
exception is iov_iter_get_pages and friends. So I'd much rather
make the last operation that does not explicitly encode a direction
explicit rather than carry this duplicate information.

The direction of the iov_iter has been a major source of confusion,
and getting rid of it removes that confusion.

2023-01-14 01:59:33

by Martin K. Petersen

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction


Christoph,

> Yeah. And that weird (ab)use of blk_rq_map_user_iov into preallocated
> pages in sg has been a constant source of pain. I'd be very happy to
> make it match the generic SG_IO implementation and either do
> get_user_pages or the normal bounce buffering implemented in the
> common code.

Yes, it would be nice to get that cleaned up.

--
Martin K. Petersen Oracle Linux Engineering

2023-01-14 02:10:01

by Martin K. Petersen

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction


Bart,

> I'm not sure that we still need the double copy in the sg driver. It
> seems obscure to me that there is user space software that relies on
> finding "0xec" in bytes not originating from a SCSI
> device. Additionally, SCSI drivers that do not support residuals
> should be something from the past.

Yeah. I'm not aware of anything that relies on this still. But obviously
Doug has more experience in the app dependency department.

--
Martin K. Petersen Oracle Linux Engineering

2023-01-14 02:15:56

by Al Viro

Subject: Re: [PATCH v5 3/9] iov_iter: Use IOCB/IOMAP_WRITE if available rather than iterator direction

On Fri, Jan 13, 2023 at 08:34:50PM -0500, Martin K. Petersen wrote:
>
> Bart,
>
> > I'm not sure that we still need the double copy in the sg driver. It
> > seems obscure to me that there is user space software that relies on
> > finding "0xec" in bytes not originating from a SCSI
> > device. Additionally, SCSI drivers that do not support residuals
> > should be something from the past.
>
> Yeah. I'm not aware of anything that relies on this still. But obviously
> Doug has more experience in the app dependency department.

Are we guaranteed to know the accurate amount of data that got transferred
for all surviving drivers? If we do, we can do accurate copy-out and all
apps will keep seeing what they currently do. If we don't, the best we
can do is replacing copy-in + IO + copy-out with memset + IO + copy-out.