We already do this internally for vfs_readv() and vfs_writev(), which
need to check what method to use. Add generic helpers for this so that
drivers can do this themselves, if they haven't converted to using the
read/write iterator file_operations hooks just yet.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/read_write.c | 18 ++++++++++++++++++
include/linux/fs.h | 6 ++++++
2 files changed, 24 insertions(+)
diff --git a/fs/read_write.c b/fs/read_write.c
index 82ec75937b08..1d035293607b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -802,6 +802,24 @@ static ssize_t do_loop_writev(struct file *file, struct iov_iter *iter,
return ret;
}
+/* generic read side helper for drivers converting to ->read_iter() */
+ssize_t vfs_read_iter(struct kiocb *iocb, struct iov_iter *to,
+ ssize_t (*read)(struct file *, char __user *,
+ size_t, loff_t *))
+{
+ return do_loop_readv(iocb->ki_filp, to, &iocb->ki_pos, 0, read);
+}
+EXPORT_SYMBOL(vfs_read_iter);
+
+/* generic write side helper for drivers converting to ->write_iter() */
+ssize_t vfs_write_iter(struct kiocb *iocb, struct iov_iter *from,
+ ssize_t (*write)(struct file *, const char __user *,
+ size_t, loff_t *))
+{
+ return do_loop_writev(iocb->ki_filp, from, &iocb->ki_pos, 0, write);
+}
+EXPORT_SYMBOL(vfs_write_iter);
+
ssize_t vfs_iocb_iter_read(struct file *file, struct kiocb *iocb,
struct iov_iter *iter)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8dfd53b52744..fd862985a309 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2119,6 +2119,12 @@ extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
+ssize_t vfs_write_iter(struct kiocb *iocb, struct iov_iter *from,
+ ssize_t (*write)(struct file *, const char __user *,
+ size_t, loff_t *));
+ssize_t vfs_read_iter(struct kiocb *iocb, struct iov_iter *to,
+ ssize_t (*read)(struct file *, char __user *,
+ size_t, loff_t *));
int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *len, unsigned int remap_flags,
--
2.43.0
On Thu, Apr 11, 2024 at 09:12:22AM -0600, Jens Axboe wrote:
> +/* generic read side helper for drivers converting to ->read_iter() */
> +ssize_t vfs_read_iter(struct kiocb *iocb, struct iov_iter *to,
> + ssize_t (*read)(struct file *, char __user *,
> + size_t, loff_t *))
> +{
> + return do_loop_readv(iocb->ki_filp, to, &iocb->ki_pos, 0, read);
> +}
> +EXPORT_SYMBOL(vfs_read_iter);
> +
> +/* generic write side helper for drivers converting to ->write_iter() */
> +ssize_t vfs_write_iter(struct kiocb *iocb, struct iov_iter *from,
> + ssize_t (*write)(struct file *, const char __user *,
> + size_t, loff_t *))
> +{
> + return do_loop_writev(iocb->ki_filp, from, &iocb->ki_pos, 0, write);
> +}
> +EXPORT_SYMBOL(vfs_write_iter);
Wait a minute; just what do you expect to happen if that ever gets called
for ITER_BVEC or ITER_XARRAY?
On 4/15/24 1:55 PM, Al Viro wrote:
> On Thu, Apr 11, 2024 at 09:12:22AM -0600, Jens Axboe wrote:
>
>> +/* generic read side helper for drivers converting to ->read_iter() */
>> +ssize_t vfs_read_iter(struct kiocb *iocb, struct iov_iter *to,
>> + ssize_t (*read)(struct file *, char __user *,
>> + size_t, loff_t *))
>> +{
>> + return do_loop_readv(iocb->ki_filp, to, &iocb->ki_pos, 0, read);
>> +}
>> +EXPORT_SYMBOL(vfs_read_iter);
>> +
>> +/* generic write side helper for drivers converting to ->write_iter() */
>> +ssize_t vfs_write_iter(struct kiocb *iocb, struct iov_iter *from,
>> + ssize_t (*write)(struct file *, const char __user *,
>> + size_t, loff_t *))
>> +{
>> + return do_loop_writev(iocb->ki_filp, from, &iocb->ki_pos, 0, write);
>> +}
>> +EXPORT_SYMBOL(vfs_write_iter);
>
> Wait a minute; just what do you expect to happen if that ever gets called
> for ITER_BVEC or ITER_XARRAY?
do_loop_{readv,writev} need to look like the one io_uring had, which was
just an augmented version of the fs/ version. At least for handling
anything that is IOVEC/UBUF/BVEC. Outside of that, should not be
callable for eg ITER_XARRAY, who would do that? We should probably add a
check at the top of each just to vet the iter type and -EINVAL if it's
not one of the supported ones. With a WARN_ON_ONCE(), I suspect.
I'll be posting the first patches separately again, I've made some local
tweaks, with some drivers that can support it as-is. Just haven't gotten
around to doing this weeks iteration on it yet.
--
Jens Axboe
On 4/15/24 3:08 PM, Al Viro wrote:
> On Mon, Apr 15, 2024 at 02:11:56PM -0600, Jens Axboe wrote:
>
>> do_loop_{readv,writev} need to look like the one io_uring had, which was
>> just an augmented version of the fs/ version. At least for handling
>> anything that is IOVEC/UBUF/BVEC.
>
> IOVEC and UBUF: pointer will be __user one; copy_from_user() works.
> KVEC: kernel pointer. Try copy_from_user() on that on x86 with SMAP (or
> on e.g. sparc64, etc.) and you'll get an oops.
> BVEC: page + offset, page quite possibly not mapped anywhere in kernel page
> tables. And "just kmap() around the call of your callback" is not a good
> idea either, for even more reason that for KVEC.
The old read/write path only handled user pointers, with the exception
being bvecs mapped on the io_uring side (which the io_uring version
dealt with) which also originally came from user pointers. So just user
memory. Why would we need to expand that now? Who would be doing
->read_iter() or ->write_iter() with something that isn't either UBUF or
IOVEC? Because that would break horrible as it is in the current kernel.
--
Jens Axboe
On Mon, Apr 15, 2024 at 03:16:12PM -0600, Jens Axboe wrote:
> The old read/write path only handled user pointers, with the exception
> being bvecs mapped on the io_uring side (which the io_uring version
> dealt with) which also originally came from user pointers. So just user
> memory. Why would we need to expand that now? Who would be doing
> ->read_iter() or ->write_iter() with something that isn't either UBUF or
> IOVEC? Because that would break horrible as it is in the current kernel.
No, it will not. And yes, it does happen. Look, for example, at
fs/coredump.c:dump_emit_page(). ->write_iter() (regular file or pipe one)
is called. On ITER_BVEC.
It happens, and not only there. Look at how /dev/loop works, for crying
out loud! You get a struct request; the backing pages might very well have
_never_ been mapped to any userland address (e.g. when it's something like
a directory block). And you hit this:
static int lo_write_simple(struct loop_device *lo, struct request *rq,
loff_t pos)
{
struct bio_vec bvec;
struct req_iterator iter;
int ret = 0;
rq_for_each_segment(bvec, rq, iter) {
ret = lo_write_bvec(lo->lo_backing_file, &bvec, &pos);
if (ret < 0)
break;
cond_resched();
}
return ret;
}
with lo_write_bvec() being
static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
{
struct iov_iter i;
ssize_t bw;
iov_iter_bvec(&i, ITER_SOURCE, bvec, 1, bvec->bv_len);
bw = vfs_iter_write(file, &i, ppos, 0);
if (likely(bw == bvec->bv_len))
return 0;
printk_ratelimited(KERN_ERR
"loop: Write error at byte offset %llu, length %i.\n",
(unsigned long long)*ppos, bvec->bv_len);
if (bw >= 0)
bw = -EIO;
return bw;
}
Neither ->read_iter() nor ->write_iter() can make an assumption that it
will be used with userland buffer. And yes, it works...
On Mon, Apr 15, 2024 at 02:11:56PM -0600, Jens Axboe wrote:
> do_loop_{readv,writev} need to look like the one io_uring had, which was
> just an augmented version of the fs/ version. At least for handling
> anything that is IOVEC/UBUF/BVEC.
IOVEC and UBUF: pointer will be __user one; copy_from_user() works.
KVEC: kernel pointer. Try copy_from_user() on that on x86 with SMAP (or
on e.g. sparc64, etc.) and you'll get an oops.
BVEC: page + offset, page quite possibly not mapped anywhere in kernel page
tables. And "just kmap() around the call of your callback" is not a good
idea either, for even more reason that for KVEC.
From: Al Viro
> Sent: 15 April 2024 20:55
>
> On Thu, Apr 11, 2024 at 09:12:22AM -0600, Jens Axboe wrote:
>
> > +/* generic read side helper for drivers converting to ->read_iter() */
> > +ssize_t vfs_read_iter(struct kiocb *iocb, struct iov_iter *to,
> > + ssize_t (*read)(struct file *, char __user *,
> > + size_t, loff_t *))
> > +{
> > + return do_loop_readv(iocb->ki_filp, to, &iocb->ki_pos, 0, read);
> > +}
> > +EXPORT_SYMBOL(vfs_read_iter);
> > +
> > +/* generic write side helper for drivers converting to ->write_iter() */
> > +ssize_t vfs_write_iter(struct kiocb *iocb, struct iov_iter *from,
> > + ssize_t (*write)(struct file *, const char __user *,
> > + size_t, loff_t *))
> > +{
> > + return do_loop_writev(iocb->ki_filp, from, &iocb->ki_pos, 0, write);
> > +}
> > +EXPORT_SYMBOL(vfs_write_iter);
>
> Wait a minute; just what do you expect to happen if that ever gets called
> for ITER_BVEC or ITER_XARRAY?
The extra indirect call is also going to be noticeable.
You need a code loop with a direct call.
That probably requires the loop to be a #define.
I was also thinking about drivers that only handle 'user' buffers and
where there really isn't a requirement to do anything else.
I've a driver that basically does:
if (!access_ok(....))
return -EFAULT;
for (off = 0; off < len; off += 8) {
if (__put_user(readq(io_addr + off), uaddr + off))
return -EFAULT;
}
Any non-trivial change requires a function that return the first/only
user buffer address/length and an error for a non-user address.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)