2024-04-03 14:06:16

by Jens Axboe

Subject: [PATCH 1/3] timerfd: convert to ->read_iter()

Switch timerfd to using fops->read_iter(), so it can support not just
O_NONBLOCK but IOCB_NOWAIT as well. With the latter, users like io_uring
interact with timerfds a lot better, as they can be driven purely
by the poll trigger.
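The userspace-visible read semantics that the conversion has to preserve can be checked directly. A timerfd read needs a buffer of at least 8 bytes (otherwise EINVAL). On a TFD_NONBLOCK (O_NONBLOCK) descriptor that has not expired yet, it fails with EAGAIN. Once expired, it returns an 8-byte expiration count. The following is a minimal sketch against the glibc timerfd API; the helpers oneshot_timerfd() and wait_readable() are hypothetical names. IOCB_NOWAIT itself is kernel-internal. io_uring sets it, and from plain syscalls preadv2() with RWF_NOWAIT maps to it. That path is refused with EOPNOTSUPP unless the file has FMODE_NOWAIT, which is why the patch sets that bit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: arm a one-shot CLOCK_MONOTONIC timerfd that expires
 * after `ms` milliseconds (ms must be > 0; a zero it_value disarms).
 * Returns the fd, or -1 with errno set. */
static int oneshot_timerfd(long ms, int tfd_flags)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, tfd_flags);
    if (fd < 0)
        return -1;

    struct itimerspec its;
    memset(&its, 0, sizeof(its));          /* it_interval = 0: one-shot */
    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: wait for POLLIN, i.e. the "poll trigger" that tells
 * a caller the next read will not block. Returns 1 if readable, 0 on
 * timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms);
}
```

A caller that only reads after POLLIN never sleeps in read() itself. A nonblocking read issued too early just gets EAGAIN and goes back to waiting on the poll event.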

Manually get and install the required fd, so that FMODE_NOWAIT can be
set before the file is installed into the file table.
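The reason for splitting anon_inode_getfd() into get_unused_fd_flags(), anon_inode_getfile() and fd_install() is that a reserved descriptor number is not visible to lookups until fd_install() publishes the file. Between those steps, f_mode can be changed while nobody else can see the file. The toy model below illustrates only that two-phase reserve/install pattern. It is not kernel code; every type and function in it is a hypothetical stand-in, and the comments name the kernel calls they loosely correspond to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8

/* Hypothetical stand-in for struct file; `nowait` plays FMODE_NOWAIT. */
struct model_file {
    bool nowait;
};

enum slot_state { SLOT_FREE, SLOT_RESERVED, SLOT_INSTALLED };

/* Hypothetical, heavily simplified descriptor table. */
struct model_fdtable {
    enum slot_state state[NR_SLOTS];
    struct model_file *files[NR_SLOTS];
};

/* Phase 1 (cf. get_unused_fd_flags): claim a number, publish nothing. */
static int reserve_fd(struct model_fdtable *t)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        if (t->state[i] == SLOT_FREE) {
            t->state[i] = SLOT_RESERVED;
            return i;
        }
    }
    return -1;                          /* table full (cf. -EMFILE) */
}

/* Error path (cf. put_unused_fd): hand the number back unused. */
static void release_fd(struct model_fdtable *t, int fd)
{
    t->state[fd] = SLOT_FREE;
}

/* Phase 2 (cf. fd_install): make the fully set-up file visible. */
static void install_fd(struct model_fdtable *t, int fd, struct model_file *f)
{
    t->files[fd] = f;
    t->state[fd] = SLOT_INSTALLED;
}

/* What a concurrent lookup by number would see (cf. fget). */
static struct model_file *lookup_fd(const struct model_fdtable *t, int fd)
{
    return t->state[fd] == SLOT_INSTALLED ? t->files[fd] : NULL;
}
```

In the model, every change to the file made before install_fd() happens while lookup_fd() still returns NULL for that slot. That property is what the patch relies on when it sets FMODE_NOWAIT before fd_install().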

No functional changes are intended in this patch; it's purely a straight
conversion to using the read iterator method.

Signed-off-by: Jens Axboe <[email protected]>
---
fs/timerfd.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index e9c96a0c79f1..b96690b46c1f 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -262,17 +262,18 @@ static __poll_t timerfd_poll(struct file *file, poll_table *wait)
return events;
}

-static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
- loff_t *ppos)
+static ssize_t timerfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
+ struct file *file = iocb->ki_filp;
struct timerfd_ctx *ctx = file->private_data;
ssize_t res;
u64 ticks = 0;

- if (count < sizeof(ticks))
+ if (iov_iter_count(to) < sizeof(ticks))
return -EINVAL;
+
spin_lock_irq(&ctx->wqh.lock);
- if (file->f_flags & O_NONBLOCK)
+ if (file->f_flags & O_NONBLOCK || iocb->ki_flags & IOCB_NOWAIT)
res = -EAGAIN;
else
res = wait_event_interruptible_locked_irq(ctx->wqh, ctx->ticks);
@@ -313,7 +314,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
}
spin_unlock_irq(&ctx->wqh.lock);
if (ticks)
- res = put_user(ticks, (u64 __user *) buf) ? -EFAULT: sizeof(ticks);
+ res = copy_to_iter(&ticks, sizeof(ticks), to);
return res;
}

@@ -384,7 +385,7 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
static const struct file_operations timerfd_fops = {
.release = timerfd_release,
.poll = timerfd_poll,
- .read = timerfd_read,
+ .read_iter = timerfd_read_iter,
.llseek = noop_llseek,
.show_fdinfo = timerfd_show,
.unlocked_ioctl = timerfd_ioctl,
@@ -407,6 +408,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
{
int ufd;
struct timerfd_ctx *ctx;
+ struct file *file;

/* Check the TFD_* constants for consistency. */
BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
@@ -443,11 +445,22 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)

ctx->moffs = ktime_mono_to_real(0);

- ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
- O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
- if (ufd < 0)
+ ufd = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
+ if (ufd < 0) {
kfree(ctx);
+ return ufd;
+ }
+
+ file = anon_inode_getfile("[timerfd]", &timerfd_fops, ctx,
+ O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
+ if (IS_ERR(file)) {
+ put_unused_fd(ufd);
+ kfree(ctx);
+ return PTR_ERR(file);
+ }

+ file->f_mode |= FMODE_NOWAIT;
+ fd_install(ufd, file);
return ufd;
}

--
2.43.0



2024-04-04 03:27:20

by Al Viro

Subject: Re: [PATCH 1/3] timerfd: convert to ->read_iter()

On Wed, Apr 03, 2024 at 08:02:52AM -0600, Jens Axboe wrote:

> - res = put_user(ticks, (u64 __user *) buf) ? -EFAULT: sizeof(ticks);
> + res = copy_to_iter(&ticks, sizeof(ticks), to);

Umm... That's not an equivalent transformation - different behaviour on
short copy; try to call it via read(fd, unmapped_buffer, 8) and see what
happens.

copy_to_iter() returns the amount copied; no data copied => return 0, not -EFAULT.
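Al's experiment can be written as a small userspace program. The sketch below assumes Linux and glibc, uses a PROT_NONE anonymous mapping as the "unmapped buffer", and names its helper read_into_unmapped() (a hypothetical name). With the original put_user() code, the read fails with EFAULT. With the conversion as posted, copy_to_iter() copies nothing and read() would return 0 instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: arm a 10 ms one-shot timerfd and read its 8-byte
 * expiration count into an inaccessible page. Returns read()'s result and
 * stores errno in *err; returns -2 if the setup itself failed. */
static ssize_t read_into_unmapped(int *err)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -2;

    ssize_t r = -2;
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);   /* blocking descriptor */
    if (fd >= 0) {
        struct itimerspec its;
        memset(&its, 0, sizeof(its));
        its.it_value.tv_nsec = 10 * 1000000L;      /* one-shot, 10 ms */
        if (timerfd_settime(fd, 0, &its, NULL) == 0) {
            /* Sleeps until expiry; the kernel then tries to store
             * 8 bytes into the PROT_NONE page and faults. */
            r = read(fd, buf, sizeof(uint64_t));
            *err = errno;
        }
        close(fd);
    }
    munmap(buf, page);
    return r;
}
```

A result of -1 with EFAULT is the behaviour the conversion needs to keep; a return value of 0 is the regression described above.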

> + ufd = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));

You do realize that get_unused_fd_flags() ignores O_RDWR (or O_NDELAY), right?
Mixing those with O_CLOEXEC makes sense for anon_inode_getfd(), but here you
have separate calls of get_unused_fd_flags() and anon_inode_getfile(), so...
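The split Al is pointing at can be spelled out. O_CLOEXEC is a property of the descriptor slot, and it is the only flag get_unused_fd_flags() acts on. The access mode and O_NONBLOCK belong to the struct file created by anon_inode_getfile(). In kernel terms, that suggests passing something like `flags & O_CLOEXEC` to the former and `O_RDWR | (flags & O_NONBLOCK)` to the latter. This is a reading of the review comment, not necessarily the code that was eventually merged. The sketch below models the split in userspace and checks it through fcntl(); split_tfd_flags() and flags_match() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical: which timerfd_create() flags belong to the descriptor
 * (fd table) and which to the open file description. */
struct tfd_flag_split {
    int fd_flags;    /* what get_unused_fd_flags() honours: O_CLOEXEC only */
    int file_flags;  /* access mode + O_NONBLOCK, stored in the file's f_flags */
};

static struct tfd_flag_split split_tfd_flags(int tfd_flags)
{
    struct tfd_flag_split s;
    s.fd_flags = tfd_flags & TFD_CLOEXEC;               /* TFD_CLOEXEC == O_CLOEXEC */
    s.file_flags = O_RDWR | (tfd_flags & TFD_NONBLOCK);  /* TFD_NONBLOCK == O_NONBLOCK */
    return s;
}

/* Create a real timerfd and compare what F_GETFD (per-descriptor) and
 * F_GETFL (per-file) report against the split. Returns 1 on a match. */
static int flags_match(int tfd_flags)
{
    struct tfd_flag_split s = split_tfd_flags(tfd_flags);
    int fd = timerfd_create(CLOCK_MONOTONIC, tfd_flags);
    if (fd < 0)
        return 0;
    int fdfl = fcntl(fd, F_GETFD);
    int fl = fcntl(fd, F_GETFL);
    close(fd);
    if (fdfl < 0 || fl < 0)
        return 0;
    int cloexec = (fdfl & FD_CLOEXEC) ? O_CLOEXEC : 0;
    return cloexec == s.fd_flags &&
           (fl & (O_ACCMODE | O_NONBLOCK)) == s.file_flags;
}
```

Seen from userspace, close-on-exec shows up only via F_GETFD and O_RDWR/O_NONBLOCK only via F_GETFL. That mirrors the fact that OR-ing O_RDWR into the get_unused_fd_flags() argument has no effect.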

2024-04-04 12:53:48

by Jens Axboe

Subject: Re: [PATCH 1/3] timerfd: convert to ->read_iter()

On 4/3/24 4:40 PM, Al Viro wrote:
> On Wed, Apr 03, 2024 at 08:02:52AM -0600, Jens Axboe wrote:
>
>> - res = put_user(ticks, (u64 __user *) buf) ? -EFAULT: sizeof(ticks);
>> + res = copy_to_iter(&ticks, sizeof(ticks), to);
>
> Umm... That's not an equivalent transformation - different behaviour on
> short copy; try to call it via read(fd, unmapped_buffer, 8) and see what
> happens.
>
> copy_to_iter() returns the amount copied; no data copied => return 0, not -EFAULT.

Gah, yes. Ironically, I did a bunch of conversions yesterday and they're
all fine. Not sure what happened here. I'll fix it up.
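The fix itself is not shown in this thread. One way to restore the put_user() semantics is to compare copy_to_iter()'s return value against sizeof(ticks) and return -EFAULT when it falls short, including a partial copy that stops at a fault partway through the 8 bytes. The userspace model below shows that mapping. fake_iter, fake_copy_to_iter() and finish_timer_read() are hypothetical stand-ins, not kernel APIs. As in the original code, the tail only runs when ticks is non-zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for an iov_iter whose destination faults after
 * `writable` bytes; writable == 0 models a fully unmapped buffer. */
struct fake_iter {
    unsigned char *dst;
    size_t writable;
};

/* Behaves like copy_to_iter(): returns the number of bytes copied and
 * never reports an error code itself. */
static size_t fake_copy_to_iter(const void *src, size_t len, struct fake_iter *it)
{
    size_t n = len < it->writable ? len : it->writable;
    if (n)
        memcpy(it->dst, src, n);
    return n;
}

/* Tail of the read handler: a short copy becomes -EFAULT, matching what
 * put_user() reported, instead of leaking a byte count (possibly 0). */
static ssize_t finish_timer_read(uint64_t ticks, struct fake_iter *to)
{
    if (fake_copy_to_iter(&ticks, sizeof(ticks), to) != sizeof(ticks))
        return -EFAULT;
    return sizeof(ticks);
}
```

Treating any short copy as a fault keeps the all-or-nothing behaviour of put_user(). Returning -EFAULT only when nothing at all was copied would be the narrower alternative, but it would let a 1..7 byte partial read through.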

>> + ufd = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
>
> You do realize that get_unused_fd_flags() ignores O_RDWR (or
> O_NDELAY), right? Mixing those with O_CLOEXEC makes sense for
> anon_inode_getfd(), but here you have separate calls of
> get_unused_fd_flags() and anon_inode_getfile(), so...

I do, but I figured it was cleaner that way. I can change the flag
passing though, ditto for the other ones.

--
Jens Axboe