2022-05-02 23:15:10

by Jason A. Donenfeld

Subject: Re: [PATCH 1/2] sysctl: read() must consume poll events, not poll()

+Lennart, since systemd is the only userspace I know of currently making
use of this.

On Mon, May 02, 2022 at 04:06:01PM +0200, Jason A. Donenfeld wrote:
> Events that poll() responds to are supposed to be consumed when the file
> is read(), not by the poll() itself. Consuming them in poll() makes it
> impossible to poll() on an epoll file descriptor, since the event gets
> consumed too early. Jann wrote a PoC, available in the link below.
>
> Reported-by: Jann Horn <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Luis Chamberlain <[email protected]>
> Cc: [email protected]
> Link: https://lore.kernel.org/lkml/[email protected]om/
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> ---
> fs/proc/proc_sysctl.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 7d9cfc730bd4..1aa145794207 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -622,6 +622,14 @@ static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
>
>  static ssize_t proc_sys_read(struct kiocb *iocb, struct iov_iter *iter)
>  {
> +	struct inode *inode = file_inode(iocb->ki_filp);
> +	struct ctl_table_header *head = grab_header(inode);
> +	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
> +
> +	if (!IS_ERR(head) && table->poll)
> +		iocb->ki_filp->private_data = proc_sys_poll_event(table->poll);
> +	sysctl_head_finish(head);
> +
>  	return proc_sys_call_handler(iocb, iter, 0);
>  }
>
> @@ -668,10 +676,8 @@ static __poll_t proc_sys_poll(struct file *filp, poll_table *wait)
>  	event = (unsigned long)filp->private_data;
>  	poll_wait(filp, &table->poll->wait, wait);
>
> -	if (event != atomic_read(&table->poll->event)) {
> -		filp->private_data = proc_sys_poll_event(table->poll);
> +	if (event != atomic_read(&table->poll->event))
>  		ret = EPOLLIN | EPOLLRDNORM | EPOLLERR | EPOLLPRI;
> -	}
>
>  out:
>  	sysctl_head_finish(head);
> --
> 2.35.1

Just wanted to double check with you that this change wouldn't break how
you're using it in systemd for /proc/sys/kernel/hostname:

https://github.com/systemd/systemd/blob/39cd62c30c2e6bb5ec13ebc1ecf0d37ed015b1b8/src/journal/journald-server.c#L1832
https://github.com/systemd/systemd/blob/39cd62c30c2e6bb5ec13ebc1ecf0d37ed015b1b8/src/resolve/resolved-manager.c#L465

I couldn't find anybody else actually polling on it. Interestingly, it
looks like sd_event_add_io() uses epoll internally, but you're not
hitting the bug that Jann pointed out (because I suppose you're not
poll()ing on an epoll fd).

Jason