2023-07-09 07:35:49

by Wen Yang

Subject: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0

From: Wen Yang <[email protected]>

For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.

Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Signed-off-by: Wen Yang <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dylan Yudaken <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
fs/eventfd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8aa36cd37351..10a101df19cd 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
{
lockdep_assert_held(&ctx->wqh.lock);

- *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
+ *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
ctx->count -= *cnt;
}
EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
@@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
return -EFAULT;
if (ucnt == ULLONG_MAX)
return -EINVAL;
+ if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
+ return -EINVAL;
spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
if (ULLONG_MAX - ctx->count > ucnt)
--
2.25.1



2023-07-09 20:09:53

by Matthew Wilcox

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0

On Sun, Jul 09, 2023 at 09:13:28PM +0200, Markus Elfring wrote:
> > For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> > eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>
> Please choose an imperative change suggestion.

Markus, stop this nitpicking. It puts off contributors. The changelog
is perfectly clear.

2023-07-10 14:23:42

by Christian Brauner

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0

On Sun, Jul 09, 2023 at 02:54:51PM +0800, [email protected] wrote:
> From: Wen Yang <[email protected]>
>
> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>
> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
> Signed-off-by: Wen Yang <[email protected]>
> Cc: Alexander Viro <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Christian Brauner <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Dylan Yudaken <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---

So this looks ok but I would like to see an analysis of how the overflow
can happen. I'm looking at the callers and it seems that once ctx->count
hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
there a caller that can call eventfd_ctx_do_read() directly or
indirectly on a ctx->count == 0?

I'm just slightly skeptical about patches that fix issues without an
analysis of how this can happen.

> fs/eventfd.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 8aa36cd37351..10a101df19cd 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
> {
> lockdep_assert_held(&ctx->wqh.lock);
>
> - *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
> + *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
> ctx->count -= *cnt;
> }
> EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
> @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
> return -EFAULT;
> if (ucnt == ULLONG_MAX)
> return -EINVAL;
> + if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
> + return -EINVAL;
> spin_lock_irq(&ctx->wqh.lock);
> res = -EAGAIN;
> if (ULLONG_MAX - ctx->count > ucnt)
> --
> 2.25.1
>

2023-07-10 15:50:39

by Wen Yang

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0


On 2023/7/10 22:12, Christian Brauner wrote:
> On Sun, Jul 09, 2023 at 02:54:51PM +0800, [email protected] wrote:
>> From: Wen Yang <[email protected]>
>>
>> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
>> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>>
>> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>> Signed-off-by: Wen Yang <[email protected]>
>> Cc: Alexander Viro <[email protected]>
>> Cc: Jens Axboe <[email protected]>
>> Cc: Christian Brauner <[email protected]>
>> Cc: Christoph Hellwig <[email protected]>
>> Cc: Dylan Yudaken <[email protected]>
>> Cc: David Woodhouse <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
> So this looks ok but I would like to see an analysis how the overflow
> can happen. I'm looking at the callers and it seems that once ctx->count
> hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
> there a caller that can call directly or indirectly
> eventfd_ctx_do_read() on a ctx->count == 0?
eventfd_read() ensures that ctx->count is not 0 before calling
eventfd_ctx_do_read(), and that path is correct.

But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
eventfd_ctx_do_read() unconditionally, as that may not only cause
ctx->count to wrap around to ULLONG_MAX, but also unnecessarily call
wake_up_locked_poll().


I am sorry for only adding the following tag to the patch instead of a
proper analysis:
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")


Looking forward to your suggestions.

--

Best wishes,

Wen


> I'm just slightly skeptical about patches that fix issues without an
> analysis how this can happen.
>
>> fs/eventfd.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>> index 8aa36cd37351..10a101df19cd 100644
>> --- a/fs/eventfd.c
>> +++ b/fs/eventfd.c
>> @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
>> {
>> lockdep_assert_held(&ctx->wqh.lock);
>>
>> - *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
>> + *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
>> ctx->count -= *cnt;
>> }
>> EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
>> @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>> return -EFAULT;
>> if (ucnt == ULLONG_MAX)
>> return -EINVAL;
>> + if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
>> + return -EINVAL;
>> spin_lock_irq(&ctx->wqh.lock);
>> res = -EAGAIN;
>> if (ULLONG_MAX - ctx->count > ucnt)
>> --
>> 2.25.1
>>


2023-07-11 09:19:14

by Christian Brauner

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0

On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
>
> On 2023/7/10 22:12, Christian Brauner wrote:
> > On Sun, Jul 09, 2023 at 02:54:51PM +0800, [email protected] wrote:
> > > From: Wen Yang <[email protected]>
> > >
> > > For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> > > eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
> > >
> > > Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
> > > Signed-off-by: Wen Yang <[email protected]>
> > > Cc: Alexander Viro <[email protected]>
> > > Cc: Jens Axboe <[email protected]>
> > > Cc: Christian Brauner <[email protected]>
> > > Cc: Christoph Hellwig <[email protected]>
> > > Cc: Dylan Yudaken <[email protected]>
> > > Cc: David Woodhouse <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > ---
> > So this looks ok but I would like to see an analysis how the overflow
> > can happen. I'm looking at the callers and it seems that once ctx->count
> > hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
> > there a caller that can call directly or indirectly
> > eventfd_ctx_do_read() on a ctx->count == 0?
> eventfd_read() ensures that ctx->count is not 0 before calling
> eventfd_ctx_do_read() and it is correct.
>
> But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
> eventfd_ctx_do_read() unconditionally,
>
> as it may not only causes ctx->count to overflow, but also unnecessarily
> calls wake_up_locked_poll().

Hm, so I think you're right and an underflow can be triggered for at
least three subsystems:

(1) virt/kvm/eventfd.c
(2) drivers/vfio/virqfd.c
(3) drivers/virt/acrn/irqfd.c

where (2) and (3) are just modeled after (1). The eventfd must've been
created with EFD_SEMAPHORE and ctx->count must be zero, or have been
decremented to zero. The only way I can see the _underflow_ happening is
if the irqfd is shut down through an ioctl() like KVM_IRQFD with
KVM_IRQFD_FLAG_DEASSIGN raised while ctx->count is zero:

kvm_vm_ioctl()
-> kvm_irqfd()
-> kvm_irqfd_deassign()
-> irqfd_deactivate()
-> irqfd_shutdown()
-> eventfd_ctx_remove_wait_queue(&cnt)

which would underflow @cnt and cause a spurious wakeup. Userspace would
still read one because of EFD_SEMAPHORE semantics and wouldn't notice
the underflow.

I think it's probably not that bad because afaict, this really can only
happen when (1)-(3) are shutdown. But we should still fix it ofc.

>
>
> I am sorry for just adding the following string in the patch:
> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>
>
> Looking forward to your suggestions.

What I usually look for is some callchain/analysis that explains under
what circumstances the issue being fixed can happen. That makes life for
reviewers a lot easier because they don't have to dig out that work
themselves, which takes time.

2023-07-11 10:01:10

by Christian Brauner

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0

On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
>
> On 2023/7/10 22:12, Christian Brauner wrote:
> > On Sun, Jul 09, 2023 at 02:54:51PM +0800, [email protected] wrote:
> > > From: Wen Yang <[email protected]>
> > >
> > > For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> > > eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
> > >
> > > Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
> > > Signed-off-by: Wen Yang <[email protected]>
> > > Cc: Alexander Viro <[email protected]>
> > > Cc: Jens Axboe <[email protected]>
> > > Cc: Christian Brauner <[email protected]>
> > > Cc: Christoph Hellwig <[email protected]>
> > > Cc: Dylan Yudaken <[email protected]>
> > > Cc: David Woodhouse <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > ---
> > So this looks ok but I would like to see an analysis how the overflow
> > can happen. I'm looking at the callers and it seems that once ctx->count
> > hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
> > there a caller that can call directly or indirectly
> > eventfd_ctx_do_read() on a ctx->count == 0?
> eventfd_read() ensures that ctx->count is not 0 before calling
> eventfd_ctx_do_read() and it is correct.
>
> But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
> eventfd_ctx_do_read() unconditionally,
>
> as it may not only causes ctx->count to overflow, but also unnecessarily
> calls wake_up_locked_poll().
>
>
> I am sorry for just adding the following string in the patch:
> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>
>
> Looking forward to your suggestions.
>
> --
>
> Best wishes,
>
> Wen
>
>
> > I'm just slightly skeptical about patches that fix issues without an
> > analysis how this can happen.
> >
> > > fs/eventfd.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/eventfd.c b/fs/eventfd.c
> > > index 8aa36cd37351..10a101df19cd 100644
> > > --- a/fs/eventfd.c
> > > +++ b/fs/eventfd.c
> > > @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
> > > {
> > > lockdep_assert_held(&ctx->wqh.lock);
> > > - *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
> > > + *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
> > > ctx->count -= *cnt;
> > > }
> > > EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
> > > @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
> > > return -EFAULT;
> > > if (ucnt == ULLONG_MAX)
> > > return -EINVAL;
> > > + if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
> > > + return -EINVAL;

Hm, why is this bit necessary though? What's wrong with specifying ucnt == 0
with EFD_SEMAPHORE? This also looks like a (very low potential) uapi
break.

2023-07-11 10:04:22

by Christian Brauner

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0

On Sun, 09 Jul 2023 14:54:51 +0800, [email protected] wrote:
> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>
>

I've tweaked the commit message to explain how an underflow can happen.
And for now I dropped that bit:

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 10a101df19cd..33a918f9566c 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -269,8 +269,6 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
return -EFAULT;
if (ucnt == ULLONG_MAX)
return -EINVAL;
- if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
- return -EINVAL;
spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
if (ULLONG_MAX - ctx->count > ucnt)

because I don't yet understand why that should be forbidden. Please
explain the reason for wanting that check in there and I can add it back.
I might just be missing the obvious.

---

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] eventfd: prevent underflow for eventfd semaphores
https://git.kernel.org/vfs/vfs/c/7b2edd278691

2023-07-11 18:57:39

by Wen Yang

Subject: Re: [PATCH] eventfd: avoid overflow to ULLONG_MAX when ctx->count is 0


On 2023/7/11 17:39, Christian Brauner wrote:
> On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
>> On 2023/7/10 22:12, Christian Brauner wrote:
>>> On Sun, Jul 09, 2023 at 02:54:51PM +0800, [email protected] wrote:
>>>> From: Wen Yang <[email protected]>
>>>>
>>>> For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
>>>> eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
>>>>
>>>> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>>>> Signed-off-by: Wen Yang <[email protected]>
>>>> Cc: Alexander Viro <[email protected]>
>>>> Cc: Jens Axboe <[email protected]>
>>>> Cc: Christian Brauner <[email protected]>
>>>> Cc: Christoph Hellwig <[email protected]>
>>>> Cc: Dylan Yudaken <[email protected]>
>>>> Cc: David Woodhouse <[email protected]>
>>>> Cc: Matthew Wilcox <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> ---
>>> So this looks ok but I would like to see an analysis how the overflow
>>> can happen. I'm looking at the callers and it seems that once ctx->count
>>> hits 0 eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
>>> there a caller that can call directly or indirectly
>>> eventfd_ctx_do_read() on a ctx->count == 0?
>> eventfd_read() ensures that ctx->count is not 0 before calling
>> eventfd_ctx_do_read() and it is correct.
>>
>> But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
>> eventfd_ctx_do_read() unconditionally,
>>
>> as it may not only causes ctx->count to overflow, but also unnecessarily
>> calls wake_up_locked_poll().
>>
>>
>> I am sorry for just adding the following string in the patch:
>> Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
>>
>>
>> Looking forward to your suggestions.
>>
>> --
>>
>> Best wishes,
>>
>> Wen
>>
>>
>>> I'm just slightly skeptical about patches that fix issues without an
>>> analysis how this can happen.
>>>
>>>> fs/eventfd.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>>>> index 8aa36cd37351..10a101df19cd 100644
>>>> --- a/fs/eventfd.c
>>>> +++ b/fs/eventfd.c
>>>> @@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
>>>> {
>>>> lockdep_assert_held(&ctx->wqh.lock);
>>>> - *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
>>>> + *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
>>>> ctx->count -= *cnt;
>>>> }
>>>> EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
>>>> @@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>>>> return -EFAULT;
>>>> if (ucnt == ULLONG_MAX)
>>>> return -EINVAL;
>>>> + if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
>>>> + return -EINVAL;
> Hm, why is bit necessary though? What's wrong with specifying ucnt == 0
> with EFD_SEMAPHORE? This also looks like a (very low potential) uapi
> break.


Thank you for your careful review.

As you pointed out, this may break the uapi.
The manual has the following description (man 2 eventfd):
*  The file descriptor is readable (the select(2) readfds argument; the
poll(2) POLLIN flag) if the counter has a value greater than 0.
*  The file descriptor is writable (the select(2) writefds argument; the
poll(2) POLLOUT flag) if it is possible to write a value of at least "1"
without blocking.

But it does not specify that ucnt cannot be zero, so we can only delete
the two lines of code above.

Could we propose another patch specifically to address the issue you
have identified?

There are indeed some corner cases when ucnt is 0 and ctx->count
is also 0:


static ssize_t eventfd_write(struct file *file, const char __user *buf,
                             size_t count, loff_t *ppos)
{
...
        if (ULLONG_MAX - ctx->count > ucnt)
                res = sizeof(ucnt);               ---> always > 0
...
        if (likely(res > 0)) {
                ctx->count += ucnt;               ---> unnecessary addition of 0
                current->in_eventfd = 1;          ---> may affect eventfd_signal()
                if (waitqueue_active(&ctx->wqh))
                        wake_up_locked_poll(&ctx->wqh, EPOLLIN);  ---> heavyweight wakeup
                current->in_eventfd = 0;
        }
...
}


static __poll_t eventfd_poll(struct file *file, poll_table *wait)
{
...
        count = READ_ONCE(ctx->count);

        if (count > 0)
                events |= EPOLLIN;    ---> if count is 0, all previous operations are wasted

        if (count == ULLONG_MAX)
                events |= EPOLLERR;
        if (ULLONG_MAX - 1 > count)
                events |= EPOLLOUT;

        return events;
}

Could we optimize it like this?

static ssize_t eventfd_write(struct file *file, const char __user *buf,
                             size_t count, loff_t *ppos)
{
...
        if (likely(res > 0)) {
                ctx->count += ucnt;
                if (ctx->count) {    ---> avoid unnecessary heavyweight operations
                        current->in_eventfd = 1;
                        if (waitqueue_active(&ctx->wqh))
                                wake_up_locked_poll(&ctx->wqh, EPOLLIN);
                        current->in_eventfd = 0;
                }
        }
...
}


--

Best wishes,

Wen