2020-02-03 21:00:27

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

This fixes possible lost wakeup introduced by the a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock,
but in the a218cc491420 new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on
epoll_ctl() path) were switched to ep->lock, but ep_poll
(epoll_wait) was using ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock
internally, but actual waitqueue_active() check prior wake_up()
call can be reordered with modification of ep ready list, thus
wake up can be lost.

Current patch replaces ep->wq.lock with the ep->lock for wqueue
modifications, thus wake up path always observes activeness of
the wqueue correcty.

Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <[email protected]>
Cc: Max Neunhoeffer <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Christopher Kohlhoff <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
fs/eventpoll.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b041b66002db..eee3c92a9ebf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
waiter = true;
init_waitqueue_entry(&wait, current);

- spin_lock_irq(&ep->wq.lock);
+ write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
- spin_unlock_irq(&ep->wq.lock);
+ write_unlock_irq(&ep->lock);
}

for (;;) {
@@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
goto fetch_events;

if (waiter) {
- spin_lock_irq(&ep->wq.lock);
+ write_lock_irq(&ep->lock);
__remove_wait_queue(&ep->wq, &wait);
- spin_unlock_irq(&ep->wq.lock);
+ write_unlock_irq(&ep->lock);
}

return res;
--
2.24.1


2020-02-03 21:01:27

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH 3/3] kselftest: introduce new epoll test case

This testcase repeats epollbug.c from the bug:

https://bugzilla.kernel.org/show_bug.cgi?id=205933

Signed-off-by: Roman Penyaev <[email protected]>
Cc: Max Neunhoeffer <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Christopher Kohlhoff <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
.../filesystems/epoll/epoll_wakeup_test.c | 67 ++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
index 37a04dab56f0..11eee0b60040 100644
--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
+++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -7,13 +7,14 @@
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>
+#include <sys/eventfd.h>
#include "../../kselftest_harness.h"

struct epoll_mtcontext
{
int efd[3];
int sfd[4];
- int count;
+ volatile int count;

pthread_t main;
pthread_t waiter;
@@ -3071,4 +3072,68 @@ TEST(epoll58)
close(ctx.sfd[3]);
}

+static void *epoll59_thread(void *ctx_)
+{
+ struct epoll_mtcontext *ctx = ctx_;
+ struct epoll_event e;
+ int i;
+
+ for (i = 0; i < 100000; i++) {
+ while (ctx->count == 0)
+ ;
+
+ e.events = EPOLLIN | EPOLLERR | EPOLLET;
+ epoll_ctl(ctx->efd[0], EPOLL_CTL_MOD, ctx->sfd[0], &e);
+ ctx->count = 0;
+ }
+
+ return NULL;
+}
+
+/*
+ * t0
+ * (p) \
+ * e0
+ * (et) /
+ * e0
+ *
+ * Based on https://bugzilla.kernel.org/show_bug.cgi?id=205933
+ */
+TEST(epoll59)
+{
+ pthread_t emitter;
+ struct pollfd pfd;
+ struct epoll_event e;
+ struct epoll_mtcontext ctx = { 0 };
+ int i, ret;
+
+ signal(SIGUSR1, signal_handler);
+
+ ctx.efd[0] = epoll_create1(0);
+ ASSERT_GE(ctx.efd[0], 0);
+
+ ctx.sfd[0] = eventfd(1, 0);
+ ASSERT_GE(ctx.sfd[0], 0);
+
+ e.events = EPOLLIN | EPOLLERR | EPOLLET;
+ ASSERT_EQ(epoll_ctl(ctx.efd[0], EPOLL_CTL_ADD, ctx.sfd[0], &e), 0);
+
+ ASSERT_EQ(pthread_create(&emitter, NULL, epoll59_thread, &ctx), 0);
+
+ for (i = 0; i < 100000; i++) {
+ ret = epoll_wait(ctx.efd[0], &e, 1, 1000);
+ ASSERT_GT(ret, 0);
+
+ while (ctx.count != 0)
+ ;
+ ctx.count = 1;
+ }
+ if (pthread_tryjoin_np(emitter, NULL) < 0) {
+ pthread_kill(emitter, SIGUSR1);
+ pthread_join(emitter, NULL);
+ }
+ close(ctx.efd[0]);
+ close(ctx.sfd[0]);
+}
+
TEST_HARNESS_MAIN
--
2.24.1

2020-02-04 10:42:52

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

Hi Andrew,

Could you please suggest me, do I need to include Reported-by tag,
or reference to the bug is enough?

--
Roman

On 2020-02-03 21:59, Roman Penyaev wrote:
> This fixes possible lost wakeup introduced by the a218cc491420.
> Originally modifications to ep->wq were serialized by ep->wq.lock,
> but in the a218cc491420 new rw lock was introduced in order to
> relax fd event path, i.e. callers of ep_poll_callback() function.
>
> After the change ep_modify and ep_insert (both are called on
> epoll_ctl() path) were switched to ep->lock, but ep_poll
> (epoll_wait) was using ep->wq.lock on wqueue list modification.
>
> The bug doesn't lead to any wqueue list corruptions, because wake up
> path and list modifications were serialized by ep->wq.lock
> internally, but actual waitqueue_active() check prior wake_up()
> call can be reordered with modification of ep ready list, thus
> wake up can be lost.
>
> Current patch replaces ep->wq.lock with the ep->lock for wqueue
> modifications, thus wake up path always observes activeness of
> the wqueue correcty.
>
> Fixes: a218cc491420 ("epoll: use rwlock in order to reduce
> ep_poll_callback() contention")
> References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Max Neunhoeffer <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Christopher Kohlhoff <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Jason Baron <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> fs/eventpoll.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index b041b66002db..eee3c92a9ebf 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct
> epoll_event __user *events,
> waiter = true;
> init_waitqueue_entry(&wait, current);
>
> - spin_lock_irq(&ep->wq.lock);
> + write_lock_irq(&ep->lock);
> __add_wait_queue_exclusive(&ep->wq, &wait);
> - spin_unlock_irq(&ep->wq.lock);
> + write_unlock_irq(&ep->lock);
> }
>
> for (;;) {
> @@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct
> epoll_event __user *events,
> goto fetch_events;
>
> if (waiter) {
> - spin_lock_irq(&ep->wq.lock);
> + write_lock_irq(&ep->lock);
> __remove_wait_queue(&ep->wq, &wait);
> - spin_unlock_irq(&ep->wq.lock);
> + write_unlock_irq(&ep->lock);
> }
>
> return res;

2020-02-04 16:35:06

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> Hi Andrew,
>
> Could you please suggest me, do I need to include Reported-by tag,
> or reference to the bug is enough?

Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
fully credit the extra work done by the reporter.

2020-02-04 17:21:35

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

On 2020-02-04 17:32, Jakub Kicinski wrote:
> On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> Hi Andrew,
>>
>> Could you please suggest me, do I need to include Reported-by tag,
>> or reference to the bug is enough?
>
> Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> fully credit the extra work done by the reporter.

Reported-by: Max Neunhoeffer <[email protected]>
Bisected-by: Max Neunhoeffer <[email protected]>

Correct?

--
Roman

2020-02-04 22:31:20

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

On Tue, 04 Feb 2020 18:20:03 +0100, Roman Penyaev wrote:
> On 2020-02-04 17:32, Jakub Kicinski wrote:
> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> >> Hi Andrew,
> >>
> >> Could you please suggest me, do I need to include Reported-by tag,
> >> or reference to the bug is enough?
> >
> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> > fully credit the extra work done by the reporter.
>
> Reported-by: Max Neunhoeffer <[email protected]>
> Bisected-by: Max Neunhoeffer <[email protected]>
>
> Correct?

That should work, I like the brevity of the single combined

Reported-and-bisected-by: Max Neunhoeffer <[email protected]>

line but looks like some separate the two even when both point
to the same person.

Unfortunately Documentation/process is silent on any "bisected-by"
use, so you'll have to exercise your own judgement :)

2020-02-10 05:59:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <[email protected]> wrote:

> On 2020-02-04 17:32, Jakub Kicinski wrote:
> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> >> Hi Andrew,
> >>
> >> Could you please suggest me, do I need to include Reported-by tag,
> >> or reference to the bug is enough?
> >
> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> > fully credit the extra work done by the reporter.
>
> Reported-by: Max Neunhoeffer <[email protected]>
> Bisected-by: Max Neunhoeffer <[email protected]>
>
> Correct?

We could do that, but preferably with Max's approval (please?).

Because people sometimes have issues with having their personal info
added to the kernel commit record without having being asked.

2020-02-10 06:45:46

by Max Neunhöffer

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

Dear Andrew and all,
Sorry, I did not understand that I should explicitly give consent.
This is fine with me.
Cheers
Max

Am 10. Februar 2020 06:59:16 MEZ schrieb Andrew Morton <[email protected]>:
>On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <[email protected]>
>wrote:
>
>> On 2020-02-04 17:32, Jakub Kicinski wrote:
>> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> >> Hi Andrew,
>> >>
>> >> Could you please suggest me, do I need to include Reported-by tag,
>> >> or reference to the bug is enough?
>> >
>> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag
>to
>> > fully credit the extra work done by the reporter.
>>
>> Reported-by: Max Neunhoeffer <[email protected]>
>> Bisected-by: Max Neunhoeffer <[email protected]>
>>
>> Correct?
>
>We could do that, but preferably with Max's approval (please?).
>
>Because people sometimes have issues with having their personal info
>added to the kernel commit record without having being asked.

2020-02-10 09:44:17

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH 1/3] epoll: fix possible lost wakeup on epoll_ctl() path

On 2020-02-10 06:59, Andrew Morton wrote:
> On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <[email protected]>
> wrote:
>
>> On 2020-02-04 17:32, Jakub Kicinski wrote:
>> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> >> Hi Andrew,
>> >>
>> >> Could you please suggest me, do I need to include Reported-by tag,
>> >> or reference to the bug is enough?
>> >
>> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
>> > fully credit the extra work done by the reporter.
>>
>> Reported-by: Max Neunhoeffer <[email protected]>
>> Bisected-by: Max Neunhoeffer <[email protected]>
>>
>> Correct?
>
> We could do that, but preferably with Max's approval (please?).
>
> Because people sometimes have issues with having their personal info
> added to the kernel commit record without having being asked.

Max approved. I've just resent v2, no code changes, comment
tweaks and 2 explicit tags with Max's name.

--
Roman