This fixes possible lost wakeup introduced by the a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock,
but in the a218cc491420 new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.
After the change ep_modify and ep_insert (both are called on
epoll_ctl() path) were switched to ep->lock, but ep_poll
(epoll_wait) was using ep->wq.lock on wqueue list modification.
The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock
internally, but actual waitqueue_active() check prior wake_up()
call can be reordered with modification of ep ready list, thus
wake up can be lost.
Current patch replaces ep->wq.lock with the ep->lock for wqueue
modifications, thus wake up path always observes activeness of
the wqueue correcty.
Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <[email protected]>
Cc: Max Neunhoeffer <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Christopher Kohlhoff <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
fs/eventpoll.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b041b66002db..eee3c92a9ebf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
waiter = true;
init_waitqueue_entry(&wait, current);
- spin_lock_irq(&ep->wq.lock);
+ write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
- spin_unlock_irq(&ep->wq.lock);
+ write_unlock_irq(&ep->lock);
}
for (;;) {
@@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
goto fetch_events;
if (waiter) {
- spin_lock_irq(&ep->wq.lock);
+ write_lock_irq(&ep->lock);
__remove_wait_queue(&ep->wq, &wait);
- spin_unlock_irq(&ep->wq.lock);
+ write_unlock_irq(&ep->lock);
}
return res;
--
2.24.1
This testcase repeats epollbug.c from the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <[email protected]>
Cc: Max Neunhoeffer <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Christopher Kohlhoff <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
.../filesystems/epoll/epoll_wakeup_test.c | 67 ++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
index 37a04dab56f0..11eee0b60040 100644
--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
+++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -7,13 +7,14 @@
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>
+#include <sys/eventfd.h>
#include "../../kselftest_harness.h"
struct epoll_mtcontext
{
int efd[3];
int sfd[4];
- int count;
+ volatile int count;
pthread_t main;
pthread_t waiter;
@@ -3071,4 +3072,68 @@ TEST(epoll58)
close(ctx.sfd[3]);
}
+static void *epoll59_thread(void *ctx_)
+{
+ struct epoll_mtcontext *ctx = ctx_;
+ struct epoll_event e;
+ int i;
+
+ for (i = 0; i < 100000; i++) {
+ while (ctx->count == 0)
+ ;
+
+ e.events = EPOLLIN | EPOLLERR | EPOLLET;
+ epoll_ctl(ctx->efd[0], EPOLL_CTL_MOD, ctx->sfd[0], &e);
+ ctx->count = 0;
+ }
+
+ return NULL;
+}
+
+/*
+ * t0
+ * (p) \
+ * e0
+ * (et) /
+ * e0
+ *
+ * Based on https://bugzilla.kernel.org/show_bug.cgi?id=205933
+ */
+TEST(epoll59)
+{
+ pthread_t emitter;
+ struct pollfd pfd;
+ struct epoll_event e;
+ struct epoll_mtcontext ctx = { 0 };
+ int i, ret;
+
+ signal(SIGUSR1, signal_handler);
+
+ ctx.efd[0] = epoll_create1(0);
+ ASSERT_GE(ctx.efd[0], 0);
+
+ ctx.sfd[0] = eventfd(1, 0);
+ ASSERT_GE(ctx.sfd[0], 0);
+
+ e.events = EPOLLIN | EPOLLERR | EPOLLET;
+ ASSERT_EQ(epoll_ctl(ctx.efd[0], EPOLL_CTL_ADD, ctx.sfd[0], &e), 0);
+
+ ASSERT_EQ(pthread_create(&emitter, NULL, epoll59_thread, &ctx), 0);
+
+ for (i = 0; i < 100000; i++) {
+ ret = epoll_wait(ctx.efd[0], &e, 1, 1000);
+ ASSERT_GT(ret, 0);
+
+ while (ctx.count != 0)
+ ;
+ ctx.count = 1;
+ }
+ if (pthread_tryjoin_np(emitter, NULL) < 0) {
+ pthread_kill(emitter, SIGUSR1);
+ pthread_join(emitter, NULL);
+ }
+ close(ctx.efd[0]);
+ close(ctx.sfd[0]);
+}
+
TEST_HARNESS_MAIN
--
2.24.1
Hi Andrew,
Could you please suggest me, do I need to include Reported-by tag,
or reference to the bug is enough?
--
Roman
On 2020-02-03 21:59, Roman Penyaev wrote:
> This fixes possible lost wakeup introduced by the a218cc491420.
> Originally modifications to ep->wq were serialized by ep->wq.lock,
> but in the a218cc491420 new rw lock was introduced in order to
> relax fd event path, i.e. callers of ep_poll_callback() function.
>
> After the change ep_modify and ep_insert (both are called on
> epoll_ctl() path) were switched to ep->lock, but ep_poll
> (epoll_wait) was using ep->wq.lock on wqueue list modification.
>
> The bug doesn't lead to any wqueue list corruptions, because wake up
> path and list modifications were serialized by ep->wq.lock
> internally, but actual waitqueue_active() check prior wake_up()
> call can be reordered with modification of ep ready list, thus
> wake up can be lost.
>
> Current patch replaces ep->wq.lock with the ep->lock for wqueue
> modifications, thus wake up path always observes activeness of
> the wqueue correcty.
>
> Fixes: a218cc491420 ("epoll: use rwlock in order to reduce
> ep_poll_callback() contention")
> References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
> Signed-off-by: Roman Penyaev <[email protected]>
> Cc: Max Neunhoeffer <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Christopher Kohlhoff <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Jason Baron <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> fs/eventpoll.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index b041b66002db..eee3c92a9ebf 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct
> epoll_event __user *events,
> waiter = true;
> init_waitqueue_entry(&wait, current);
>
> - spin_lock_irq(&ep->wq.lock);
> + write_lock_irq(&ep->lock);
> __add_wait_queue_exclusive(&ep->wq, &wait);
> - spin_unlock_irq(&ep->wq.lock);
> + write_unlock_irq(&ep->lock);
> }
>
> for (;;) {
> @@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct
> epoll_event __user *events,
> goto fetch_events;
>
> if (waiter) {
> - spin_lock_irq(&ep->wq.lock);
> + write_lock_irq(&ep->lock);
> __remove_wait_queue(&ep->wq, &wait);
> - spin_unlock_irq(&ep->wq.lock);
> + write_unlock_irq(&ep->lock);
> }
>
> return res;
On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> Hi Andrew,
>
> Could you please suggest me, do I need to include Reported-by tag,
> or reference to the bug is enough?
Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
fully credit the extra work done by the reporter.
On 2020-02-04 17:32, Jakub Kicinski wrote:
> On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> Hi Andrew,
>>
>> Could you please suggest me, do I need to include Reported-by tag,
>> or reference to the bug is enough?
>
> Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> fully credit the extra work done by the reporter.
Reported-by: Max Neunhoeffer <[email protected]>
Bisected-by: Max Neunhoeffer <[email protected]>
Correct?
--
Roman
On Tue, 04 Feb 2020 18:20:03 +0100, Roman Penyaev wrote:
> On 2020-02-04 17:32, Jakub Kicinski wrote:
> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> >> Hi Andrew,
> >>
> >> Could you please suggest me, do I need to include Reported-by tag,
> >> or reference to the bug is enough?
> >
> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> > fully credit the extra work done by the reporter.
>
> Reported-by: Max Neunhoeffer <[email protected]>
> Bisected-by: Max Neunhoeffer <[email protected]>
>
> Correct?
That should work, I like the brevity of the single combined
Reported-and-bisected-by: Max Neunhoeffer <[email protected]>
line but looks like some separate the two even when both point
to the same person.
Unfortunately Documentation/process is silent on any "bisected-by"
use, so you'll have to exercise your own judgement :)
On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <[email protected]> wrote:
> On 2020-02-04 17:32, Jakub Kicinski wrote:
> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> >> Hi Andrew,
> >>
> >> Could you please suggest me, do I need to include Reported-by tag,
> >> or reference to the bug is enough?
> >
> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> > fully credit the extra work done by the reporter.
>
> Reported-by: Max Neunhoeffer <[email protected]>
> Bisected-by: Max Neunhoeffer <[email protected]>
>
> Correct?
We could do that, but preferably with Max's approval (please?).
Because people sometimes have issues with having their personal info
added to the kernel commit record without having being asked.
Dear Andrew and all,
Sorry, I did not understand that I should explicitly give consent.
This is fine with me.
Cheers
Max
Am 10. Februar 2020 06:59:16 MEZ schrieb Andrew Morton <[email protected]>:
>On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <[email protected]>
>wrote:
>
>> On 2020-02-04 17:32, Jakub Kicinski wrote:
>> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> >> Hi Andrew,
>> >>
>> >> Could you please suggest me, do I need to include Reported-by tag,
>> >> or reference to the bug is enough?
>> >
>> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag
>to
>> > fully credit the extra work done by the reporter.
>>
>> Reported-by: Max Neunhoeffer <[email protected]>
>> Bisected-by: Max Neunhoeffer <[email protected]>
>>
>> Correct?
>
>We could do that, but preferably with Max's approval (please?).
>
>Because people sometimes have issues with having their personal info
>added to the kernel commit record without having being asked.
On 2020-02-10 06:59, Andrew Morton wrote:
> On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <[email protected]>
> wrote:
>
>> On 2020-02-04 17:32, Jakub Kicinski wrote:
>> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> >> Hi Andrew,
>> >>
>> >> Could you please suggest me, do I need to include Reported-by tag,
>> >> or reference to the bug is enough?
>> >
>> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
>> > fully credit the extra work done by the reporter.
>>
>> Reported-by: Max Neunhoeffer <[email protected]>
>> Bisected-by: Max Neunhoeffer <[email protected]>
>>
>> Correct?
>
> We could do that, but preferably with Max's approval (please?).
>
> Because people sometimes have issues with having their personal info
> added to the kernel commit record without having being asked.
Max approved. I've just resent v2, no code changes, comment
tweaks and 2 explicit tags with Max's name.
--
Roman