2012-05-19 22:07:04

by Jiri Slaby

[permalink] [raw]
Subject: [-next regression] TCP window full with EPOLLWAKEUP

Hi,

a bisection shows that with the following commit from -next:
commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
Author: Arve Hjønnevåg <[email protected]>
Date: Tue May 1 21:33:34 2012 +0200

epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll
events are ready

====

one of mono programs I use stops receiving data from the network.
Wireshark shows that the TCP window of a connection is filled. This
means the program does not read the data fast enough after requesting
the data.

If I revert that commit on the top of -next (20120518), everything works
as expected.

regards,
--
js
suse labs


2012-05-20 12:18:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [-next regression] TCP window full with EPOLLWAKEUP

On Sunday, May 20, 2012, Jiri Slaby wrote:
> Hi,
>
> a bisection shows that with the following commit from -next:
> commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
> Author: Arve Hjønnevåg <[email protected]>
> Date: Tue May 1 21:33:34 2012 +0200
>
> epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll
> events are ready
>
> ====
>
> one of mono programs I use stops receiving data from the network.
> Wireshark shows that the TCP window of a connection is filled. This
> means the program does not read the data fast enough after requesting
> the data.
>
> If I revert that commit on the top of -next (20120518), everything works
> as expected.

Hmm. I suppose that the failing program doesn't set EPOLLWAKEUP by mistake,
does it?

Rafael

2012-05-20 12:48:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [-next regression] TCP window full with EPOLLWAKEUP

On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
> On Sunday, May 20, 2012, Jiri Slaby wrote:
> > Hi,
> >
> > a bisection shows that with the following commit from -next:
> > commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
> > Author: Arve Hjønnevåg <[email protected]>
> > Date: Tue May 1 21:33:34 2012 +0200
> >
> > epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll
> > events are ready
> >
> > ====
> >
> > one of mono programs I use stops receiving data from the network.
> > Wireshark shows that the TCP window of a connection is filled. This
> > means the program does not read the data fast enough after requesting
> > the data.
> >
> > If I revert that commit on the top of -next (20120518), everything works
> > as expected.
>
> Hmm. I suppose that the failing program doesn't set EPOLLWAKEUP by mistake,
> does it?

If it doesn't, we can assume that epi-ws is always NULL and all of the added
overhead comes from the function calls. So, I wonder if the appended patch
makes any difference?

Rafael


---
fs/eventpoll.c | 35 ++++++++++++++++++++++++-----------
1 file changed, 24 insertions(+), 11 deletions(-)

Index: linux/fs/eventpoll.c
===================================================================
--- linux.orig/fs/eventpoll.c
+++ linux/fs/eventpoll.c
@@ -597,7 +597,8 @@ static int ep_scan_ready_list(struct eve
*/
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
- __pm_stay_awake(epi->ws);
+ if (epi->ws)
+ __pm_stay_awake(epi->ws);
}
}
/*
@@ -611,7 +612,8 @@ static int ep_scan_ready_list(struct eve
* Quickly re-inject items left on "txlist".
*/
list_splice(&txlist, &ep->rdllist);
- __pm_relax(ep->ws);
+ if (ep->ws)
+ __pm_relax(ep->ws);

if (!list_empty(&ep->rdllist)) {
/*
@@ -750,7 +752,9 @@ static int ep_read_events_proc(struct ev
* callback, but it's not actually ready, as far as
* caller requested events goes. We can remove it here.
*/
- __pm_relax(epi->ws);
+ if (epi->ws)
+ __pm_relax(epi->ws);
+
list_del_init(&epi->rdllink);
}
}
@@ -956,7 +960,8 @@ static int ep_poll_callback(wait_queue_t
/* If this file is already in the ready list we exit soon */
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
- __pm_stay_awake(epi->ws);
+ if (epi->ws)
+ __pm_stay_awake(epi->ws);
}

/*
@@ -1219,7 +1224,8 @@ static int ep_insert(struct eventpoll *e
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
- __pm_stay_awake(epi->ws);
+ if (epi->ws)
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1309,7 +1315,8 @@ static int ep_modify(struct eventpoll *e
spin_lock_irq(&ep->lock);
if (!ep_is_linked(&epi->rdllink)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
- __pm_stay_awake(epi->ws);
+ if (epi->ws)
+ __pm_stay_awake(epi->ws);

/* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
@@ -1357,9 +1364,12 @@ static int ep_send_events_proc(struct ev
* instead, but then epi->ws would temporarily be out of sync
* with ep_is_linked().
*/
- if (epi->ws && epi->ws->active)
- __pm_stay_awake(ep->ws);
- __pm_relax(epi->ws);
+ if (epi->ws) {
+ if (epi->ws->active)
+ __pm_stay_awake(ep->ws);
+
+ __pm_relax(epi->ws);
+ }
list_del_init(&epi->rdllink);

pt._key = epi->event.events;
@@ -1376,7 +1386,9 @@ static int ep_send_events_proc(struct ev
if (__put_user(revents, &uevent->events) ||
__put_user(epi->event.data, &uevent->data)) {
list_add(&epi->rdllink, head);
- __pm_stay_awake(epi->ws);
+ if (epi->ws)
+ __pm_stay_awake(epi->ws);
+
return eventcnt ? eventcnt : -EFAULT;
}
eventcnt++;
@@ -1396,7 +1408,8 @@ static int ep_send_events_proc(struct ev
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
- __pm_stay_awake(epi->ws);
+ if (epi->ws)
+ __pm_stay_awake(epi->ws);
}
}
}

2012-05-20 18:29:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [-next regression] TCP window full with EPOLLWAKEUP

On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
> On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
> > On Sunday, May 20, 2012, Jiri Slaby wrote:
> > > Hi,
> > >
> > > a bisection shows that with the following commit from -next:
> > > commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
> > > Author: Arve Hjønnevåg <[email protected]>
> > > Date: Tue May 1 21:33:34 2012 +0200
> > >
> > > epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll
> > > events are ready
> > >
> > > ====
> > >
> > > one of mono programs I use stops receiving data from the network.
> > > Wireshark shows that the TCP window of a connection is filled. This
> > > means the program does not read the data fast enough after requesting
> > > the data.
> > >
> > > If I revert that commit on the top of -next (20120518), everything works
> > > as expected.
> >
> > Hmm. I suppose that the failing program doesn't set EPOLLWAKEUP by mistake,
> > does it?
>
> If it doesn't, we can assume that epi-ws is always NULL and all of the added
> overhead comes from the function calls. So, I wonder if the appended patch
> makes any difference?

Having thought more about this I have to say this doesn't seem to make much
sense, because in that case you'd see some progress, although probably a bit
slower than before.

So, I think what happens is that the application tries to set EPOLLWAKEUP,
but doesn't have the capability, so the entire operation fails for it, but
it doesn't check the return value.

I wonder if the following helps, then.

Thanks,
Rafael


---
fs/eventpoll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/fs/eventpoll.c
===================================================================
--- linux.orig/fs/eventpoll.c
+++ linux/fs/eventpoll.c
@@ -1711,7 +1711,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in

/* Check if EPOLLWAKEUP is allowed */
if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
- goto error_tgt_fput;
+ epds.events &= ~EPOLLWAKEUP;

/*
* We have to check that the file structure underneath the file descriptor

2012-05-21 13:55:26

by Jiri Slaby

[permalink] [raw]
Subject: Re: [-next regression] TCP window full with EPOLLWAKEUP

On 05/20/2012 08:34 PM, Rafael J. Wysocki wrote:
> On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
>> On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
>>> On Sunday, May 20, 2012, Jiri Slaby wrote:
>>>> Hi,
>>>>
>>>> a bisection shows that with the following commit from -next:
>>>> commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
>>>> Author: Arve Hjønnevåg <[email protected]>
>>>> Date: Tue May 1 21:33:34 2012 +0200
>>>>
>>>> epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll
>>>> events are ready
>>>>
>>>> ====
>>>>
>>>> one of mono programs I use stops receiving data from the network.
>>>> Wireshark shows that the TCP window of a connection is filled. This
>>>> means the program does not read the data fast enough after requesting
>>>> the data.
>>>>
>>>> If I revert that commit on the top of -next (20120518), everything works
>>>> as expected.
>>>
>>> Hmm. I suppose that the failing program doesn't set EPOLLWAKEUP by mistake,
>>> does it?
>>
>> If it doesn't, we can assume that epi-ws is always NULL and all of the added
>> overhead comes from the function calls. So, I wonder if the appended patch
>> makes any difference?
>
> Having thought more about this I have to say this doesn't seem to make much
> sense, because in that case you'd see some progress, although probably a bit
> slower than before.
>
> So, I think what happens is that the application tries to set EPOLLWAKEUP,
> but doesn't have the capability, so the entire operation fails for it, but
> it doesn't check the return value.
>
> I wonder if the following helps, then.
>
> Thanks,
> Rafael
>
>
> ---
> fs/eventpoll.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux/fs/eventpoll.c
> ===================================================================
> --- linux.orig/fs/eventpoll.c
> +++ linux/fs/eventpoll.c
> @@ -1711,7 +1711,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in
>
> /* Check if EPOLLWAKEUP is allowed */
> if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
> - goto error_tgt_fput;
> + epds.events &= ~EPOLLWAKEUP;

Yes, this fixed the issue.

thanks,
--
js
suse labs

2012-05-21 19:06:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [-next regression] TCP window full with EPOLLWAKEUP

On Monday, May 21, 2012, Jiri Slaby wrote:
> On 05/20/2012 08:34 PM, Rafael J. Wysocki wrote:
> > On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
> >> On Sunday, May 20, 2012, Rafael J. Wysocki wrote:
> >>> On Sunday, May 20, 2012, Jiri Slaby wrote:
> >>>> Hi,
> >>>>
> >>>> a bisection shows that with the following commit from -next:
> >>>> commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
> >>>> Author: Arve Hjønnevåg <[email protected]>
> >>>> Date: Tue May 1 21:33:34 2012 +0200
> >>>>
> >>>> epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll
> >>>> events are ready
> >>>>
> >>>> ====
> >>>>
> >>>> one of mono programs I use stops receiving data from the network.
> >>>> Wireshark shows that the TCP window of a connection is filled. This
> >>>> means the program does not read the data fast enough after requesting
> >>>> the data.
> >>>>
> >>>> If I revert that commit on the top of -next (20120518), everything works
> >>>> as expected.
> >>>
> >>> Hmm. I suppose that the failing program doesn't set EPOLLWAKEUP by mistake,
> >>> does it?
> >>
> >> If it doesn't, we can assume that epi-ws is always NULL and all of the added
> >> overhead comes from the function calls. So, I wonder if the appended patch
> >> makes any difference?
> >
> > Having thought more about this I have to say this doesn't seem to make much
> > sense, because in that case you'd see some progress, although probably a bit
> > slower than before.
> >
> > So, I think what happens is that the application tries to set EPOLLWAKEUP,
> > but doesn't have the capability, so the entire operation fails for it, but
> > it doesn't check the return value.
> >
> > I wonder if the following helps, then.
> >
> > Thanks,
> > Rafael
> >
> >
> > ---
> > fs/eventpoll.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux/fs/eventpoll.c
> > ===================================================================
> > --- linux.orig/fs/eventpoll.c
> > +++ linux/fs/eventpoll.c
> > @@ -1711,7 +1711,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in
> >
> > /* Check if EPOLLWAKEUP is allowed */
> > if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
> > - goto error_tgt_fput;
> > + epds.events &= ~EPOLLWAKEUP;
>
> Yes, this fixed the issue.

Cool, thanks for testing!

I'll resend it in a while with a changelog and tags.

Thanks,
Rafael

2012-05-21 19:23:40

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH] epoll: Fix user space breakage related to EPOLLWAKEUP (was: Re: [-next regression] TCP window full with EPOLLWAKEUP)

From: Rafael J. Wysocki <[email protected]>

Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready) caused some applications to
malfunction, because they set the bit corresponding to the new
EPOLLWAKEUP flag in their eventpoll flags and they don't have the
new CAP_EPOLLWAKEUP capability.

To prevent that from happening, change epoll_ctl() to clear
EPOLLWAKEUP in epds.events if the caller doesn't have the
CAP_EPOLLWAKEUP capability instead of failing and returning an
error code, which allows the affected applications to function
normally.

Reported-and-tested-by: Jiri Slaby <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
fs/eventpoll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/fs/eventpoll.c
===================================================================
--- linux.orig/fs/eventpoll.c
+++ linux/fs/eventpoll.c
@@ -1711,7 +1711,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in

/* Check if EPOLLWAKEUP is allowed */
if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
- goto error_tgt_fput;
+ epds.events &= ~EPOLLWAKEUP;

/*
* We have to check that the file structure underneath the file descriptor

2012-05-21 22:11:50

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [PATCH] epoll: Fix user space breakage related to EPOLLWAKEUP (was: Re: [-next regression] TCP window full with EPOLLWAKEUP)

On Mon, May 21, 2012 at 12:28 PM, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
> suspend while epoll events are ready) caused some applications to
> malfunction, because they set the bit corresponding to the new
> EPOLLWAKEUP flag in their eventpoll flags and they don't have the
> new CAP_EPOLLWAKEUP capability.
>
> To prevent that from happening, change epoll_ctl() to clear
> EPOLLWAKEUP in epds.events if the caller doesn't have the
> CAP_EPOLLWAKEUP capability instead of failing and returning an
> error code, which allows the affected applications to function
> normally.
>
> Reported-and-tested-by: Jiri Slaby <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> ?fs/eventpoll.c | ? ?2 +-
> ?1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux/fs/eventpoll.c
> ===================================================================
> --- linux.orig/fs/eventpoll.c
> +++ linux/fs/eventpoll.c
> @@ -1711,7 +1711,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in
>
> ? ? ? ?/* Check if EPOLLWAKEUP is allowed */
> ? ? ? ?if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
> - ? ? ? ? ? ? ? goto error_tgt_fput;
> + ? ? ? ? ? ? ? epds.events &= ~EPOLLWAKEUP;
>
> ? ? ? ?/*
> ? ? ? ? * We have to check that the file structure underneath the file descriptor

Is there any way for the application to detect that it did not get the
EPOLLWAKEUP feature?

--
Arve Hj?nnev?g

2012-05-22 15:51:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] epoll: Fix user space breakage related to EPOLLWAKEUP (was: Re: [-next regression] TCP window full with EPOLLWAKEUP)

On Tuesday, May 22, 2012, Arve Hj?nnev?g wrote:
> On Mon, May 21, 2012 at 12:28 PM, Rafael J. Wysocki <[email protected]> wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
> > suspend while epoll events are ready) caused some applications to
> > malfunction, because they set the bit corresponding to the new
> > EPOLLWAKEUP flag in their eventpoll flags and they don't have the
> > new CAP_EPOLLWAKEUP capability.
> >
> > To prevent that from happening, change epoll_ctl() to clear
> > EPOLLWAKEUP in epds.events if the caller doesn't have the
> > CAP_EPOLLWAKEUP capability instead of failing and returning an
> > error code, which allows the affected applications to function
> > normally.
> >
> > Reported-and-tested-by: Jiri Slaby <[email protected]>
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > fs/eventpoll.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux/fs/eventpoll.c
> > ===================================================================
> > --- linux.orig/fs/eventpoll.c
> > +++ linux/fs/eventpoll.c
> > @@ -1711,7 +1711,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in
> >
> > /* Check if EPOLLWAKEUP is allowed */
> > if ((epds.events & EPOLLWAKEUP) && !capable(CAP_EPOLLWAKEUP))
> > - goto error_tgt_fput;
> > + epds.events &= ~EPOLLWAKEUP;
> >
> > /*
> > * We have to check that the file structure underneath the file descriptor
>
> Is there any way for the application to detect that it did not get the
> EPOLLWAKEUP feature?

Well, it should check its capabilities beforehand if it really cares ...

Moreover, if the creation of the wakeup source in ep_modify() fails, the
application won't be notified either.

Thanks,
Rafael