2019-07-23 16:16:44

by Jason Xing

[permalink] [raw]
Subject: [PATCH] psi: get poll_work to run when calling poll syscall next time

Only when calling the poll syscall the first time can user
receive POLLPRI correctly. After that, user always fails to
acquire the event signal.

Reproduce case:
1. Get the monitor code in Documentation/accounting/psi.txt
2. Run it, and wait for the event triggered.
3. Kill and restart the process.

If the user doesn't kill the monitor process, it seems the
poll_work works fine. After killing and restarting the monitor,
the poll_work in kernel will never run again due to the wrong
value of poll_scheduled. Therefore, we should reset the value
as group_init() does after the last trigger is destroyed.

Signed-off-by: Jason Xing <[email protected]>
---
kernel/sched/psi.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 7acc632..66f4385 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1133,6 +1133,12 @@ static void psi_trigger_destroy(struct kref *ref)
if (kworker_to_destroy) {
kthread_cancel_delayed_work_sync(&group->poll_work);
kthread_destroy_worker(kworker_to_destroy);
+ /*
+ * The poll_work should have the chance to be put into the
+ * kthread queue when calling poll syscall next time. So
+ * reset poll_scheduled to zero as group_init() does
+ */
+ atomic_set(&group->poll_scheduled, 0);
}
kfree(t);
}
--
1.8.3.1


2019-07-23 20:03:45

by Caspar Zhang

[permalink] [raw]
Subject: Re: [PATCH] psi: get poll_work to run when calling poll syscall next time

On Tue, Jul 23, 2019 at 02:45:39PM +0800, Jason Xing wrote:
> Only when calling the poll syscall the first time can user
> receive POLLPRI correctly. After that, user always fails to
> acquire the event signal.
>
> Reproduce case:
> 1. Get the monitor code in Documentation/accounting/psi.txt
> 2. Run it, and wait for the event triggered.
> 3. Kill and restart the process.
>
> If the user doesn't kill the monitor process, it seems the
> poll_work works fine. After killing and restarting the monitor,
> the poll_work in kernel will never run again due to the wrong
> value of poll_scheduled. Therefore, we should reset the value
> as group_init() does after the last trigger is destroyed.
>
> Signed-off-by: Jason Xing <[email protected]>

Reviewed-by: Caspar Zhang <[email protected]>

> ---
> kernel/sched/psi.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 7acc632..66f4385 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -1133,6 +1133,12 @@ static void psi_trigger_destroy(struct kref *ref)
> if (kworker_to_destroy) {
> kthread_cancel_delayed_work_sync(&group->poll_work);
> kthread_destroy_worker(kworker_to_destroy);
> + /*
> + * The poll_work should have the chance to be put into the
> + * kthread queue when calling poll syscall next time. So
> + * reset poll_scheduled to zero as group_init() does
> + */
> + atomic_set(&group->poll_scheduled, 0);
> }
> kfree(t);
> }
> --
> 1.8.3.1
>

--
Thanks,
Caspar

2019-07-29 08:54:16

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH] psi: get poll_work to run when calling poll syscall next time

Hello,

Could someone take a quick look at this patch? It's not complicated at
all, just one line added into PSI which can make the poll() run in the
right way.

Thanks,
Jason

On 2019/7/23 下午2:45, Jason Xing wrote:
> Only when calling the poll syscall the first time can user
> receive POLLPRI correctly. After that, user always fails to
> acquire the event signal.
>
> Reproduce case:
> 1. Get the monitor code in Documentation/accounting/psi.txt
> 2. Run it, and wait for the event triggered.
> 3. Kill and restart the process.
>
> If the user doesn't kill the monitor process, it seems the
> poll_work works fine. After killing and restarting the monitor,
> the poll_work in kernel will never run again due to the wrong
> value of poll_scheduled. Therefore, we should reset the value
> as group_init() does after the last trigger is destroyed.
>
> Signed-off-by: Jason Xing <[email protected]>
> ---
> kernel/sched/psi.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 7acc632..66f4385 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -1133,6 +1133,12 @@ static void psi_trigger_destroy(struct kref *ref)
> if (kworker_to_destroy) {
> kthread_cancel_delayed_work_sync(&group->poll_work);
> kthread_destroy_worker(kworker_to_destroy);
> + /*
> + * The poll_work should have the chance to be put into the
> + * kthread queue when calling poll syscall next time. So
> + * reset poll_scheduled to zero as group_init() does
> + */
> + atomic_set(&group->poll_scheduled, 0);
> }
> kfree(t);
> }
>

2019-07-29 15:33:16

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH] psi: get poll_work to run when calling poll syscall next time

Hi Jason,

On Tue, Jul 23, 2019 at 02:45:39PM +0800, Jason Xing wrote:
> Only when calling the poll syscall the first time can user
> receive POLLPRI correctly. After that, user always fails to
> acquire the event signal.
>
> Reproduce case:
> 1. Get the monitor code in Documentation/accounting/psi.txt
> 2. Run it, and wait for the event triggered.
> 3. Kill and restart the process.
>
> If the user doesn't kill the monitor process, it seems the
> poll_work works fine. After killing and restarting the monitor,
> the poll_work in kernel will never run again due to the wrong
> value of poll_scheduled. Therefore, we should reset the value
> as group_init() does after the last trigger is destroyed.
>
> Signed-off-by: Jason Xing <[email protected]>

Good catch, and the fix makes sense to me. However, it was a bit hard
to understand how the problem occurs:

> ---
> kernel/sched/psi.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 7acc632..66f4385 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -1133,6 +1133,12 @@ static void psi_trigger_destroy(struct kref *ref)
> if (kworker_to_destroy) {
> kthread_cancel_delayed_work_sync(&group->poll_work);
> kthread_destroy_worker(kworker_to_destroy);
> + /*
> + * The poll_work should have the chance to be put into the
> + * kthread queue when calling poll syscall next time. So
> + * reset poll_scheduled to zero as group_init() does
> + */
> + atomic_set(&group->poll_scheduled, 0);

The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0). And the answer is because the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker. The
cancel needs to pair with resetting the poll_scheduled flag:

if (kworker_to_destroy) {
/*
* After the RCU grace period has expired, the worker
* can no longer be found through group->poll_kworker.
* But it might have been already scheduled before
* that - deschedule it cleanly before destroying it.
*/
kthread_cancel_delayed_work_sync(&group->poll_work);
atomic_set(&group->poll_scheduled, 0);

kthread_destroy_worker(kworker_to_destroy);
}

With that change, please add:

Acked-by: Johannes Weiner <[email protected]>

Thanks!

2019-07-29 16:47:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH] psi: get poll_work to run when calling poll syscall next time

On Mon, Jul 29, 2019 at 8:30 AM Johannes Weiner <[email protected]> wrote:
>
> Hi Jason,
>
> On Tue, Jul 23, 2019 at 02:45:39PM +0800, Jason Xing wrote:
> > Only when calling the poll syscall the first time can user
> > receive POLLPRI correctly. After that, user always fails to
> > acquire the event signal.
> >
> > Reproduce case:
> > 1. Get the monitor code in Documentation/accounting/psi.txt
> > 2. Run it, and wait for the event triggered.
> > 3. Kill and restart the process.
> >
> > If the user doesn't kill the monitor process, it seems the
> > poll_work works fine. After killing and restarting the monitor,
> > the poll_work in kernel will never run again due to the wrong
> > value of poll_scheduled. Therefore, we should reset the value
> > as group_init() does after the last trigger is destroyed.
> >
> > Signed-off-by: Jason Xing <[email protected]>
>
> Good catch, and the fix makes sense to me. However, it was a bit hard
> to understand how the problem occurs:
>
> > ---
> > kernel/sched/psi.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> > index 7acc632..66f4385 100644
> > --- a/kernel/sched/psi.c
> > +++ b/kernel/sched/psi.c
> > @@ -1133,6 +1133,12 @@ static void psi_trigger_destroy(struct kref *ref)
> > if (kworker_to_destroy) {
> > kthread_cancel_delayed_work_sync(&group->poll_work);
> > kthread_destroy_worker(kworker_to_destroy);
> > + /*
> > + * The poll_work should have the chance to be put into the
> > + * kthread queue when calling poll syscall next time. So
> > + * reset poll_scheduled to zero as group_init() does
> > + */
> > + atomic_set(&group->poll_scheduled, 0);
>
> The question is why we can end up with poll_scheduled = 1 but the work
> not running (which would reset it to 0). And the answer is because the
> scheduling side sees group->poll_kworker under RCU protection and then
> schedules it, but here we cancel the work and destroy the worker. The
> cancel needs to pair with resetting the poll_scheduled flag:
>
> if (kworker_to_destroy) {
> /*
> * After the RCU grace period has expired, the worker
> * can no longer be found through group->poll_kworker.
> * But it might have been already scheduled before
> * that - deschedule it cleanly before destroying it.
> */
> kthread_cancel_delayed_work_sync(&group->poll_work);
> atomic_set(&group->poll_scheduled, 0);
>
> kthread_destroy_worker(kworker_to_destroy);
> }
>
> With that change, please add:
>
> Acked-by: Johannes Weiner <[email protected]>
>
> Thanks!

The changes makes sense to me as well. Thanks!

Reviewed-by: Suren Baghdasaryan <[email protected]>

2019-07-30 10:06:29

by Jason Xing

[permalink] [raw]
Subject: [PATCH v2] psi: get poll_work to run when calling poll syscall next time

Only when calling the poll syscall the first time can user
receive POLLPRI correctly. After that, user always fails to
acquire the event signal.

Reproduce case:
1. Get the monitor code in Documentation/accounting/psi.txt
2. Run it, and wait for the event triggered.
3. Kill and restart the process.

If the user doesn't kill the monitor process, it seems the
poll_work works fine. After killing and restarting the monitor,
the poll_work in kernel will never run again due to the wrong
value of poll_scheduled. Therefore, we should reset the value
as group_init() does after the last trigger is destroyed.

[PATCH V2]
In the patch v2, I put the atomic_set(&group->poll_scheduled, 0);
into the right place.
Here I quoted from Johannes as the best explaination:
"The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0). And the answer is because the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker. The
cancel needs to pair with resetting the poll_scheduled flag."

Signed-off-by: Jason Xing <[email protected]>
Reviewed-by: Caspar Zhang <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
---
kernel/sched/psi.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 7acc632..acdada0 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1131,7 +1131,14 @@ static void psi_trigger_destroy(struct kref *ref)
* deadlock while waiting for psi_poll_work to acquire trigger_lock
*/
if (kworker_to_destroy) {
+ /*
+ * After the RCU grace period has expired, the worker
+ * can no longer be found through group->poll_kworker.
+ * But it might have been already scheduled before
+ * that - deschedule it cleanly before destroying it.
+ */
kthread_cancel_delayed_work_sync(&group->poll_work);
+ atomic_set(&group->poll_scheduled, 0);
kthread_destroy_worker(kworker_to_destroy);
}
kfree(t);
--
1.8.3.1

2019-08-02 09:14:15

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH v2] psi: get poll_work to run when calling poll syscall next time

Hi all,

According to the reviews from Johoannes, I've changed the old patch and
then submitted the version 2 patch a few days ago.

Please let me know if all this sounds good, or if there are any issues.

Thanks,
Jason

On 2019/7/30 下午1:16, Jason Xing wrote:
> Only when calling the poll syscall the first time can user
> receive POLLPRI correctly. After that, user always fails to
> acquire the event signal.
>
> Reproduce case:
> 1. Get the monitor code in Documentation/accounting/psi.txt
> 2. Run it, and wait for the event triggered.
> 3. Kill and restart the process.
>
> If the user doesn't kill the monitor process, it seems the
> poll_work works fine. After killing and restarting the monitor,
> the poll_work in kernel will never run again due to the wrong
> value of poll_scheduled. Therefore, we should reset the value
> as group_init() does after the last trigger is destroyed.
>
> [PATCH V2]
> In the patch v2, I put the atomic_set(&group->poll_scheduled, 0);
> into the right place.
> Here I quoted from Johannes as the best explaination:
> "The question is why we can end up with poll_scheduled = 1 but the work
> not running (which would reset it to 0). And the answer is because the
> scheduling side sees group->poll_kworker under RCU protection and then
> schedules it, but here we cancel the work and destroy the worker. The
> cancel needs to pair with resetting the poll_scheduled flag."
>
> Signed-off-by: Jason Xing <[email protected]>
> Reviewed-by: Caspar Zhang <[email protected]>
> Reviewed-by: Joseph Qi <[email protected]>
> Reviewed-by: Suren Baghdasaryan <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> ---
> kernel/sched/psi.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 7acc632..acdada0 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -1131,7 +1131,14 @@ static void psi_trigger_destroy(struct kref *ref)
> * deadlock while waiting for psi_poll_work to acquire trigger_lock
> */
> if (kworker_to_destroy) {
> + /*
> + * After the RCU grace period has expired, the worker
> + * can no longer be found through group->poll_kworker.
> + * But it might have been already scheduled before
> + * that - deschedule it cleanly before destroying it.
> + */
> kthread_cancel_delayed_work_sync(&group->poll_work);
> + atomic_set(&group->poll_scheduled, 0);
> kthread_destroy_worker(kworker_to_destroy);
> }
> kfree(t);
>

2019-08-15 02:21:45

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH v2] psi: get poll_work to run when calling poll syscall next time

Hello,

It's been delayed for no reason a couple of days. Any comments and
suggestions on this patch V2 would be appreciated.

Thanks,
Jason

On 2019/7/30 下午1:16, Jason Xing wrote:
> Only when calling the poll syscall the first time can user
> receive POLLPRI correctly. After that, user always fails to
> acquire the event signal.
>
> Reproduce case:
> 1. Get the monitor code in Documentation/accounting/psi.txt
> 2. Run it, and wait for the event triggered.
> 3. Kill and restart the process.
>
> If the user doesn't kill the monitor process, it seems the
> poll_work works fine. After killing and restarting the monitor,
> the poll_work in kernel will never run again due to the wrong
> value of poll_scheduled. Therefore, we should reset the value
> as group_init() does after the last trigger is destroyed.
>
> [PATCH V2]
> In the patch v2, I put the atomic_set(&group->poll_scheduled, 0);
> into the right place.
> Here I quoted from Johannes as the best explaination:
> "The question is why we can end up with poll_scheduled = 1 but the work
> not running (which would reset it to 0). And the answer is because the
> scheduling side sees group->poll_kworker under RCU protection and then
> schedules it, but here we cancel the work and destroy the worker. The
> cancel needs to pair with resetting the poll_scheduled flag."
>
> Signed-off-by: Jason Xing <[email protected]>
> Reviewed-by: Caspar Zhang <[email protected]>
> Reviewed-by: Joseph Qi <[email protected]>
> Reviewed-by: Suren Baghdasaryan <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> ---
> kernel/sched/psi.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 7acc632..acdada0 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -1131,7 +1131,14 @@ static void psi_trigger_destroy(struct kref *ref)
> * deadlock while waiting for psi_poll_work to acquire trigger_lock
> */
> if (kworker_to_destroy) {
> + /*
> + * After the RCU grace period has expired, the worker
> + * can no longer be found through group->poll_kworker.
> + * But it might have been already scheduled before
> + * that - deschedule it cleanly before destroying it.
> + */
> kthread_cancel_delayed_work_sync(&group->poll_work);
> + atomic_set(&group->poll_scheduled, 0);
> kthread_destroy_worker(kworker_to_destroy);
> }
> kfree(t);
>