We only require cmpxchg()&retry when task is exiting.
xchg() is enough in other cases like original code in ac3d0da8.
So we try our best to use xchg() and avoid competition&latency
from task_work_add().
Also remove the inner loop
Signed-off-by: Lai Jiangshan <[email protected]>
---
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 65bd3c9..82a42e7 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -56,14 +56,13 @@ void task_work_run(void)
* work->func() can do task_work_add(), do not set
* work_exited unless the list is empty.
*/
- do {
- work = ACCESS_ONCE(task->task_works);
- head = !work && (task->flags & PF_EXITING) ?
- &work_exited : NULL;
- } while (cmpxchg(&task->task_works, work, head) != work);
-
- if (!work)
+ if (!ACCESS_ONCE(task->task_works) ||
+ !(work = xchg(&task->task_works, NULL))) {
+ if ((task->flags & PF_EXITING) &&
+ cmpxchg(&task->task_works, NULL, &work_exited))
+ continue;
break;
+ }
/*
* Synchronize with task_work_cancel(). It can't remove
* the first entry == work, cmpxchg(task_works) should
On 10/08, Lai Jiangshan wrote:
>
> We only require cmpxchg()&retry when task is exiting.
> xchg() is enough in other cases like original code in ac3d0da8.
Yes, we can probably do xchg/cmpxchg depending on NULL/work_exited.
Not sure it makes sense to complicate the code though. Is xchg()
really faster than cmpxchg?
> Also remove the inner loop
Yes, it is not really needed, only for readability.
"do while (!cmpxchg)" can be replaced with "if (!cmpxchg) continue".
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -56,14 +56,13 @@ void task_work_run(void)
> * work->func() can do task_work_add(), do not set
> * work_exited unless the list is empty.
> */
> - do {
> - work = ACCESS_ONCE(task->task_works);
> - head = !work && (task->flags & PF_EXITING) ?
> - &work_exited : NULL;
> - } while (cmpxchg(&task->task_works, work, head) != work);
> -
> - if (!work)
> + if (!ACCESS_ONCE(task->task_works) ||
ACCESS_ONCE() looks confusing. It is not needed with this patch.
> + !(work = xchg(&task->task_works, NULL))) {
> + if ((task->flags & PF_EXITING) &&
> + cmpxchg(&task->task_works, NULL, &work_exited))
> + continue;
> break;
> + }
I think the patch is correct.
But the code looks more complex, and the only advantage is that
non-exiting task does xchg() instead of cmpxchg(). Not sure this
worth the trouble, in this case task_work_run() will likey run
the callbacks (the caller checks ->task_works != NULL), I do not
think this can add any noticeable speedup.
But, as for correctness,
Reviewed-by: Oleg Nesterov <[email protected]>
On Mon, 2012-10-08 at 14:38 +0200, Oleg Nesterov wrote:
> But the code looks more complex, and the only advantage is that
> non-exiting task does xchg() instead of cmpxchg(). Not sure this
> worth the trouble, in this case task_work_run() will likey run
> the callbacks (the caller checks ->task_works != NULL), I do not
> think this can add any noticeable speedup.
Yeah, I agree, the patch doesn't seem worth the trouble. It makes tricky
code unreadable at best.
On 10/09/2012 07:04 PM, Peter Zijlstra wrote:
> On Mon, 2012-10-08 at 14:38 +0200, Oleg Nesterov wrote:
>> But the code looks more complex, and the only advantage is that
>> non-exiting task does xchg() instead of cmpxchg(). Not sure this
>> worth the trouble, in this case task_work_run() will likey run
>> the callbacks (the caller checks ->task_works != NULL), I do not
>> think this can add any noticeable speedup.
>
> Yeah, I agree, the patch doesn't seem worth the trouble. It makes tricky
> code unreadable at best.
>
To gain better readability, we need to move work_exited things out
from task_work_run() too.
Thanks,
Lai
Subject: task_work: avoid unneeded cmpxchg() in task_work_run()
We only require cmpxchg()&retry when task is exiting.
xchg() is enough in other cases like original code in ac3d0da8.
So we use xchg() for task_work_run() and move the logic
of exit_task_work() out from task_work_run().
Signed-off-by: Lai Jiangshan <[email protected]>
---
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index ca5a1cf..1e686a5 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -15,10 +15,6 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
void task_work_run(void);
-
-static inline void exit_task_work(struct task_struct *task)
-{
- task_work_run();
-}
+void exit_task_work(struct task_struct *task);
#endif /* _LINUX_TASK_WORK_H */
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 65bd3c9..87ef3b7 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -52,16 +52,7 @@ void task_work_run(void)
struct callback_head *work, *head, *next;
for (;;) {
- /*
- * work->func() can do task_work_add(), do not set
- * work_exited unless the list is empty.
- */
- do {
- work = ACCESS_ONCE(task->task_works);
- head = !work && (task->flags & PF_EXITING) ?
- &work_exited : NULL;
- } while (cmpxchg(&task->task_works, work, head) != work);
-
+ work = xchg(&task->task_works, NULL);
if (!work)
break;
/*
@@ -90,3 +81,17 @@ void task_work_run(void)
} while (work);
}
}
+
+void exit_task_work(struct task_struct *task)
+{
+ for (;;) {
+ /*
+ * work->func() can do task_work_add(), do not set
+ * work_exited unless the list is empty.
+ */
+ if (unlikely(task->task_works))
+ task_work_run();
+ if (cmpxchg(&task->task_works, NULL, &work_exited) == NULL)
+ break;
+ }
+}
On Wed, 2012-10-10 at 19:50 +0200, Oleg Nesterov wrote:
>
> But you did not answer, and I am curious. What was your original
> motivation? Is xchg really faster than cmpxchg?
And is this true over multiple architectures? Or are we optimizing for
x86_64 (again) ?
On 10/10, Lai Jiangshan wrote:
>
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -52,16 +52,7 @@ void task_work_run(void)
> struct callback_head *work, *head, *next;
>
> for (;;) {
> - /*
> - * work->func() can do task_work_add(), do not set
> - * work_exited unless the list is empty.
> - */
> - do {
> - work = ACCESS_ONCE(task->task_works);
> - head = !work && (task->flags & PF_EXITING) ?
> - &work_exited : NULL;
> - } while (cmpxchg(&task->task_works, work, head) != work);
> -
> + work = xchg(&task->task_works, NULL);
> if (!work)
> break;
> /*
> @@ -90,3 +81,17 @@ void task_work_run(void)
> } while (work);
> }
> }
> +
> +void exit_task_work(struct task_struct *task)
> +{
> + for (;;) {
> + /*
> + * work->func() can do task_work_add(), do not set
> + * work_exited unless the list is empty.
> + */
> + if (unlikely(task->task_works))
> + task_work_run();
> + if (cmpxchg(&task->task_works, NULL, &work_exited) == NULL)
> + break;
> + }
> +}
I agree, this looks fine.
But if you add "unlikely" before task_work_run(), then probably
it should do
while (cmpxchg(&task->task_works, NULL, work_exited))
task_work_run();
? it looks more simple/clean.
(OTOH I am not sure "unlikely" is true, note that exit_files() will
offload ____fput() to task_work_run()).
But you did not answer, and I am curious. What was your original
motivation? Is xchg really faster than cmpxchg?
Oleg.