2020-05-02 03:14:42

by Davidlohr Bueso

Subject: [PATCH] kernel/sys: do not use tasklist_lock to set/get scheduling priorities

For both setpriority(2) and getpriority(2) there's really no need
to take the tasklist_lock at all, yet both hold it for the entirety
of the syscall. The tasklist_lock does not protect reading/writing
p->static_prio, and task lookups are already RCU-safe, providing a
stable pointer.
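A minimal userspace sketch of the PRIO_PROCESS case follows. It is not part of the patch, and read_own_nice() and lower_own_priority() are hypothetical helpers. With who == 0 the kernel simply uses current; a non-zero who goes through find_task_by_vpid(), which is the RCU-protected lookup referred to above. The sketch only makes the caller nicer, which normally needs no privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: read the caller's nice value. getpriority() can
 * legitimately return -1, so errno is cleared first and checked after.
 * Returns 0 on success, -1 on failure.
 */
static int read_own_nice(int *nice_out)
{
	errno = 0;
	int val = getpriority(PRIO_PROCESS, 0);	/* who == 0: the caller */

	if (val == -1 && errno != 0)
		return -1;
	*nice_out = val;
	return 0;
}

/*
 * Hypothetical helper: make the caller 'delta' steps nicer (lower its
 * priority). Raising your own nice value normally needs no privileges,
 * and the kernel clamps anything above MAX_NICE (19).
 * Stores the resulting nice value in *new_nice; returns 0 or -1.
 */
static int lower_own_priority(int delta, int *new_nice)
{
	int before;

	assert(delta >= 0);
	if (read_own_nice(&before) != 0)
		return -1;
	if (setpriority(PRIO_PROCESS, 0, before + delta) != 0)
		return -1;
	return read_own_nice(new_nice);
}
```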

The following raw microbenchmark results were measured on a 40-core
box running the stressng-get workload, which pathologically pounds on
various syscalls that get information from the kernel. Higher thread
counts show larger wins, although that level of contention is probably
not something a real workload would see; the single-threaded case is
slightly slower (-3.76%).

                             5.7.0-rc3              5.7.0-rc3
                                               getpriority-v1
Hmean     get-1     3443.65 (   0.00%)     3314.08 *  -3.76%*
Hmean     get-2     7809.99 (   0.00%)     8547.60 *   9.44%*
Hmean     get-4    15498.01 (   0.00%)    17396.85 *  12.25%*
Hmean     get-8    28001.37 (   0.00%)    31137.53 *  11.20%*
Hmean     get-16   31460.88 (   0.00%)    40284.35 *  28.05%*
Hmean     get-32   30036.64 (   0.00%)    40657.88 *  35.36%*
Hmean     get-64   31429.86 (   0.00%)    41021.73 *  30.52%*
Hmean     get-80   31804.13 (   0.00%)    39188.55 *  23.22%*
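The numbers above come from the stressng-get workload. As a hypothetical illustration of the access pattern only (run_get_bench() and hammer_getpriority() are illustrative names, not stress-ng code), the harness below has N threads call getpriority(2) in a tight loop and counts completed calls. Before the patch every call, even PRIO_PROCESS with who == 0, did read_lock(&tasklist_lock), an atomic update of one shared lock word; after it, the syscall only enters an RCU read-side section, which does not write to any shared lock word. That is plausibly where the scaling difference comes from.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <time.h>

#define MAX_WORKERS 128

static atomic_bool stop_flag;

struct worker {
	pthread_t tid;
	unsigned long calls;	/* getpriority() calls this thread completed */
};

static void *hammer_getpriority(void *arg)
{
	struct worker *w = arg;
	unsigned long n = 0;	/* local counter: avoids false sharing */

	while (!atomic_load_explicit(&stop_flag, memory_order_relaxed)) {
		(void)getpriority(PRIO_PROCESS, 0);
		n++;
	}
	w->calls = n;	/* read by the main thread after pthread_join() */
	return NULL;
}

/*
 * Run 'nthreads' workers for 'seconds' seconds and return the total
 * number of getpriority() calls, or 0 if any thread failed to start.
 */
static unsigned long run_get_bench(int nthreads, unsigned int seconds)
{
	struct worker workers[MAX_WORKERS];
	struct timespec ts = { .tv_sec = seconds, .tv_nsec = 0 };
	unsigned long total = 0;
	int started = 0;

	assert(nthreads > 0 && nthreads <= MAX_WORKERS);
	atomic_store(&stop_flag, false);
	for (int i = 0; i < nthreads; i++) {
		workers[i].calls = 0;
		if (pthread_create(&workers[i].tid, NULL,
				   hammer_getpriority, &workers[i]) != 0)
			break;
		started++;
	}

	nanosleep(&ts, NULL);	/* let the workers spin */
	atomic_store(&stop_flag, true);

	for (int i = 0; i < started; i++) {
		pthread_join(workers[i].tid, NULL);
		total += workers[i].calls;
	}
	return started == nthreads ? total : 0;
}
```

Comparing run_get_bench() at 1, 2, 4, ... 80 threads on kernels with and without the patch would give a rough analogue of the table; absolute numbers will not match stress-ng's.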

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/sys.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index d325f3ab624a..12ade1a00a18 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -214,7 +214,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 		niceval = MAX_NICE;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -252,7 +251,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 out:
 	return error;
@@ -277,7 +275,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -323,7 +320,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;
--
2.16.4


2020-05-02 09:33:06

by Peter Zijlstra

Subject: Re: [PATCH] kernel/sys: do not use tasklist_lock to set/get scheduling priorities

On Fri, May 01, 2020 at 08:05:39PM -0700, Davidlohr Bueso wrote:
> For both setpriority(2) and getpriority(2) there's really no need
> to take the tasklist_lock at all, yet both hold it for the entirety
> of the syscall. The tasklist_lock does not protect reading/writing
> p->static_prio, and task lookups are already RCU-safe, providing a
> stable pointer.

RCU-safe, as in, it will not crash. However, without tasklist_lock the
thread iterations (for PRIO_PGRP/PRIO_USER) now race against fork().

That is a user observable change in behaviour.

Do we care about it? No idea, and your Changelog doesn't provide a
clue either.

2020-05-03 20:50:00

by Davidlohr Bueso

Subject: Re: [PATCH] kernel/sys: do not use tasklist_lock to set/get scheduling priorities

Cc'ing Oleg, who IIRC also likes this stuff.

On Sat, 02 May 2020, Peter Zijlstra wrote:

>On Fri, May 01, 2020 at 08:05:39PM -0700, Davidlohr Bueso wrote:
>> For both setpriority(2) and getpriority(2) there's really no need
>> to take the tasklist_lock at all, yet both hold it for the entirety
>> of the syscall. The tasklist_lock does not protect reading/writing
>> p->static_prio, and task lookups are already RCU-safe, providing a
>> stable pointer.
>
>RCU-safe, as in, it will not crash. However, without tasklist_lock the
>thread iterations (for PRIO_PGRP/PRIO_USER) now race against fork().
>
>That is a user observable change in behaviour.
>
>Do we care about it? No idea, and your Changelog doesn't provide a
>clue either.

Yeah, that was convenient of me to leave out, sorry. copy_process()
adds the new task to the pid->tasks lists with hlist_add_head_rcu()
while holding tasklist_lock for writing, but RCU traversals of those
lists are safe against that. As such, AFAIU, this fork serialization
exists for concurrent modifications of the lists, which these syscalls
do not perform.
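The argument here is the standard RCU publish/subscribe one. The sketch below is a userspace analogy under stated assumptions: C11 atomics instead of the kernel's hlist/RCU primitives, acquire loads standing in for rcu_dereference(), and no reclamation since nothing is ever removed. Writers are serialized by a mutex that plays the role of tasklist_lock held for writing, while readers walk the list without any lock. A reader never sees a half-initialized entry but, as Peter points out, an entry added concurrently may or may not be counted. Priorities use the kernel's internal 20 - nice encoding; struct rcu_list and its helpers are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task linked on a pid->tasks list. */
struct node {
	int nice;
	struct node *_Atomic next;
};

struct rcu_list {
	struct node *_Atomic head;
	pthread_mutex_t writer_lock;	/* models write_lock(&tasklist_lock) */
};

/*
 * Writer (cf. copy_process()): fully initialize the node, then publish
 * it with a release store, as hlist_add_head_rcu() does through
 * rcu_assign_pointer(). Writers serialize only against each other.
 * Nodes are never freed here: RCU grace periods are not modelled.
 */
static int list_add_head_published(struct rcu_list *l, int nice)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->nice = nice;

	pthread_mutex_lock(&l->writer_lock);
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&l->head, memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&l->head, n, memory_order_release);
	pthread_mutex_unlock(&l->writer_lock);
	return 0;
}

/*
 * Lockless reader (cf. getpriority(PRIO_PGRP) under rcu_read_lock()):
 * returns the highest kernel-style priority (20 - nice), or -1 if the
 * list is empty, and stores the number of entries seen in *count.
 */
static int list_max_prio(struct rcu_list *l, int *count)
{
	int best = -1, seen = 0;

	for (struct node *n = atomic_load_explicit(&l->head, memory_order_acquire);
	     n != NULL;
	     n = atomic_load_explicit(&n->next, memory_order_acquire)) {
		if (20 - n->nice > best)
			best = 20 - n->nice;
		seen++;
	}
	*count = seen;
	return best;
}
```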

In any case, we could at least keep the change to getpriority(2): even
if the list walk races with fork(), the new child inherits its parent's
nice value, so its priority won't be any higher than what was already
observed, thus maintaining semantics.
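The getpriority(2) argument rests on two documented behaviours: for a group, getpriority() reports the highest priority (lowest nice value) among its members, and a child created by fork() inherits its parent's nice value. The userspace sketch below checks the second in the PRIO_PGRP form. child_group_nice() is a hypothetical helper, and it moves the child into a process group of its own so that unrelated group members cannot affect the result.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, put it in a new process group of
 * its own and report getpriority(PRIO_PGRP) for that group. The child
 * inherited our nice value, so the result should equal our own.
 * Returns 0 on success, -1 on failure.
 */
static int child_group_nice(int *group_nice)
{
	int pipefd[2], ret = -1;
	pid_t pid;

	if (pipe(pipefd) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	if (pid == 0) {
		char c;
		ssize_t r;

		close(pipefd[1]);
		setpgid(0, 0);			/* leader of a new group */
		r = read(pipefd[0], &c, 1);	/* block until the parent closes */
		(void)r;
		_exit(0);
	}

	close(pipefd[0]);
	/* Also set the group from the parent, so we don't race the child. */
	if (setpgid(pid, pid) == 0) {
		int val;

		errno = 0;
		val = getpriority(PRIO_PGRP, pid);
		if (!(val == -1 && errno != 0)) {
			*group_nice = val;
			ret = 0;
		}
	}

	close(pipefd[1]);	/* EOF lets the child's read() return */
	waitpid(pid, NULL, 0);
	return ret;
}
```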

Thanks,
Davidlohr