2024-01-22 17:07:07

by Oleg Nesterov

Subject: [PATCH v2 0/2] getrusage: use sig->stats_lock

Andrew,

this series replaces

getrusage-move-thread_group_cputime_adjusted-outside-of-lock_task_sighand.patch
getrusage-use-sig-stats_lock.patch

in mm-nonmm-unstable.

The patches are the same; I only tried to improve the changelogs as we discussed.

Dylan, do you have something to add?

Oleg.



2024-01-22 17:07:15

by Oleg Nesterov

Subject: [PATCH v2 1/2] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()

thread_group_cputime() does its own locking, so we can safely move
thread_group_cputime_adjusted(), which does another for_each_thread loop,
outside of the ->siglock protected section.

This is also preparation for the next patch, which changes getrusage() to
use stats_lock instead of siglock; thread_group_cputime() takes the same
lock. With the current implementation a recursive read_seqbegin_or_lock()
is fine, since thread_group_cputime() can't enter the slow mode if the
caller holds stats_lock, yet this looks safer and performs better.
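
For reference, this is the retry pattern thread_group_cputime() already
uses internally (an abridged sketch of kernel/sched/cputime.c, not part
of this patch): the first pass is lockless, and a failed read repeats
the loop with stats_lock held.

	rcu_read_lock();
	nextseq = 0;	/* attempt a lockless read on the first round */
	do {
		seq = nextseq;
		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
		times->utime = sig->utime;
		times->stime = sig->stime;
		times->sum_exec_runtime = sig->sum_sched_runtime;

		for_each_thread(tsk, t) {
			task_cputime(t, &utime, &stime);
			times->utime += utime;
			times->stime += stime;
			times->sum_exec_runtime += read_sum_exec_runtime(t);
		}
		nextseq = 1;	/* if the lockless pass failed, take the lock */
	} while (need_seqretry(&sig->stats_lock, seq));
	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
	rcu_read_unlock();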

Reported-and-tested-by: Dylan Hatch <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/sys.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index e219fcfa112d..70ad06ad852e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1785,17 +1785,19 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
struct task_struct *t;
unsigned long flags;
u64 tgutime, tgstime, utime, stime;
- unsigned long maxrss = 0;
+ unsigned long maxrss;
+ struct mm_struct *mm;
struct signal_struct *sig = p->signal;

- memset((char *)r, 0, sizeof (*r));
+ memset(r, 0, sizeof(*r));
utime = stime = 0;
+ maxrss = 0;

if (who == RUSAGE_THREAD) {
task_cputime_adjusted(current, &utime, &stime);
accumulate_thread_rusage(p, r);
maxrss = sig->maxrss;
- goto out;
+ goto out_thread;
}

if (!lock_task_sighand(p, &flags))
@@ -1819,9 +1821,6 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
fallthrough;

case RUSAGE_SELF:
- thread_group_cputime_adjusted(p, &tgutime, &tgstime);
- utime += tgutime;
- stime += tgstime;
r->ru_nvcsw += sig->nvcsw;
r->ru_nivcsw += sig->nivcsw;
r->ru_minflt += sig->min_flt;
@@ -1839,19 +1838,24 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
}
unlock_task_sighand(p, &flags);

-out:
- r->ru_utime = ns_to_kernel_old_timeval(utime);
- r->ru_stime = ns_to_kernel_old_timeval(stime);
+ if (who == RUSAGE_CHILDREN)
+ goto out_children;

- if (who != RUSAGE_CHILDREN) {
- struct mm_struct *mm = get_task_mm(p);
+ thread_group_cputime_adjusted(p, &tgutime, &tgstime);
+ utime += tgutime;
+ stime += tgstime;

- if (mm) {
- setmax_mm_hiwater_rss(&maxrss, mm);
- mmput(mm);
- }
+out_thread:
+ mm = get_task_mm(p);
+ if (mm) {
+ setmax_mm_hiwater_rss(&maxrss, mm);
+ mmput(mm);
}
+
+out_children:
r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
+ r->ru_utime = ns_to_kernel_old_timeval(utime);
+ r->ru_stime = ns_to_kernel_old_timeval(stime);
}

SYSCALL_DEFINE2(getrusage, int, who, struct rusage __user *, ru)
--
2.25.1.362.g51ebf55


2024-01-22 17:07:57

by Oleg Nesterov

Subject: [PATCH v2 2/2] getrusage: use sig->stats_lock rather than lock_task_sighand()

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
getrusage() at the same time and the target process has NR_THREADS threads,
spin_lock_irq() will spin with irqs disabled for O(NR_CPUS * NR_THREADS)
time.

Change getrusage() to use sig->stats_lock; it was specifically designed
for this type of use. This way getrusage() runs lockless in the likely case.

TODO:
- Change do_task_stat() to use sig->stats_lock too, then we can
remove spin_lock_irq(siglock) in wait_task_zombie().

- Turn sig->stats_lock into seqcount_rwlock_t, this way the
readers in the slow mode won't exclude each other. See
https://lore.kernel.org/all/[email protected]/

- stats_lock has to disable irqs because ->siglock can be taken
in irq context, it would be very nice to change __exit_signal()
to avoid the siglock->stats_lock dependency.

Reported-and-tested-by: Dylan Hatch <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/sys.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 70ad06ad852e..f8e543f1e38a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1788,7 +1788,9 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long maxrss;
struct mm_struct *mm;
struct signal_struct *sig = p->signal;
+ unsigned int seq = 0;

+retry:
memset(r, 0, sizeof(*r));
utime = stime = 0;
maxrss = 0;
@@ -1800,8 +1802,7 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
goto out_thread;
}

- if (!lock_task_sighand(p, &flags))
- return;
+ flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);

switch (who) {
case RUSAGE_BOTH:
@@ -1829,14 +1830,23 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_oublock += sig->oublock;
if (maxrss < sig->maxrss)
maxrss = sig->maxrss;
+
+ rcu_read_lock();
__for_each_thread(sig, t)
accumulate_thread_rusage(t, r);
+ rcu_read_unlock();
+
break;

default:
BUG();
}
- unlock_task_sighand(p, &flags);
+
+ if (need_seqretry(&sig->stats_lock, seq)) {
+ seq = 1;
+ goto retry;
+ }
+ done_seqretry_irqrestore(&sig->stats_lock, seq, flags);

if (who == RUSAGE_CHILDREN)
goto out_children;
--
2.25.1.362.g51ebf55


2024-01-23 00:26:33

by Andrew Morton

Subject: Re: [PATCH v2 2/2] getrusage: use sig->stats_lock rather than lock_task_sighand()

On Mon, 22 Jan 2024 16:50:53 +0100 Oleg Nesterov <[email protected]> wrote:

> lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
> getrusage() at the same time and the target process has NR_THREADS threads,
> spin_lock_irq() will spin with irqs disabled for O(NR_CPUS * NR_THREADS)
> time.

It would be super interesting to see Dylan's original report.

Is it possible for carefully-crafted unprivileged userspace to
deliberately trigger this?

2024-01-23 15:59:07

by Oleg Nesterov

Subject: Re: [PATCH v2 2/2] getrusage: use sig->stats_lock rather than lock_task_sighand()

On 01/22, Andrew Morton wrote:
>
> On Mon, 22 Jan 2024 16:50:53 +0100 Oleg Nesterov <[email protected]> wrote:
>
> > lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
> > getrusage() at the same time and the target process has NR_THREADS threads,
> > spin_lock_irq() will spin with irqs disabled for O(NR_CPUS * NR_THREADS)
> > time.
>
> It would be super interesting to see Dylan's original report.

from "[RFC PATCH] getrusage: Use trylock when getting sighand lock."
https://lore.kernel.org/all/[email protected]/

Processes with many threads run the risk of causing a hard lockup if
too many threads are calling getrusage() at once. This is because a
calling thread with RUSAGE_SELF spins on the sighand lock with irqs
disabled, and the critical section of getrusage() scales linearly with
the size of the process. All CPUs may end up spinning on the sighand
lock for a long time because another thread holds the lock and is busy
iterating over 250k+ threads.

> Is it possible for carefully-crafted unprivileged userspace to
> deliberately trigger this?

Yes, you just need to create a process with a lot of threads calling
getrusage().

See my and Dylan's test-cases in
https://lore.kernel.org/all/CADBMgpz7k=LhktfcJhSDBDWN0oLeQxPqhOVws3fq0LNpnfOSYg@mail.gmail.com/
They are very similar and simple.
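
For the archive, the shape of such a test-case (a minimal sketch, not
the exact program from the link above; thread count and stack size are
arbitrary):

	/* repro.c - many threads hammering getrusage(RUSAGE_SELF).
	 * Build: gcc -O2 -pthread repro.c -o repro
	 */
	#include <limits.h>
	#include <pthread.h>
	#include <stdlib.h>
	#include <sys/resource.h>

	static void *hammer(void *arg)
	{
		struct rusage ru;

		/* Each call walks every thread in the group under ->siglock. */
		for (;;)
			getrusage(RUSAGE_SELF, &ru);
		return NULL;
	}

	int main(int argc, char **argv)
	{
		int i, nr = argc > 1 ? atoi(argv[1]) : 100000;
		pthread_attr_t attr;
		pthread_t th;

		/* Tiny stacks so a single process can hold a lot of threads. */
		pthread_attr_init(&attr);
		pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);

		for (i = 0; i < nr; i++)
			if (pthread_create(&th, &attr, hammer, NULL))
				break;
		pthread_join(th, NULL);	/* the workers never return */
		return 0;
	}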


And again, this is a known problem and we need more fixes.

Oleg.


2024-01-23 23:45:55

by Dylan Hatch

Subject: Re: [PATCH v2 0/2] getrusage: use sig->stats_lock

On Mon, Jan 22, 2024 at 7:51 AM Oleg Nesterov <[email protected]> wrote:
>
> Dylan, do you have something to add?
>
> Oleg.
>

I have one last question -- is there possibly an edge case in which the
hard lockup can still happen? How likely is it for many writers to force
enough readers to do a retry on the seqlock, disabling irqs and causing
the lockup? Is this not a concern because there aren't many writers?

Thanks,
Dylan

2024-01-24 00:41:02

by Oleg Nesterov

Subject: Re: [PATCH v2 0/2] getrusage: use sig->stats_lock

On 01/23, Dylan Hatch wrote:
>
> I have one last question -- is there possibly an edge case in which the
> hard lockup can still happen? How likely is it for many writers to force
> enough readers to do a retry on the seqlock, disabling irqs and causing
> the lockup?

I don't know how likely it is, and I guess the repro would have to be more creative ;)

But yes. Please see the TODO: section in the changelog,

- Turn sig->stats_lock into seqcount_rwlock_t, this way the
readers in the slow mode won't exclude each other.

and more importantly,

- stats_lock has to disable irqs because ->siglock can be taken
in irq context, it would be very nice to change __exit_signal()
to avoid the siglock->stats_lock dependency.

There are other users that take stats_lock under siglock (the
"fs/proc: do_task_stat" series changes two of them to not do this), but
__exit_signal() is the most problematic.

If we remove this dependency, we can turn read_seqbegin_or_lock_irqsave()
into read_seqbegin_or_lock() which doesn't disable irqs.
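
Something like this on the reader side (a hypothetical sketch, assuming
the __exit_signal() dependency is gone; not what the kernel does today):

	int seq = 0;

	/* Same lockless-first retry logic as now, but no irq disabling. */
retry:
	read_seqbegin_or_lock(&sig->stats_lock, &seq);
	/* ... read the sig->* counters ... */
	if (need_seqretry(&sig->stats_lock, seq)) {
		seq = 1;	/* the second pass takes the lock */
		goto retry;
	}
	done_seqretry(&sig->stats_lock, seq);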

Oleg.