2006-01-09 17:39:40

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess at getrusage()

Ravikiran G Thirumalai wrote:
>
> On Sun, Jan 08, 2006 at 02:49:31PM +0300, Oleg Nesterov wrote:
> > > + } else
> > > + /* See locking comments above */
> > > + smp_rmb();
> >
> > This patch doesn't try to optimize ->sighand.siglock locking,
> > and I think this is right. But this also means we don't need
> > rmb() here. It was needed to protect against "another thread
> > just exited, cpu can read ->c* values before thread_group_empty()
> > without taking siglock" case, now it is not possible.
>
> Don't we still need rmb for the RUSAGE_SELF case? we do not take the
> siglock for rusage self and the non c* signal fields are written to
> at __exit_signal...

I think it is unneeded because RUSAGE_SELF case is "racy" anyway even
if we held both locks, task_struct->xxx counters can change at any
moment.

But may be you are right.

> What is wrong with optimizing by not taking the siglock in RUSAGE_BOTH
> and RUSAGE_CHILDREN? I would like to add that in too unless I am
> missing something and the optimization is incorrect.

We can't have contention on ->siglock when need_lock == 0, so why should
we optimize this case?

Oleg.


2006-01-09 20:54:53

by Ravikiran G Thirumalai

[permalink] [raw]
Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess at getrusage()

On Mon, Jan 09, 2006 at 09:55:51PM +0300, Oleg Nesterov wrote:
> Ravikiran G Thirumalai wrote:
> >
> >
> > Don't we still need rmb for the RUSAGE_SELF case? we do not take the
> > siglock for rusage self and the non c* signal fields are written to
> > at __exit_signal...
>
> I think it is unneeded because RUSAGE_SELF case is "racy" anyway even
> if we held both locks, task_struct->xxx counters can change at any
> moment.
>
> But may be you are right.

Hmm...access to task_struct->xxx has been racy, but accessing the
signal->* counters were not. What if read of the signal->utime was a
hoisted read and signal->stime was a read after the counter is updated?
This was not a possibility earlier no?

>
> > What is wrong with optimizing by not taking the siglock in RUSAGE_BOTH
> > and RUSAGE_CHILDREN? I would like to add that in too unless I am
> > missing something and the optimization is incorrect.
>
> We can't have contention on ->siglock when need_lock == 0, so why should
> we optimize this case?

We would be saving 1 buslocked operation in that case (on some arches), a
cacheline fetch for exclusive (since signal and sighand are on different memory
locations), and disabling/enabling onchip interrupts. But yes, this would be a
smaller optimization....Unless you have strong objections this can also
go in?

Thanks,
Kiran

Signed-off-by: Ravikiran Thirumalai <[email protected]>

Index: linux-2.6.15-mm2/kernel/sys.c
===================================================================
--- linux-2.6.15-mm2.orig/kernel/sys.c 2006-01-09 12:44:30.000000000 -0800
+++ linux-2.6.15-mm2/kernel/sys.c 2006-01-09 12:45:07.000000000 -0800
@@ -1730,14 +1730,16 @@ static void k_getrusage(struct task_stru
switch (who) {
case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
- spin_lock_irqsave(&p->sighand->siglock, flags);
+ if (need_lock)
+ spin_lock_irqsave(&p->sighand->siglock, flags);
utime = p->signal->cutime;
stime = p->signal->cstime;
r->ru_nvcsw = p->signal->cnvcsw;
r->ru_nivcsw = p->signal->cnivcsw;
r->ru_minflt = p->signal->cmin_flt;
r->ru_majflt = p->signal->cmaj_flt;
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ if (need_lock)
+ spin_unlock_irqrestore(&p->sighand->siglock, flags);

if (who == RUSAGE_CHILDREN)
break;