2006-01-05 18:01:06

by Oleg Nesterov

Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage()

Ravikiran G Thirumalai wrote:
>
> + need_lock = (p == current && thread_group_empty(p));

This should be

need_lock = !(p == current && thread_group_empty(p));


I think we should cleanup k_getrusage() first, see the patch below.

Do you agree with that patch? If yes, could you make the new one on
top of it? (please send them both to Andrew).

Note that after this cleanup we don't have code duplication.


[PATCH] simplify k_getrusage()

Factor out common code for different RUSAGE_xxx cases.

Don't take ->sighand->siglock in RUSAGE_SELF case, suggested
by Ravikiran G Thirumalai.

Signed-off-by: Oleg Nesterov <[email protected]>

--- K/kernel/sys.c~ 2006-01-03 21:15:58.000000000 +0300
+++ K/kernel/sys.c 2006-01-06 00:40:34.000000000 +0300
@@ -1687,7 +1687,10 @@ static void k_getrusage(struct task_stru
if (unlikely(!p->signal))
return;

+ utime = stime = cputime_zero;
+
switch (who) {
+ case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
spin_lock_irqsave(&p->sighand->siglock, flags);
utime = p->signal->cutime;
@@ -1697,22 +1700,11 @@ static void k_getrusage(struct task_stru
r->ru_minflt = p->signal->cmin_flt;
r->ru_majflt = p->signal->cmaj_flt;
spin_unlock_irqrestore(&p->sighand->siglock, flags);
- cputime_to_timeval(utime, &r->ru_utime);
- cputime_to_timeval(stime, &r->ru_stime);
- break;
+
+ if (who == RUSAGE_CHILDREN)
+ break;
+
case RUSAGE_SELF:
- spin_lock_irqsave(&p->sighand->siglock, flags);
- utime = stime = cputime_zero;
- goto sum_group;
- case RUSAGE_BOTH:
- spin_lock_irqsave(&p->sighand->siglock, flags);
- utime = p->signal->cutime;
- stime = p->signal->cstime;
- r->ru_nvcsw = p->signal->cnvcsw;
- r->ru_nivcsw = p->signal->cnivcsw;
- r->ru_minflt = p->signal->cmin_flt;
- r->ru_majflt = p->signal->cmaj_flt;
- sum_group:
utime = cputime_add(utime, p->signal->utime);
stime = cputime_add(stime, p->signal->stime);
r->ru_nvcsw += p->signal->nvcsw;
@@ -1729,13 +1721,14 @@ static void k_getrusage(struct task_stru
r->ru_majflt += t->maj_flt;
t = next_thread(t);
} while (t != p);
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
- cputime_to_timeval(utime, &r->ru_utime);
- cputime_to_timeval(stime, &r->ru_stime);
break;
+
default:
BUG();
}
+
+ cputime_to_timeval(utime, &r->ru_utime);
+ cputime_to_timeval(stime, &r->ru_stime);
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)


2006-01-06 09:46:22

by Ravikiran G Thirumalai

Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage()

On Thu, Jan 05, 2006 at 10:17:01PM +0300, Oleg Nesterov wrote:
> Ravikiran G Thirumalai wrote:
>
>
> I think we should cleanup k_getrusage() first, see the patch below.
>
> Do you agree with that patch? If yes, could you make the new one on
> top of it? (please send them both to Andrew).
>
> Note that after this cleanup we don't have code duplication.

Yes, nice patch. Thanks. Here is the tasklist lock optimization on top of
the simplify-k_getrusage.patch. Andrew, please apply.

Thanks,
Kiran


The following patch avoids taking the global tasklist_lock when a process
is single threaded during getrusage(). Any avoidance of tasklist_lock
is good for NUMA boxes (and possibly for large SMPs).
Thanks to Oleg Nesterov for review and suggestions.

Signed-off-by: Nippun Goel <[email protected]>
Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>

Index: linux-2.6.15-delme/kernel/sys.c
===================================================================
--- linux-2.6.15-delme.orig/kernel/sys.c 2006-01-05 15:40:38.000000000 -0800
+++ linux-2.6.15-delme/kernel/sys.c 2006-01-06 00:43:08.000000000 -0800
@@ -1664,9 +1664,6 @@ asmlinkage long sys_setrlimit(unsigned i
* a lot simpler! (Which we're not doing right now because we're not
* measuring them yet).
*
- * This expects to be called with tasklist_lock read-locked or better,
- * and the siglock not locked. It may momentarily take the siglock.
- *
* When sampling multiple threads for RUSAGE_SELF, under SMP we might have
* races with threads incrementing their own counters. But since word
* reads are atomic, we either get new values or old values and we don't
@@ -1674,6 +1671,25 @@ asmlinkage long sys_setrlimit(unsigned i
* the c* fields from p->signal from races with exit.c updating those
* fields when reaping, so a sample either gets all the additions of a
* given child after it's reaped, or none so this sample is before reaping.
+ *
+ * tasklist_lock locking optimisation:
+ * If we are current and single threaded, we do not need to take the tasklist
+ * lock or the siglock. No one else can take our signal_struct away,
+ * no one else can reap the children to update signal->c* counters, and
+ * no one else can race with the signal-> fields.
+ * If we do not take the tasklist_lock, the signal-> fields could be read
+ * out of order while another thread was just exiting. So we place a
+ * read memory barrier when we avoid the lock. On the writer side,
+ * write memory barrier is implied in __exit_signal as __exit_signal releases
+ * the siglock spinlock after updating the signal-> fields.
+ *
+ * We don't really need the siglock when we access the non c* fields
+ * of the signal_struct (for RUSAGE_SELF) even in multithreaded
+ * case, since we take the tasklist lock for read and the non c* signal->
+ * fields are updated only in __exit_signal, which is called with
+ * tasklist_lock taken for write, hence these two threads cannot execute
+ * concurrently.
+ *
*/

static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
@@ -1681,14 +1697,22 @@ static void k_getrusage(struct task_stru
struct task_struct *t;
unsigned long flags;
cputime_t utime, stime;
+ int need_lock = 0;

memset((char *) r, 0, sizeof *r);
-
- if (unlikely(!p->signal))
- return;
-
utime = stime = cputime_zero;

+ need_lock = !(p == current && thread_group_empty(p));
+ if (need_lock) {
+ read_lock(&tasklist_lock);
+ if (unlikely(!p->signal)) {
+ read_unlock(&tasklist_lock);
+ return;
+ }
+ } else
+ /* See locking comments above */
+ smp_rmb();
+
switch (who) {
case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
@@ -1727,6 +1751,8 @@ static void k_getrusage(struct task_stru
BUG();
}

+ if (need_lock)
+ read_unlock(&tasklist_lock);
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
}
@@ -1734,9 +1760,7 @@ static void k_getrusage(struct task_stru
int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
{
struct rusage r;
- read_lock(&tasklist_lock);
k_getrusage(p, who, &r);
- read_unlock(&tasklist_lock);
return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0;
}

2006-01-06 17:23:49

by Christoph Lameter

Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage()

On Fri, 6 Jan 2006, Ravikiran G Thirumalai wrote:

> + need_lock = !(p == current && thread_group_empty(p));

Isn't

need_lock = (p != current || !thread_group_empty(p))

clearer?

2006-01-06 19:46:26

by Ravikiran G Thirumalai

Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage()

On Fri, Jan 06, 2006 at 09:23:30AM -0800, Christoph Lameter wrote:
> On Fri, 6 Jan 2006, Ravikiran G Thirumalai wrote:
>
> > + need_lock = !(p == current && thread_group_empty(p));
>
> Isn't
>
> need_lock = (p != current || !thread_group_empty(p))
>
> clearer?

It felt all the same to me, and the comments were bold and clear. Also, the
above was a C translation of what was said in the comment :) ... I am OK
either way. So here goes...

The following patch avoids taking the global tasklist_lock when a process
is single threaded during getrusage(). Any avoidance of tasklist_lock
is good for NUMA boxes (and possibly for large SMPs).
Thanks to Oleg Nesterov for review and suggestions.

Signed-off-by: Nippun Goel <[email protected]>
Signed-off-by: Ravikiran Thirumalai <[email protected]>
Signed-off-by: Shai Fultheim <[email protected]>

Index: linux-2.6.15-delme/kernel/sys.c
===================================================================
--- linux-2.6.15-delme.orig/kernel/sys.c 2006-01-05 15:40:38.000000000 -0800
+++ linux-2.6.15-delme/kernel/sys.c 2006-01-06 00:43:08.000000000 -0800
@@ -1664,9 +1664,6 @@ asmlinkage long sys_setrlimit(unsigned i
* a lot simpler! (Which we're not doing right now because we're not
* measuring them yet).
*
- * This expects to be called with tasklist_lock read-locked or better,
- * and the siglock not locked. It may momentarily take the siglock.
- *
* When sampling multiple threads for RUSAGE_SELF, under SMP we might have
* races with threads incrementing their own counters. But since word
* reads are atomic, we either get new values or old values and we don't
@@ -1674,6 +1671,25 @@ asmlinkage long sys_setrlimit(unsigned i
* the c* fields from p->signal from races with exit.c updating those
* fields when reaping, so a sample either gets all the additions of a
* given child after it's reaped, or none so this sample is before reaping.
+ *
+ * tasklist_lock locking optimisation:
+ * If we are current and single threaded, we do not need to take the tasklist
+ * lock or the siglock. No one else can take our signal_struct away,
+ * no one else can reap the children to update signal->c* counters, and
+ * no one else can race with the signal-> fields.
+ * If we do not take the tasklist_lock, the signal-> fields could be read
+ * out of order while another thread was just exiting. So we place a
+ * read memory barrier when we avoid the lock. On the writer side,
+ * write memory barrier is implied in __exit_signal as __exit_signal releases
+ * the siglock spinlock after updating the signal-> fields.
+ *
+ * We don't really need the siglock when we access the non c* fields
+ * of the signal_struct (for RUSAGE_SELF) even in multithreaded
+ * case, since we take the tasklist lock for read and the non c* signal->
+ * fields are updated only in __exit_signal, which is called with
+ * tasklist_lock taken for write, hence these two threads cannot execute
+ * concurrently.
+ *
*/

static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
@@ -1681,14 +1697,22 @@ static void k_getrusage(struct task_stru
struct task_struct *t;
unsigned long flags;
cputime_t utime, stime;
+ int need_lock = 0;

memset((char *) r, 0, sizeof *r);
-
- if (unlikely(!p->signal))
- return;
-
utime = stime = cputime_zero;

+ need_lock = (p != current || !thread_group_empty(p));
+ if (need_lock) {
+ read_lock(&tasklist_lock);
+ if (unlikely(!p->signal)) {
+ read_unlock(&tasklist_lock);
+ return;
+ }
+ } else
+ /* See locking comments above */
+ smp_rmb();
+
switch (who) {
case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
@@ -1727,6 +1751,8 @@ static void k_getrusage(struct task_stru
BUG();
}

+ if (need_lock)
+ read_unlock(&tasklist_lock);
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
}
@@ -1734,9 +1760,7 @@ static void k_getrusage(struct task_stru
int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
{
struct rusage r;
- read_lock(&tasklist_lock);
k_getrusage(p, who, &r);
- read_unlock(&tasklist_lock);
return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0;
}

2006-01-06 23:50:18

by Andrew Morton

Subject: Re: [rfc][patch] Avoid taking global tasklist_lock for single threaded process at getrusage()

Christoph Lameter <[email protected]> wrote:
>
> On Fri, 6 Jan 2006, Ravikiran G Thirumalai wrote:
>
> > + need_lock = !(p == current && thread_group_empty(p));
>
> Isn't
>
> need_lock = (p != current || !thread_group_empty(p))
>
> clearer?

I was actually going to change it to

if (p != current || !thread_group_empty(p))
need_lock = 1;

a) because my brain works that way and

b) to make the currently-unneeded initialisation of need_lock do
something useful ;)