2008-11-05 13:27:32

by Sripathi Kodi

Subject: [PATCH] Inline double_unlock_balance()

Hi,

We have a test case that measures the variation in the amount of time
needed to perform a fixed amount of work on the preempt_rt kernel. We
recently started seeing a deterioration in its performance. The test
should never take more than 10 microseconds, but we started seeing a
5-10% failure rate. By elimination, we traced the problem to commit
1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate
scheduler runqueues). When LOCKDEP is disabled, that commit only adds
an additional function call to double_unlock_balance(). Hence I
inlined double_unlock_balance() and the problem went away. Here is a
patch to make this change.

Thanks,
Sripathi.

lockdep: Inline double_unlock_balance()

The additional function call overhead in double_unlock_balance()
causes latency problems for some test cases on the preempt_rt kernel.

Signed-off-by: Sripathi Kodi <[email protected]>

Index: linux-2.6.27.4/kernel/sched.c
===================================================================
--- linux-2.6.27.4.orig/kernel/sched.c 2008-11-05 05:01:01.000000000 -0800
+++ linux-2.6.27.4/kernel/sched.c 2008-11-05 05:01:20.000000000 -0800
@@ -2812,7 +2812,7 @@
return ret;
}

-static void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
+static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
__releases(busiest->lock)
{
spin_unlock(&busiest->lock);
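
(For reference, the complete helper after this patch looks roughly like
the sketch below; this is reconstructed from the 2.6.27 sources rather
than taken from the thread. The lock_set_subclass() call is the lockdep
re-annotation that commit 1b12bbc introduced; with LOCKDEP disabled it
compiles away, leaving only the cost of the function call itself.)

static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
	__releases(busiest->lock)
{
	spin_unlock(&busiest->lock);
	lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
}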


2008-11-05 13:59:34

by Peter Zijlstra

Subject: Re: [PATCH] Inline double_unlock_balance()

On Wed, 2008-11-05 at 18:57 +0530, Sripathi Kodi wrote:
> Hi,
>
> We have a test case that measures the variation in the amount of time
> needed to perform a fixed amount of work on the preempt_rt kernel. We
> recently started seeing a deterioration in its performance. The test
> should never take more than 10 microseconds, but we started seeing a
> 5-10% failure rate. By elimination, we traced the problem to commit
> 1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate
> scheduler runqueues). When LOCKDEP is disabled, that commit only adds
> an additional function call to double_unlock_balance(). Hence I
> inlined double_unlock_balance() and the problem went away. Here is a
> patch to make this change.
>
> Thanks,
> Sripathi.
>
> lockdep: Inline double_unlock_balance()
>
> The additional function call overhead in double_unlock_balance()
> causes latency problems for some test cases on the preempt_rt kernel.
>
> Signed-off-by: Sripathi Kodi <[email protected]>

Acked-by: Peter Zijlstra <[email protected]>

> Index: linux-2.6.27.4/kernel/sched.c
> ===================================================================
> --- linux-2.6.27.4.orig/kernel/sched.c 2008-11-05 05:01:01.000000000 -0800
> +++ linux-2.6.27.4/kernel/sched.c 2008-11-05 05:01:20.000000000 -0800
> @@ -2812,7 +2812,7 @@
> return ret;
> }
>
> -static void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
> +static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
> __releases(busiest->lock)
> {
> spin_unlock(&busiest->lock);

2008-11-06 07:32:38

by Ingo Molnar

Subject: Re: [PATCH] Inline double_unlock_balance()


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2008-11-05 at 18:57 +0530, Sripathi Kodi wrote:
> > Hi,
> >
> > We have a test case that measures the variation in the amount of time
> > needed to perform a fixed amount of work on the preempt_rt kernel. We
> > recently started seeing a deterioration in its performance. The test
> > should never take more than 10 microseconds, but we started seeing a
> > 5-10% failure rate. By elimination, we traced the problem to commit
> > 1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate
> > scheduler runqueues). When LOCKDEP is disabled, that commit only adds
> > an additional function call to double_unlock_balance(). Hence I
> > inlined double_unlock_balance() and the problem went away. Here is a
> > patch to make this change.
> >
> > Thanks,
> > Sripathi.
> >
> > lockdep: Inline double_unlock_balance()
> >
> > The additional function call overhead in double_unlock_balance()
> > causes latency problems for some test cases on the preempt_rt kernel.
> >
> > Signed-off-by: Sripathi Kodi <[email protected]>
>
> Acked-by: Peter Zijlstra <[email protected]>

hm, i'm not sure why it makes such a difference. Possibly cache
alignment or code generation details pushing the critical path just
beyond the L1 cache limit and causing thrashing?

Anyway, i've applied it to tip/sched/rt, as we generally want to
inline such short locking ops.

Ingo

2008-11-06 07:53:25

by Peter Zijlstra

Subject: Re: [PATCH] Inline double_unlock_balance()

On Thu, 2008-11-06 at 08:32 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, 2008-11-05 at 18:57 +0530, Sripathi Kodi wrote:
> > > Hi,
> > >
> > > We have a test case that measures the variation in the amount of time
> > > needed to perform a fixed amount of work on the preempt_rt kernel. We
> > > recently started seeing a deterioration in its performance. The test
> > > should never take more than 10 microseconds, but we started seeing a
> > > 5-10% failure rate. By elimination, we traced the problem to commit
> > > 1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate
> > > scheduler runqueues). When LOCKDEP is disabled, that commit only adds
> > > an additional function call to double_unlock_balance(). Hence I
> > > inlined double_unlock_balance() and the problem went away. Here is a
> > > patch to make this change.
> > >
> > > Thanks,
> > > Sripathi.
> > >
> > > lockdep: Inline double_unlock_balance()
> > >
> > > The additional function call overhead in double_unlock_balance()
> > > causes latency problems for some test cases on the preempt_rt kernel.
> > >
> > > Signed-off-by: Sripathi Kodi <[email protected]>
> >
> > Acked-by: Peter Zijlstra <[email protected]>
>
> hm, i'm not sure why it makes such a difference. Possibly cache
> alignment or code generation details pushing the critical path just
> beyond the L1 cache limit and causing thrashing?
>
> Anyway, i've applied it to tip/sched/rt, as we generally want to
> inline such short locking ops.

I'm thinking Sripathi's gcc had a massive brainfart and did something
funny; maybe the extra register pressure from the calling convention
messed it up.

He didn't quantify the exact benefit, i.e. the scheduling cost/latency
before and after, or on what platform. Still, the patch is simple
enough.
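
(A side note: `inline' is only a hint to gcc, and with
CONFIG_OPTIMIZE_INLINING=y the compiler is free to ignore it. If that
were the problem, the kernel's __always_inline attribute would force the
issue; a hypothetical variant, not what was merged:)

static __always_inline void double_unlock_balance(struct rq *this_rq,
						  struct rq *busiest)
	__releases(busiest->lock)
{
	spin_unlock(&busiest->lock);
	lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
}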

2008-11-06 17:38:49

by Dhaval Giani

Subject: Re: [PATCH] Inline double_unlock_balance()

Needs this on top to fix a compile warning as well:


sched: fix double_unlock_balance compile warning

kernel/sched_rt.c:913: warning: ‘double_unlock_balance’ declared inline after being called
kernel/sched_rt.c:913: warning: previous declaration of ‘double_unlock_balance’ was here

Correct the declaration.

Signed-off-by: Dhaval Giani <[email protected]>

---
kernel/sched_rt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6-tip/kernel/sched_rt.c
===================================================================
--- linux-2.6-tip.orig/kernel/sched_rt.c 2008-11-06 22:52:23.000000000 +0530
+++ linux-2.6-tip/kernel/sched_rt.c 2008-11-06 22:57:39.000000000 +0530
@@ -910,7 +910,8 @@ static void put_prev_task_rt(struct rq *
#define RT_MAX_TRIES 3

static int double_lock_balance(struct rq *this_rq, struct rq *busiest);
-static void double_unlock_balance(struct rq *this_rq, struct rq *busiest);
+static inline void double_unlock_balance(struct rq *this_rq,
+ struct rq *busiest);

static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep);
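
(The warning fires because gcc saw the call go through the non-inline
forward declaration before reaching the inline definition. A minimal
standalone reproduction with gcc of that era, hypothetical and not from
the kernel tree:)

static void f(void);		/* forward declaration without inline */

static void g(void)
{
	f();			/* called through the non-inline declaration */
}

static inline void f(void)	/* warning: 'f' declared inline after being called */
{
}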

--
regards,
Dhaval