2007-05-14 06:27:38

by Tsutomu OWA

Subject: [patch 0/4] powerpc 2.6.21-rt1: fix a build breakage and some minor issues


Hi Ingo,

Please apply.

This series of patches fixes a build breakage and some minor issues for
powerpc64, including converting spinlocks to raw ones, adding a
need_resched_delayed() check, etc.

This applies on top of linux-2.6.21 and patch-2.6.21-rt1.

Compile and boot tested for both PREEMPT_NONE and PREEMPT_RT on Celleb.
Compile tested for PREEMPT_DESKTOP.

Thanks in advance.
-- owa
TOSHIBA, Corporate Software Engineering Center.


2007-05-14 06:28:51

by Tsutomu OWA

Subject: Re: [patch 2/4] powerpc 2.6.21-rt1: convert spinlocks to raw ones for Celleb.


Convert more spinlocks to raw ones for Celleb.

Signed-off-by: Tsutomu OWA <[email protected]>
-- owa

diff -rup linux-2.6.21-rt1/arch/powerpc/platforms/celleb/htab.c rt/arch/powerpc/platforms/celleb/htab.c
--- linux-2.6.21-rt1/arch/powerpc/platforms/celleb/htab.c 2007-04-26 12:08:32.000000000 +0900
+++ rt/arch/powerpc/platforms/celleb/htab.c 2007-05-07 14:05:30.000000000 +0900
@@ -40,7 +40,7 @@
#define DBG_LOW(fmt...) do { } while(0)
#endif

-static DEFINE_SPINLOCK(beat_htab_lock);
+static DEFINE_RAW_SPINLOCK(beat_htab_lock);

static inline unsigned int beat_read_mask(unsigned hpte_group)
{
--- linux-2.6.21-rt1/arch/powerpc/platforms/celleb/interrupt.c 2007-04-26 12:08:32.000000000 +0900
+++ rt/arch/powerpc/platforms/celleb/interrupt.c 2007-05-09 17:02:33.000000000 +0900
@@ -30,7 +30,7 @@
#include "beat_wrapper.h"

#define MAX_IRQS NR_IRQS
-static DEFINE_SPINLOCK(beatic_irq_mask_lock);
+static DEFINE_RAW_SPINLOCK(beatic_irq_mask_lock);
static uint64_t beatic_irq_mask_enable[(MAX_IRQS+255)/64];
static uint64_t beatic_irq_mask_ack[(MAX_IRQS+255)/64];

2007-05-14 06:30:56

by Tsutomu OWA

Subject: Re: [patch 1/4] powerpc 2.6.21-rt1: fix a build breakage by adding __raw_*_relax() macros


Add missing macros to fix a build breakage for PREEMPT_DESKTOP.

Signed-off-by: Tsutomu OWA <[email protected]>
-- owa

diff -rup linux-2.6.21-rt1/include/asm-powerpc/spinlock.h rt/include/asm-powerpc/spinlock.h
--- linux-2.6.21-rt1/include/asm-powerpc/spinlock.h 2007-05-07 14:08:12.000000000 +0900
+++ rt/include/asm-powerpc/spinlock.h 2007-05-07 16:56:41.000000000 +0900
@@ -289,5 +289,9 @@ static __inline__ void __raw_write_unloc
#define _raw_read_relax(lock) __rw_yield(lock)
#define _raw_write_relax(lock) __rw_yield(lock)

+#define __raw_spin_relax(lock) cpu_relax()
+#define __raw_read_relax(lock) cpu_relax()
+#define __raw_write_relax(lock) cpu_relax()
+
#endif /* __KERNEL__ */
#endif /* __ASM_SPINLOCK_H */

2007-05-14 06:31:20

by Tsutomu OWA

Subject: Re: [patch 0/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


Reduce scheduling latency by changing the TLB flush batch size to 1.
Since a TLB flush on Celleb is done by expensive hypervisor call(s),
it takes a long time to flush TLBs and causes scheduling latency.

As I don't know how long it takes on other platforms, it would be better to
enclose it within #ifdef CONFIG_PPC_CELLEB.

Signed-off-by: Tsutomu OWA <[email protected]>
-- owa

diff -rup linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h rt/include/asm-powerpc/tlbflush.h
--- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
+++ rt/include/asm-powerpc/tlbflush.h 2007-05-07 14:23:50.000000000 +0900
@@ -25,7 +25,11 @@ struct mm_struct;
#include <linux/percpu.h>
#include <asm/page.h>

+#ifdef CONFIG_PREEMPT_RT
+#define PPC64_TLB_BATCH_NR 1 /* Since tlb flush takes long time, reduce it to 1 when RT */
+#else
#define PPC64_TLB_BATCH_NR 192
+#endif /* CONFIG_PREEMPT_RT */

struct ppc64_tlb_batch {
unsigned long index;

2007-05-14 06:39:05

by Tsutomu OWA

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


Oops, the Subject was wrong... resending it. Sorry.

Reduce scheduling latency by changing the TLB flush batch size to 1.
Since a TLB flush on Celleb is done by expensive hypervisor call(s),
it takes a long time to flush TLBs and causes scheduling latency.

As I don't know how long it takes on other platforms, it would be better to
enclose it within #ifdef CONFIG_PPC_CELLEB.

Signed-off-by: Tsutomu OWA <[email protected]>
-- owa

diff -rup linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h rt/include/asm-powerpc/tlbflush.h
--- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
+++ rt/include/asm-powerpc/tlbflush.h 2007-05-07 14:23:50.000000000 +0900
@@ -25,7 +25,11 @@ struct mm_struct;
#include <linux/percpu.h>
#include <asm/page.h>

+#ifdef CONFIG_PREEMPT_RT
+#define PPC64_TLB_BATCH_NR 1 /* Since tlb flush takes long time, reduce it to 1 when RT */
+#else
#define PPC64_TLB_BATCH_NR 192
+#endif /* CONFIG_PREEMPT_RT */

struct ppc64_tlb_batch {
unsigned long index;

2007-05-14 06:45:13

by Tsutomu OWA

Subject: Re: [patch 3/4] powerpc 2.6.21-rt1: add a need_resched_delayed() check


Add a need_resched_delayed() check.
This was pointed out by Sergei Shtylyov:
http://ozlabs.org/pipermail/linuxppc-dev/2007-March/033148.html

Signed-off-by: Tsutomu Owa <[email protected]>
-- owa

diff -rup linux-2.6.21-rt1/arch/powerpc/kernel/idle.c rt/arch/powerpc/kernel/idle.c
--- linux-2.6.21-rt1/arch/powerpc/kernel/idle.c 2007-05-07 14:08:12.000000000 +0900
+++ rt/arch/powerpc/kernel/idle.c 2007-05-07 14:05:30.000000000 +0900
@@ -70,7 +70,9 @@ void cpu_idle(void)
local_irq_disable();

/* check again after disabling irqs */
- if (!need_resched() && !cpu_should_die())
+ if (!need_resched() &&
+ !need_resched_delayed() &&
+ !cpu_should_die())
ppc_md.power_save();

local_irq_enable();


2007-05-14 06:48:39

by Thomas Gleixner

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size

Tsutomu-san,

On Mon, 2007-05-14 at 15:38 +0900, Tsutomu OWA wrote:
> Oopps, Subject was wrong.... resending it. Sorry.
>
> Reduce scheduling latency by changing the TLB flush batch size to 1.
> Since a TLB flush on Celleb is done by expensive hypervisor call(s),
> it takes a long time to flush TLBs and causes scheduling latency.
>
> As I don't know how long it takes on other platforms, it would be better to
> enclose it within #ifdef CONFIG_PPC_CELLEB.

Yes, that might be appropriate. Can you add this and resend please ?

tglx


2007-05-14 07:56:49

by Tsutomu OWA

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


At Mon, 14 May 2007 08:51:46 +0200, Thomas Gleixner wrote:

> On Mon, 2007-05-14 at 15:38 +0900, Tsutomu OWA wrote:
> > As I don't know how long it takes on other platforms, it would be better to
> > enclose it within #ifdef CONFIG_PPC_CELLEB.
>
> Yes, that might be appropriate. Can you add this and resend please ?

Certainly, and thanks for your comment.

Reduce scheduling latency by changing the TLB flush batch size to 1.
Since a TLB flush on Celleb is done by expensive hypervisor call(s),
it takes a long time to flush TLBs and causes scheduling latency.

Signed-off-by: Tsutomu OWA <[email protected]>
-- owa

--- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
+++ rt/include/asm-powerpc/tlbflush.h 2007-05-14 16:12:47.000000000 +0900
@@ -25,7 +25,12 @@ struct mm_struct;
#include <linux/percpu.h>
#include <asm/page.h>

+#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
+/* Since tlb flush takes long time on Celleb, reduce it to 1 when Celleb && RT */
+#define PPC64_TLB_BATCH_NR 1
+#else
#define PPC64_TLB_BATCH_NR 192
+#endif /* defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT) */

struct ppc64_tlb_batch {
unsigned long index;

2007-05-14 14:40:21

by Arnd Bergmann

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size

On Monday 14 May 2007, Tsutomu OWA wrote:
> --- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
> +++ rt/include/asm-powerpc/tlbflush.h 2007-05-14 16:12:47.000000000 +0900
> @@ -25,7 +25,12 @@ struct mm_struct;
> #include <linux/percpu.h>
> #include <asm/page.h>
>
> +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> +/* Since tlb flush takes long time on Celleb, reduce it to 1 when Celleb && RT */
> +#define PPC64_TLB_BATCH_NR 1
> +#else
> #define PPC64_TLB_BATCH_NR 192
> +#endif /* defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT) */

With this code, you get silent side-effects of enabling PPC_CELLEB
along with another platform.

Maybe instead you should change the hpte_need_flush() to always flush
when running on the celleb platform and PREEMPT_RT is enabled.

Arnd <><

2007-05-15 04:13:27

by Tsutomu OWA

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


At Mon, 14 May 2007 16:40:02 +0200, Arnd Bergmann wrote:
>
> On Monday 14 May 2007, Tsutomu OWA wrote:
> > --- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
> > +++ rt/include/asm-powerpc/tlbflush.h 2007-05-14 16:12:47.000000000 +0900
> > @@ -25,7 +25,12 @@ struct mm_struct;
> > #include <linux/percpu.h>
> > #include <asm/page.h>
> >
> > +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> > +/* Since tlb flush takes long time on Celleb, reduce it to 1 when Celleb && RT */
> > +#define PPC64_TLB_BATCH_NR 1

> With this code, you get silent side-effects of enabling PPC_CELLEB
> along with another platform.

Yeah, thank you for pointing it out.
I'll send a revised patch later.

> Maybe instead you should change the hpte_need_flush() to always flush
> when running on the celleb platform and PREEMPT_RT is enabled.

Hmm... Is it in linux-2.6.21? grep'ing it did not help...

Is http://patchwork.ozlabs.org/linuxppc/patch?id=10361 the first place
where hpte_need_flush() appears?

-- owa

2007-05-15 06:28:06

by Tsutomu OWA

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


At Mon, 14 May 2007 16:40:02 +0200, Arnd Bergmann wrote:

> > +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> > +/* Since tlb flush takes long time on Celleb, reduce it to 1 when Celleb && RT */
> > +#define PPC64_TLB_BATCH_NR 1

> With this code, you get silent side-effects of enabling PPC_CELLEB
> along with another platform.

> Maybe instead you should change the hpte_need_flush() to always flush
> when running on the celleb platform and PREEMPT_RT is enabled.

OK, how about this one?

thanks a lot!

Since flushing the TLB needs expensive hypervisor call(s) on Celleb,
always flush it on RT to reduce scheduling latency.

Signed-off-by: Tsutomu OWA <[email protected]>
-- owa

--- linux-2.6.21-rt1/arch/powerpc/mm/tlb_64.c 2007-05-07 14:08:12.000000000 +0900
+++ rt/arch/powerpc/mm/tlb_64.c 2007-05-15 15:19:34.000000000 +0900
@@ -31,6 +31,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include <asm/bug.h>
+#include <asm/machdep.h>

DEFINE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);

@@ -180,6 +181,18 @@ void hpte_update(struct mm_struct *mm, u
batch->vaddr[i] = (vsid << 28 ) | (addr & 0x0fffffff);
batch->pte[i] = __real_pte(__pte(pte), ptep);
batch->index = ++i;
+
+#ifdef CONFIG_PREEMPT_RT
+ /*
+ * Since flushing tlb needs expensive hypervisor call(s) on celleb,
+ * always flush it on RT to reduce scheduling latency.
+ */
+ if (machine_is(celleb)) {
+ flush_tlb_pending();
+ return;
+ }
+#endif /* CONFIG_PREEMPT_RT */
+
if (i >= PPC64_TLB_BATCH_NR)
flush_tlb_pending();
}

2007-05-15 07:24:12

by Benjamin Herrenschmidt

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size

On Mon, 2007-05-14 at 16:40 +0200, Arnd Bergmann wrote:
> On Monday 14 May 2007, Tsutomu OWA wrote:
> > --- linux-2.6.21-rt1/include/asm-powerpc/tlbflush.h 2007-04-26 12:08:32.000000000 +0900
> > +++ rt/include/asm-powerpc/tlbflush.h 2007-05-14 16:12:47.000000000 +0900
> > @@ -25,7 +25,12 @@ struct mm_struct;
> > #include <linux/percpu.h>
> > #include <asm/page.h>
> >
> > +#if defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT)
> > +/* Since tlb flush takes long time on Celleb, reduce it to 1 when Celleb && RT */
> > +#define PPC64_TLB_BATCH_NR 1
> > +#else
> > #define PPC64_TLB_BATCH_NR 192
> > +#endif /* defined(CONFIG_PPC_CELLEB) && defined(CONFIG_PREEMPT_RT) */
>
> With this code, you get silent side-effects of enabling PPC_CELLEB
> along with another platform.
>
> Maybe instead you should change the hpte_need_flush() to always flush
> when running on the celleb platform and PREEMPT_RT is enabled.

I think it's wrong either way... Maybe we can make it a variable and
measure how much we can reasonably do in a given time frame at
boot? :-) Since it depends on whether there's a hypervisor etc...

Ben.
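
A rough sketch of what such boot-time calibration could look like (everything
here is illustrative, not existing code: ppc64_tlb_batch_nr would replace the
PPC64_TLB_BATCH_NR constant and sample_hpte_invalidate_us() stands in for
timing one real HPTE invalidation; only get_tb() and tb_ticks_per_usec are
the usual powerpc timebase helpers):

#include <linux/init.h>
#include <asm/time.h>		/* get_tb(), tb_ticks_per_usec */

/* Hypothetical runtime replacement for the PPC64_TLB_BATCH_NR constant. */
unsigned int ppc64_tlb_batch_nr = 192;

/* Hypothetical helper: time a single HPTE invalidation in microseconds. */
static unsigned long __init sample_hpte_invalidate_us(void)
{
	unsigned long t0 = get_tb();

	/* ... invalidate one sample HPTE via ppc_md / the hypervisor ... */

	return (get_tb() - t0) / tb_ticks_per_usec;
}

static void __init calibrate_tlb_batch_nr(void)
{
	unsigned long budget_us = 50;	/* latency budget for one batched flush */
	unsigned long per_entry_us = sample_hpte_invalidate_us();

	if (per_entry_us == 0)
		per_entry_us = 1;

	/* as many entries as fit in the budget, clamped to [1, 192] */
	ppc64_tlb_batch_nr = budget_us / per_entry_us;
	if (ppc64_tlb_batch_nr < 1)
		ppc64_tlb_batch_nr = 1;
	if (ppc64_tlb_batch_nr > 192)
		ppc64_tlb_batch_nr = 192;
}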


2007-05-15 07:34:44

by Benjamin Herrenschmidt

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size

On Tue, 2007-05-15 at 13:12 +0900, Tsutomu OWA wrote:

> > With this code, you get silent side-effects of enabling PPC_CELLEB
> > along with another platform.
>
> Yeah, thank you for pointing it out.
> I'll send revised patch later.
>
> > Maybe instead you should change the hpte_need_flush() to always flush
> > when running on the celleb platform and PREEMPT_RT is enabled.
>
> Hmm... Is it in linux-2.6.21? grep'ing it did not help...
>
> Is http://patchwork.ozlabs.org/linuxppc/patch?id=10361 is the first place
> where the hpte_need_flush() appears?

Yes, I added that when reworking the batching mechanism.

Ben.


2007-05-15 07:38:59

by Benjamin Herrenschmidt

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


> +
> +#ifdef CONFIG_PREEMPT_RT
> + /*
> + * Since flushing tlb needs expensive hypervisor call(s) on celleb,
> + * always flush it on RT to reduce scheduling latency.
> + */
> + if (machine_is(celleb)) {
> + flush_tlb_pending();
> + return;
> + }
> +#endif /* CONFIG_PREEMPT_RT */
> +
> if (i >= PPC64_TLB_BATCH_NR)
> flush_tlb_pending();
> }

Any reason to do that only on celleb ? :-)

Also, we might want to still batch, though in smaller amounts. Have you measured
the time it takes? We might want to modulate the amount based on whether we
are using native hash tables or a hypervisor.

Ben.
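
A minimal sketch of that kind of modulation, assuming a variable
ppc64_tlb_batch_nr in place of the PPC64_TLB_BATCH_NR constant, and assuming
firmware_has_feature(FW_FEATURE_LPAR) is a reasonable "running under a
hypervisor" test (whether Beat on Celleb sets that feature bit is not
something this thread establishes):

#include <linux/init.h>
#include <asm/firmware.h>	/* firmware_has_feature(), FW_FEATURE_LPAR */

/* Hypothetical runtime replacement for the PPC64_TLB_BATCH_NR constant. */
unsigned int ppc64_tlb_batch_nr;

static void __init setup_tlb_batch_nr(void)
{
	if (firmware_has_feature(FW_FEATURE_LPAR))
		ppc64_tlb_batch_nr = 16;	/* hcalls are expensive: batch less */
	else
		ppc64_tlb_batch_nr = 192;	/* native hash tables: keep the large batch */
}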


2007-05-15 08:09:17

by Tsutomu OWA

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


At Tue, 15 May 2007 17:38:27 +1000, Benjamin Herrenschmidt wrote:

> > +#ifdef CONFIG_PREEMPT_RT
> > + /*
> > + * Since flushing tlb needs expensive hypervisor call(s) on celleb,
> > + * always flush it on RT to reduce scheduling latency.
> > + */
> > + if (machine_is(celleb)) {
> > + flush_tlb_pending();
> > + return;
> > + }
> > +#endif /* CONFIG_PREEMPT_RT */

> Any reason to do that only on celleb ? :-)

Well, at least it does not harm any other platforms :)

> Also, we might want to still batch, though in smaller amounts.

Yeah, it's a good point.

> Have you measured
> the time it takes? We might want to modulate the amount based on whether we
> are using native hash tables or a hypervisor.

Yes, here is the trace log. According to it, flushing 9 entries takes about 50us,
which means that flushing 192 (PPC64_TLB_BATCH_NR) entries would take about 1ms.
Please note that tracing causes *roughly* 20-30% overhead, though.
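(Back of the envelope: 50us / 9 entries is roughly 5.6us per entry, so 192
entries x 5.6us comes to about 1.07ms, before subtracting the tracing overhead.)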

I'm afraid I don't have numbers for the native version, but I suppose you have :)

# cat /proc/latency_trace
preemption latency trace v1.1.5 on 2.6.21-rt1
--------------------------------------------------------------------
latency: 60 us, #53/53, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1 #P:1)
-----------------
| task: inetd-358 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
=> started at: .__switch_to+0xa8/0x178 <c00000000000feec>
=> ended at: .__start+0x4000000000000000/0x8 <00000000>

_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
inetd-358 0D..2 0us : .user_trace_start (.__switch_to)
inetd-358 0D..2 0us : .rt_up (.user_trace_start)
inetd-358 0D..3 1us : .rt_mutex_unlock (.rt_up)
inetd-358 0D..3 2us : .__flush_tlb_pending (.__switch_to)
inetd-358 0D..4 3us : .flush_hash_range (.__flush_tlb_pending)
inetd-358 0D..4 3us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 4us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 5us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 6us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 13us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 13us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 14us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 14us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 15us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 18us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 19us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 20us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 20us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 21us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 24us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 25us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 25us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 26us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 26us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 30us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 30us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 31us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 31us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 32us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 36us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 36us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 37us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 37us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 38us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 41us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 42us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 42us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 43us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 43us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 47us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 48us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 48us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 49us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 49us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 52us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..4 53us : .flush_hash_page (.flush_hash_range)
inetd-358 0D..4 54us : .beat_lpar_hpte_invalidate (.flush_hash_page)
inetd-358 0D..4 54us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 55us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
inetd-358 0D..5 58us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
inetd-358 0D..2 59us : .user_trace_stop (.__switch_to)
inetd-358 0D..2 60us : .user_trace_stop (.__switch_to)
-- owa

2007-05-15 08:41:25

by Benjamin Herrenschmidt

Subject: Re: [patch 4/4] powerpc 2.6.21-rt1: reduce scheduling latency by changing tlb flush size


> > Have you measured
> > the time it takes? We might want to modulate the amount based on whether we
> > are using native hash tables or a hypervisor.
>
> Yes, here is the trace log. According to it, flushing 9 entries takes about 50us,
> which means that flushing 192 (PPC64_TLB_BATCH_NR) entries would take about 1ms.
> Please note that tracing causes *roughly* 20-30% overhead, though.
>
> I'm afraid I don't have numbers for the native version, but I suppose you have :)

Actually I don't offhand, but I'll try to get some.

> # cat /proc/latency_trace
> preemption latency trace v1.1.5 on 2.6.21-rt1
> --------------------------------------------------------------------
> latency: 60 us, #53/53, CPU#0 | (M:rt VP:0, KP:0, SP:1 HP:1 #P:1)
> -----------------
> | task: inetd-358 (uid:0 nice:0 policy:0 rt_prio:0)
> -----------------
> => started at: .__switch_to+0xa8/0x178 <c00000000000feec>
> => ended at: .__start+0x4000000000000000/0x8 <00000000>
>
> _------=> CPU#
> / _-----=> irqs-off
> | / _----=> need-resched
> || / _---=> hardirq/softirq
> ||| / _--=> preempt-depth
> |||| /
> ||||| delay
> cmd pid ||||| time | caller
> \ / ||||| \ | /
> inetd-358 0D..2 0us : .user_trace_start (.__switch_to)
> inetd-358 0D..2 0us : .rt_up (.user_trace_start)
> inetd-358 0D..3 1us : .rt_mutex_unlock (.rt_up)
> inetd-358 0D..3 2us : .__flush_tlb_pending (.__switch_to)
> inetd-358 0D..4 3us : .flush_hash_range (.__flush_tlb_pending)
> inetd-358 0D..4 3us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 4us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 5us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 6us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 13us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 13us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 14us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 14us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 15us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 18us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 19us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 20us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 20us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 21us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 24us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 25us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 25us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 26us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 26us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 30us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 30us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 31us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 31us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 32us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 36us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 36us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 37us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 37us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 38us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 41us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 42us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 42us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 43us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 43us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 47us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 48us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 48us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 49us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 49us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 52us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..4 53us : .flush_hash_page (.flush_hash_range)
> inetd-358 0D..4 54us : .beat_lpar_hpte_invalidate (.flush_hash_page)
> inetd-358 0D..4 54us : .__spin_lock_irqsave (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 55us+: .beat_lpar_hpte_getword0 (.beat_lpar_hpte_invalidate)
> inetd-358 0D..5 58us : .__spin_unlock_irqrestore (.beat_lpar_hpte_invalidate)
> inetd-358 0D..2 59us : .user_trace_stop (.__switch_to)
> inetd-358 0D..2 60us : .user_trace_stop (.__switch_to)
> -- owa