[PATCH, RT, RFC] Hacks allowing -rt to run on POWER7 / Powerpc.
We've been seeing some issues with userspace randomly SIGSEGV'ing while
running the -RT kernels on POWER7 based systems. After lots of
debugging, head scratching, and experimental changes to the code, the
problem has been narrowed down such that we can avoid the problems by
disabling the TLB batching.

After some input from Ben and further debugging, we've found that the
restoration of the batch->active value near the end of __switch_to()
seems to be the key. (The -RT related changes within
arch/powerpc/kernel/process.c __switch_to() do the equivalent of an
arch_leave_lazy_mmu_mode() before calling _switch(), use a hadbatch flag
to indicate whether batching was active, and then restore that
batch->active value on the way out after the call to _switch(). That
particular code is in the -RT branch and is not found in mainline.)

Deferring to Ben (or others in the know) on whether this is the proper
solution or whether there is something deeper, but...

If the right answer is simply to disable the restoration of
batch->active, the rest of the CONFIG_PREEMPT_RT changes in
__switch_to() should then be replaceable with a single call to
arch_leave_lazy_mmu_mode().

The patch here is what I am currently running with, on both POWER6 and
POWER7 systems, successfully.
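
To make the difference between the two variants concrete, here is a
minimal user-space sketch. It is only a model under stated assumptions:
struct tlb_batch_model and all model_* names are hypothetical stand-ins,
only the active and index fields discussed above are represented, and
no real context switch or TLB flush takes place.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ppc64_tlb_batch; only the two
 * fields discussed in this mail are modelled. */
struct tlb_batch_model {
	int active;	/* non-zero while lazy MMU batching is on */
	int index;	/* number of queued, not yet flushed, entries */
};

static int flushes;	/* counts calls to the modelled flush */

static void model_flush(struct tlb_batch_model *b)
{
	assert(b->index > 0);	/* callers only flush a non-empty batch */
	b->index = 0;
	flushes++;
}

/* Models the current -RT code: flush, clear, switch, then restore. */
static void model_switch_to_old(struct tlb_batch_model *b)
{
	int hadbatch = 0;	/* initialised here so the model is well defined */

	if (b->active) {
		hadbatch = 1;
		if (b->index)
			model_flush(b);
		b->active = 0;
	}
	/* _switch() would run here */
	if (hadbatch)
		b->active = 1;
}

/* Models the patched code: leave lazy MMU mode and never restore. */
static void model_switch_to_new(struct tlb_batch_model *b)
{
	if (b->index)
		model_flush(b);
	b->active = 0;
	/* _switch() would run here */
}
```

Starting both variants from active = 1 with a few queued entries, the
old variant ends with active set again and the new variant ends with
it clear; the batch is flushed in either case.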
Signed-off-by: Will Schmidt <[email protected]>
CC: Ben Herrenschmidt <[email protected]>
CC: Thomas Gleixner <[email protected]>
---
diff -aurp linux-2.6.33.5-rt23.orig/arch/powerpc/kernel/process.c linux-2.6.33.5-rt23.exp/arch/powerpc/kernel/process.c
--- linux-2.6.33.5-rt23.orig/arch/powerpc/kernel/process.c 2010-06-21 11:41:34.402513904 -0500
+++ linux-2.6.33.5-rt23.exp/arch/powerpc/kernel/process.c 2010-07-09 13:15:13.533269904 -0500
@@ -304,10 +304,6 @@ struct task_struct *__switch_to(struct t
 	struct thread_struct *new_thread, *old_thread;
 	unsigned long flags;
 	struct task_struct *last;
-#if defined(CONFIG_PPC64) && defined (CONFIG_PREEMPT_RT)
-	struct ppc64_tlb_batch *batch;
-	int hadbatch;
-#endif
 
 #ifdef CONFIG_SMP
 	/* avoid complexity of lazy save/restore of fpu
@@ -401,16 +397,6 @@ struct task_struct *__switch_to(struct t
 		new_thread->start_tb = current_tb;
 	}
 
-#ifdef CONFIG_PREEMPT_RT
-	batch = &__get_cpu_var(ppc64_tlb_batch);
-	if (batch->active) {
-		hadbatch = 1;
-		if (batch->index) {
-			__flush_tlb_pending(batch);
-		}
-		batch->active = 0;
-	}
-#endif /* #ifdef CONFIG_PREEMPT_RT */
 #endif
 
 	local_irq_save(flags);
@@ -425,16 +411,13 @@ struct task_struct *__switch_to(struct t
 	 * of sync. Hard disable here.
 	 */
 	hard_irq_disable();
-	last = _switch(old_thread, new_thread);
-
-	local_irq_restore(flags);
 
 #if defined(CONFIG_PPC64) && defined(CONFIG_PREEMPT_RT)
-	if (hadbatch) {
-		batch = &__get_cpu_var(ppc64_tlb_batch);
-		batch->active = 1;
-	}
+	arch_leave_lazy_mmu_mode();
 #endif
+	last = _switch(old_thread, new_thread);
+
+	local_irq_restore(flags);
 
 	return last;
 }
On Fri, 09 Jul 2010 about 08:55:01 -0000, Will Schmidt wrote:
> We've been seeing some issues with userspace randomly SIGSEGV'ing while
> running the -RT kernels on POWER7 based systems. After lots of
> debugging, head scratching, and experimental changes to the code, the
> problem has been narrowed down such that we can avoid the problems by
> disabling the TLB batching.
>
> After some input from Ben and further debug, we've found that the
> restoration of the batch->active value near the end of __switch_to()
> seems to be the key. (The -RT related changes within
> arch/powerpc/kernel/process.c __switch_to() do the equivalent of an
> arch_leave_lazy_mmu_mode() before calling _switch(), use a hadbatch flag
> to indicate whether batching was active, and then restore that
> batch->active value on the way out after the call to _switch(). That
> particular code is in the -RT branch and is not found in mainline.)
>
> Deferring to Ben (or others in the know) for whether this is the proper
> solution or if there is something deeper, but..
I looked at the patch and noticed 2 changes:
1) the batch is checked and cleared after local_irq_save
2) enabling the batch is skipped
I talked to Will and had him try moving the local_irq_save above the
check for the active batch. That alone did not seem to be enough.
However, he confirmed that we are setting the batch to active in
arch_enter_lazy_mmu_mode() when it is already active, meaning that
batching is being turned on recursively. I had suggested adding debug
to check that irqs are off after the restore, at the point where the
batch is re-enabled, when our debug session timed out.
milton
On Sun, 2010-07-11 at 02:49 -0500, Milton Miller wrote:
> On Fri, 09 Jul 2010 about 08:55:01 -0000, Will Schmidt wrote:
> > We've been seeing some issues with userspace randomly SIGSEGV'ing while
> > running the -RT kernels on POWER7 based systems. After lots of
> > debugging, head scratching, and experimental changes to the code, the
> > problem has been narrowed down such that we can avoid the problems by
> > disabling the TLB batching.
> >
> > After some input from Ben and further debug, we've found that the
> > restoration of the batch->active value near the end of __switch_to()
> > seems to be the key. (The -RT related changes within
> > arch/powerpc/kernel/process.c __switch_to() do the equivalent of an
> > arch_leave_lazy_mmu_mode() before calling _switch(), use a hadbatch flag
> > to indicate whether batching was active, and then restore that
> > batch->active value on the way out after the call to _switch(). That
> > particular code is in the -RT branch and is not found in mainline.)
> >
> > Deferring to Ben (or others in the know) for whether this is the proper
> > solution or if there is something deeper, but..
I believe this is still on Ben's list of things to look at. Until he
gets to it, I'll see if I can get Thomas to pick this up for the -RT
tree to keep RT functional on P7.
A bit more debug info below.
>
>
> I looked at the patch and noticed 2 changes:
> 1) the batch is checked and cleared after local_irq_save
> 2) enabling the batch is skipped
>
> I talked to Will and had him try moving the local_irq_save above the
> check for the active batch. That alone did not seem to be enough.
> However, he confirmed that we are setting the batch to active in
> arch_enter_lazy_mmu_mode() when it is already active, meaning that
> batching is being turned on recursively. I had suggested adding debug
> to check that irqs are off after the restore, at the point where the
> batch is re-enabled, when our debug session timed out.
Based on some of the debug suggestions from Milton:
A WARN_ON(!irqs_disabled()) after local_irq_restore() did not show
any hits (while the system otherwise continued to suffer from the TLB
batching troubles).
---><----
	hard_irq_disable();
	last = _switch(old_thread, new_thread);

	local_irq_restore(flags);
	WARN_ON(!irqs_disabled());	<<<<----------

#if defined(CONFIG_PPC64) && defined(CONFIG_PREEMPT_RT) && 1
	if (hadbatch) {
		batch = &__get_cpu_var(ppc64_tlb_batch);
		batch->active = 1;
	}
#endif
----><----
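
As a note on what this check can tell us: local_irq_restore(flags)
puts the interrupt state back to whatever local_irq_save(flags)
recorded, so a quiet WARN_ON at that point suggests interrupts were
already disabled on entry to __switch_to(). The sketch below models
only that save/restore behaviour in user space; model_irqs_off and the
model_* functions are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>

/* Hypothetical stand-in for the CPU's interrupt-disable state. */
static int model_irqs_off;

/* Like local_irq_save(): remember the current state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = (unsigned long)model_irqs_off;

	model_irqs_off = 1;
	return flags;
}

/* Like local_irq_restore(): put back exactly what was saved. */
static void model_irq_restore(unsigned long flags)
{
	assert(flags == 0 || flags == 1);	/* only two states in this model */
	model_irqs_off = (int)flags;
}

/* Returns non-zero if a WARN_ON(!irqs_disabled()) placed right after
 * the restore would fire, given the state on entry to the switch. */
static int model_warn_after_restore(int irqs_off_on_entry)
{
	unsigned long flags;

	model_irqs_off = irqs_off_on_entry;
	flags = model_irq_save();
	/* hard_irq_disable() and _switch() would run here */
	model_irq_restore(flags);
	return !model_irqs_off;
}
```

In this model the check fires only when the switch was entered with
interrupts enabled.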
I also added an assortment of WARN_ONs to the
arch_{enter,leave}_lazy_mmu_mode() functions. As Milton stated above,
the check for batch->active on the way into arch_enter_lazy_mmu_mode()
did generate lots of hits; the other WARN_ONs did not.
-----><-------
static inline void arch_enter_lazy_mmu_mode(void)
{
	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);
//	WARN_ON(batch->active); /* lots of hits if enabled */
	WARN_ON(irqs_disabled()); /* nothing.... */
	batch->active = 1;
....
static inline void arch_leave_lazy_mmu_mode(void)
{
	struct ppc64_tlb_batch *batch = &get_cpu_var(ppc64_tlb_batch);
	WARN_ON(!batch->active); /* nothing.....*/
	WARN_ON(irqs_disabled()); /* nothing.... */
....
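
For anyone wanting to reason about which of these checks can fire for a
given call sequence, the instrumentation can be mimicked in user space.
This is a rough sketch, not the kernel code: struct cpu_model, the
check_site names and model_warn_on() are hypothetical, and the
per-site counters stand in for WARN_ON backtraces.

```c
#include <assert.h>

/* One counter per instrumented check, in the order shown above. */
enum check_site {
	ENTER_ACTIVE,
	ENTER_IRQS_OFF,
	LEAVE_INACTIVE,
	LEAVE_IRQS_OFF,
	NR_SITES
};

static unsigned long hits[NR_SITES];

/* Stand-in for WARN_ON(): count the hit and carry on. */
static void model_warn_on(enum check_site site, int cond)
{
	assert(site < NR_SITES);
	if (cond)
		hits[site]++;
}

/* Hypothetical per-CPU state: the batch flag and the irq state. */
struct cpu_model {
	int batch_active;
	int irqs_off;
};

static void model_enter(struct cpu_model *c)
{
	model_warn_on(ENTER_ACTIVE, c->batch_active);
	model_warn_on(ENTER_IRQS_OFF, c->irqs_off);
	c->batch_active = 1;
}

static void model_leave(struct cpu_model *c)
{
	model_warn_on(LEAVE_INACTIVE, !c->batch_active);
	model_warn_on(LEAVE_IRQS_OFF, c->irqs_off);
	c->batch_active = 0;
}
```

Calling model_enter() twice followed by one model_leave() bumps only
the ENTER_ACTIVE counter, which is the same pattern of hits reported
above.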