2012-08-07 12:34:30

by Srikar Dronamraju

Subject: rcu stalls seen with numasched_v2 patches applied.

Hi,

I saw this while I was running the 2nd August -tip kernel + Peter's
numasched patches.

Top showed a load average of 240; one CPU (CPU 7) was at 100% while all
the other CPUs were idle. The system showed some sluggishness. Before I
saw this, I had run Andrea's autonuma benchmark a couple of times.

I am not sure if this is an already reported/known issue.

INFO: rcu_sched self-detected stall on CPU { 7} (t=105182911 jiffies)
Pid: 5173, comm: qpidd Tainted: G W 3.5.0numasched_v2_020812+ #1
Call Trace:
<IRQ> [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
[<ffffffff81060918>] update_process_times+0x48/0x90
[<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
[<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
[<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
[<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
[<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
[<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
[<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff814e64b2>] ? _raw_spin_unlock_irqrestore+0x12/0x20
[<ffffffff81082552>] sched_setnode+0x82/0xf0
[<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
[<ffffffff81070c6c>] task_work_run+0x6c/0x80
[<ffffffff81013984>] do_notify_resume+0x94/0xa0
[<ffffffff814e6a6c>] retint_signal+0x48/0x8c
INFO: rcu_sched self-detected stall on CPU { 7} (t=105362914 jiffies)
Pid: 5173, comm: qpidd Tainted: G W 3.5.0numasched_v2_020812+ #1
Call Trace:
<IRQ> [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
[<ffffffff81060918>] update_process_times+0x48/0x90
[<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
[<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
[<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
[<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
[<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
[<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
[<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff81082562>] ? sched_setnode+0x92/0xf0
[<ffffffff81082552>] ? sched_setnode+0x82/0xf0
[<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
[<ffffffff81070c6c>] task_work_run+0x6c/0x80
[<ffffffff81013984>] do_notify_resume+0x94/0xa0
[<ffffffff814e6a6c>] retint_signal+0x48/0x8c
INFO: rcu_sched self-detected stall on CPU { 7} (t=105542917 jiffies)
Pid: 5173, comm: qpidd Tainted: G W 3.5.0numasched_v2_020812+ #1
Call Trace:
<IRQ> [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
[<ffffffff81060918>] update_process_times+0x48/0x90
[<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
[<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
[<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
[<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
[<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
[<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
[<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff814e64b2>] ? _raw_spin_unlock_irqrestore+0x12/0x20
[<ffffffff81082552>] sched_setnode+0x82/0xf0
[<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
[<ffffffff81070c6c>] task_work_run+0x6c/0x80
[<ffffffff81013984>] do_notify_resume+0x94/0xa0
[<ffffffff814e6a6c>] retint_signal+0x48/0x8c


<these messages keep repeating>

I saw this on a 2-node, 24-CPU machine.

If I am able to reproduce this again, I plan to run these tests without
the numasched patches applied.

--
Thanks and Regards
Srikar Dronamraju


2012-08-07 13:53:57

by Peter Zijlstra

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

On Tue, 2012-08-07 at 18:03 +0530, Srikar Dronamraju wrote:
> Hi,
>
> I saw this while I was running the 2nd August -tip kernel + Peter's
> numasched patches.
>
> Top showed a load average of 240; one CPU (CPU 7) was at 100% while all
> the other CPUs were idle. The system showed some sluggishness. Before I
> saw this, I had run Andrea's autonuma benchmark a couple of times.
>
> I am not sure if this is an already reported/known issue.
>
> INFO: rcu_sched self-detected stall on CPU { 7} (t=105182911 jiffies)
> Pid: 5173, comm: qpidd Tainted: G W 3.5.0numasched_v2_020812+ #1
> Call Trace:
> <IRQ> [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
> [<ffffffff81060918>] update_process_times+0x48/0x90
> [<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
> [<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
> [<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
> [<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
> [<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
> [<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
> [<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
> <EOI> [<ffffffff814e64b2>] ? _raw_spin_unlock_irqrestore+0x12/0x20
> [<ffffffff81082552>] sched_setnode+0x82/0xf0
> [<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
> [<ffffffff81070c6c>] task_work_run+0x6c/0x80
> [<ffffffff81013984>] do_notify_resume+0x94/0xa0
> [<ffffffff814e6a6c>] retint_signal+0x48/0x8c

I haven't seen anything like that (obviously), but the one thing you can
try is to undo the optimization Oleg suggested and use a separate
callback_head for the task_work instead of reusing task_struct::rcu.
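
A minimal sketch of the idea in C (the queue_numa_work() helper here is
hypothetical; the point is the dedicated numa_work field, so a pending
call_rcu() callback and the queued task_work can never reuse the same
callback_head):

struct task_struct {
	/* ... */
	struct callback_head numa_work;	/* used only by task_numa_work() */
	struct rcu_head rcu;		/* left to call_rcu() users */
	/* ... */
};

static void queue_numa_work(struct task_struct *curr)
{
	/* hypothetical wrapper around what task_tick_numa() would do */
	init_task_work(&curr->numa_work, task_numa_work);
	task_work_add(curr, &curr->numa_work, true);
}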

2012-08-07 17:10:05

by John Stultz

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

On 08/07/2012 05:33 AM, Srikar Dronamraju wrote:
> Hi,
>
> I saw this while I was running the 2nd August -tip kernel + Peter's
> numasched patches.
>
> Top showed a load average of 240; one CPU (CPU 7) was at 100% while all
> the other CPUs were idle. The system showed some sluggishness. Before I
> saw this, I had run Andrea's autonuma benchmark a couple of times.
>
> I am not sure if this is an already reported/known issue.
So Ingo pushed a fix the other day that might address this:
http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=commitdiff_plain;h=1d17d17484d40f2d5b35c79518597a2b25296996

But do let me know any reproduction details if you can trigger this
again. If you do trigger it again without that patch, watch to see if
the time value from date is running much faster than it should.

thanks
-john

2012-08-07 17:52:56

by Srikar Dronamraju

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

* John Stultz <[email protected]> [2012-08-07 10:08:51]:

> On 08/07/2012 05:33 AM, Srikar Dronamraju wrote:
> >Hi,
> >
> >I saw this while I was running the 2nd August -tip kernel + Peter's
> >numasched patches.
> >
> >Top showed a load average of 240; one CPU (CPU 7) was at 100% while all
> >the other CPUs were idle. The system showed some sluggishness. Before I
> >saw this, I had run Andrea's autonuma benchmark a couple of times.
> >
> >I am not sure if this is an already reported/known issue.
> So Ingo pushed a fix the other day that might address this:
> http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=commitdiff_plain;h=1d17d17484d40f2d5b35c79518597a2b25296996

Okay, will update after applying the patch.

>
> But do let me know any reproduction details if you can trigger this
> again. If you do trigger it again without that patch, watch to see
> if the time value from date is running much faster than it should.
>

The time value from date is normal.

--
Thanks and Regards
Srikar

2012-08-07 18:05:58

by Srikar Dronamraju

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

* Peter Zijlstra <[email protected]> [2012-08-07 15:52:48]:

> On Tue, 2012-08-07 at 18:03 +0530, Srikar Dronamraju wrote:
> > Hi,
> >
> > INFO: rcu_sched self-detected stall on CPU { 7} (t=105182911 jiffies)
> > Pid: 5173, comm: qpidd Tainted: G W 3.5.0numasched_v2_020812+ #1
> > Call Trace:
> > <IRQ> [<ffffffff810d4c7e>] rcu_check_callbacks+0x18e/0x650
> > [<ffffffff81060918>] update_process_times+0x48/0x90
> > [<ffffffff810a2a7e>] tick_sched_timer+0x6e/0xe0
> > [<ffffffff810789a5>] __run_hrtimer+0x75/0x1a0
> > [<ffffffff810a2a10>] ? tick_setup_sched_timer+0x100/0x100
> > [<ffffffff810591cf>] ? __do_softirq+0x13f/0x240
> > [<ffffffff81078d56>] hrtimer_interrupt+0xf6/0x240
> > [<ffffffff814f0179>] smp_apic_timer_interrupt+0x69/0x99
> > [<ffffffff814ef14a>] apic_timer_interrupt+0x6a/0x70
> > <EOI> [<ffffffff814e64b2>] ? _raw_spin_unlock_irqrestore+0x12/0x20
> > [<ffffffff81082552>] sched_setnode+0x82/0xf0
> > [<ffffffff8108bd38>] task_numa_work+0x1e8/0x240
> > [<ffffffff81070c6c>] task_work_run+0x6c/0x80
> > [<ffffffff81013984>] do_notify_resume+0x94/0xa0
> > [<ffffffff814e6a6c>] retint_signal+0x48/0x8c
>
> I haven't seen anything like that (obviously), but the one thing you can
> try is to undo the optimization Oleg suggested and use a separate
> callback_head for the task_work instead of reusing task_struct::rcu.
>

Are you referring to commit 158e1645e (trim task_work: get rid of hlist)?

I am also able to reproduce this on another 8-node machine.

Just to update, I had to revert commit b9403130a5 (sched/cleanups: Add
load balance cpumask pointer to 'struct lb_env') so that your patches
apply cleanly. (I don't think this should have caused any problem, but...)

--
Thanks and Regards
Srikar

2012-08-08 19:59:16

by Peter Zijlstra

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

On Tue, 2012-08-07 at 22:49 +0530, Srikar Dronamraju wrote:
> Are you referring to commit 158e1645e (trim task_work: get rid of hlist)?

No, to something like the below..

> I am also able to reproduce this on another 8-node machine.

Ship me one ;-)

> Just to update, I had to revert commit b9403130a5 (sched/cleanups: Add
> load balance cpumask pointer to 'struct lb_env') so that your patches
> apply cleanly. (I don't think this should have caused any problem, but...)

Yeah, I've got a rebase on top of that.. just wanted to fold this
page::last_nid thing into page::flags before posting again.

---
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1539,6 +1539,7 @@ struct task_struct {
#ifdef CONFIG_SMP
u64 node_stamp; /* migration stamp */
unsigned long numa_contrib;
+ struct callback_head numa_work;
#endif /* CONFIG_SMP */
#endif /* CONFIG_NUMA */
struct rcu_head rcu;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -816,7 +816,7 @@ void task_numa_work(struct callback_head
struct task_struct *t, *p = current;
int node = p->node_last;

- WARN_ON_ONCE(p != container_of(work, struct task_struct, rcu));
+ WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));

/*
* Who cares about NUMA placement when they're dying.
@@ -891,8 +891,8 @@ void task_tick_numa(struct rq *rq, struc
* yet and exit_task_work() is called before
* exit_notify().
*/
- init_task_work(&curr->rcu, task_numa_work);
- task_work_add(curr, &curr->rcu, true);
+ init_task_work(&curr->numa_work, task_numa_work);
+ task_work_add(curr, &curr->numa_work, true);
}
curr->node_last = node;
}

2012-08-10 16:24:58

by Srikar Dronamraju

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

> ---
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1539,6 +1539,7 @@ struct task_struct {
> #ifdef CONFIG_SMP
> u64 node_stamp; /* migration stamp */
> unsigned long numa_contrib;
> + struct callback_head numa_work;
> #endif /* CONFIG_SMP */
> #endif /* CONFIG_NUMA */
> struct rcu_head rcu;
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -816,7 +816,7 @@ void task_numa_work(struct callback_head
> struct task_struct *t, *p = current;
> int node = p->node_last;
>
> - WARN_ON_ONCE(p != container_of(work, struct task_struct, rcu));
> + WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));
>
> /*
> * Who cares about NUMA placement when they're dying.
> @@ -891,8 +891,8 @@ void task_tick_numa(struct rq *rq, struc
> * yet and exit_task_work() is called before
> * exit_notify().
> */
> - init_task_work(&curr->rcu, task_numa_work);
> - task_work_add(curr, &curr->rcu, true);
> + init_task_work(&curr->numa_work, task_numa_work);
> + task_work_add(curr, &curr->numa_work, true);
> }
> curr->node_last = node;
> }
>

This change worked well on the 2-node machine, but on the 8-node
machine it hangs with repeated messages:

Pid: 60935, comm: numa01 Tainted: G W 3.5.0-numasched_v2_020812+ #4
Call Trace:
<IRQ> [<ffffffff810d32e2>] ? rcu_check_callbacks+0x632/0x650
[<ffffffff81061bb8>] ? update_process_times+0x48/0x90
[<ffffffff810a2a4e>] ? tick_sched_timer+0x6e/0xe0
[<ffffffff81079c85>] ? __run_hrtimer+0x75/0x1a0
[<ffffffff810a29e0>] ? tick_setup_sched_timer+0x100/0x100
[<ffffffff8107a036>] ? hrtimer_interrupt+0xf6/0x250
[<ffffffff814f1379>] ? smp_apic_timer_interrupt+0x69/0x99
[<ffffffff814f034a>] ? apic_timer_interrupt+0x6a/0x70
<EOI> [<ffffffff811082e3>] ? wait_on_page_bit+0x73/0x80
[<ffffffff814e7992>] ? _raw_spin_lock+0x22/0x30
[<ffffffff81131bf3>] ? handle_pte_fault+0x1b3/0xca0
[<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
[<ffffffff8107a9a8>] ? up_read+0x18/0x30
[<ffffffff814eb2be>] ? do_page_fault+0x13e/0x460
[<ffffffff810137ba>] ? __switch_to+0x1aa/0x460
[<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
[<ffffffff814e7de5>] ? page_fault+0x25/0x30
{ 3} (t=62998 jiffies)

2012-08-13 07:51:24

by Peter Zijlstra

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

On Fri, 2012-08-10 at 21:54 +0530, Srikar Dronamraju wrote:

> This change worked well on the 2-node machine, but on the 8-node
> machine it hangs with repeated messages:
>
> Pid: 60935, comm: numa01 Tainted: G W 3.5.0-numasched_v2_020812+ #4
> Call Trace:
> <IRQ> [<ffffffff810d32e2>] ? rcu_check_callbacks+0x632/0x650
> [<ffffffff81061bb8>] ? update_process_times+0x48/0x90
> [<ffffffff810a2a4e>] ? tick_sched_timer+0x6e/0xe0
> [<ffffffff81079c85>] ? __run_hrtimer+0x75/0x1a0
> [<ffffffff810a29e0>] ? tick_setup_sched_timer+0x100/0x100
> [<ffffffff8107a036>] ? hrtimer_interrupt+0xf6/0x250
> [<ffffffff814f1379>] ? smp_apic_timer_interrupt+0x69/0x99
> [<ffffffff814f034a>] ? apic_timer_interrupt+0x6a/0x70
> <EOI> [<ffffffff811082e3>] ? wait_on_page_bit+0x73/0x80
> [<ffffffff814e7992>] ? _raw_spin_lock+0x22/0x30
> [<ffffffff81131bf3>] ? handle_pte_fault+0x1b3/0xca0
> [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> [<ffffffff8107a9a8>] ? up_read+0x18/0x30
> [<ffffffff814eb2be>] ? do_page_fault+0x13e/0x460
> [<ffffffff810137ba>] ? __switch_to+0x1aa/0x460
> [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> [<ffffffff814e7de5>] ? page_fault+0x25/0x30
> { 3} (t=62998 jiffies)
>

If you run a -tip kernel without the numa patches, does that work?

2012-08-13 08:11:44

by Peter Zijlstra

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

On Mon, 2012-08-13 at 09:51 +0200, Peter Zijlstra wrote:
> On Fri, 2012-08-10 at 21:54 +0530, Srikar Dronamraju wrote:
>
> > This change worked well on the 2-node machine, but on the 8-node
> > machine it hangs with repeated messages:
> >
> > Pid: 60935, comm: numa01 Tainted: G W 3.5.0-numasched_v2_020812+ #4
> > Call Trace:
> > <IRQ> [<ffffffff810d32e2>] ? rcu_check_callbacks+0x632/0x650
> > [<ffffffff81061bb8>] ? update_process_times+0x48/0x90
> > [<ffffffff810a2a4e>] ? tick_sched_timer+0x6e/0xe0
> > [<ffffffff81079c85>] ? __run_hrtimer+0x75/0x1a0
> > [<ffffffff810a29e0>] ? tick_setup_sched_timer+0x100/0x100
> > [<ffffffff8107a036>] ? hrtimer_interrupt+0xf6/0x250
> > [<ffffffff814f1379>] ? smp_apic_timer_interrupt+0x69/0x99
> > [<ffffffff814f034a>] ? apic_timer_interrupt+0x6a/0x70
> > <EOI> [<ffffffff811082e3>] ? wait_on_page_bit+0x73/0x80
> > [<ffffffff814e7992>] ? _raw_spin_lock+0x22/0x30
> > [<ffffffff81131bf3>] ? handle_pte_fault+0x1b3/0xca0
> > [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> > [<ffffffff8107a9a8>] ? up_read+0x18/0x30
> > [<ffffffff814eb2be>] ? do_page_fault+0x13e/0x460
> > [<ffffffff810137ba>] ? __switch_to+0x1aa/0x460
> > [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> > [<ffffffff814e7de5>] ? page_fault+0x25/0x30
> > { 3} (t=62998 jiffies)
> >
>
> If you run a -tip kernel without the numa patches, does that work?


n/m, I found a total brain-fart in there.. does the below sort it?

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -917,7 +917,7 @@ void task_numa_work(struct callback_head
t = p;
do {
sched_setnode(t, node);
- } while ((t = next_thread(p)) != p);
+ } while ((t = next_thread(t)) != p);
rcu_read_unlock();
}
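
For context, that loop runs inside an RCU read-side critical section;
roughly, going by the hunk above, it is:

	rcu_read_lock();
	t = p;
	do {
		sched_setnode(t, node);
		/*
		 * next_thread(p) handed back the same sibling on every
		 * iteration, so a multi-threaded task never got back to
		 * p and spun here forever, which lines up with the
		 * task_numa_work()/sched_setnode() frames in the first
		 * set of stall traces.
		 */
	} while ((t = next_thread(t)) != p);
	rcu_read_unlock();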

2012-08-16 17:17:17

by Srikar Dronamraju

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

* Peter Zijlstra <[email protected]> [2012-08-13 10:11:28]:

> On Mon, 2012-08-13 at 09:51 +0200, Peter Zijlstra wrote:
> > On Fri, 2012-08-10 at 21:54 +0530, Srikar Dronamraju wrote:
> >
> > > This change worked well on the 2-node machine, but on the 8-node
> > > machine it hangs with repeated messages:
> > >
> > > Pid: 60935, comm: numa01 Tainted: G W 3.5.0-numasched_v2_020812+ #4
> > > Call Trace:
> > > <IRQ> [<ffffffff810d32e2>] ? rcu_check_callbacks+0x632/0x650
> > > [<ffffffff81061bb8>] ? update_process_times+0x48/0x90
> > > [<ffffffff810a2a4e>] ? tick_sched_timer+0x6e/0xe0
> > > [<ffffffff81079c85>] ? __run_hrtimer+0x75/0x1a0
> > > [<ffffffff810a29e0>] ? tick_setup_sched_timer+0x100/0x100
> > > [<ffffffff8107a036>] ? hrtimer_interrupt+0xf6/0x250
> > > [<ffffffff814f1379>] ? smp_apic_timer_interrupt+0x69/0x99
> > > [<ffffffff814f034a>] ? apic_timer_interrupt+0x6a/0x70
> > > <EOI> [<ffffffff811082e3>] ? wait_on_page_bit+0x73/0x80
> > > [<ffffffff814e7992>] ? _raw_spin_lock+0x22/0x30
> > > [<ffffffff81131bf3>] ? handle_pte_fault+0x1b3/0xca0
> > > [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> > > [<ffffffff8107a9a8>] ? up_read+0x18/0x30
> > > [<ffffffff814eb2be>] ? do_page_fault+0x13e/0x460
> > > [<ffffffff810137ba>] ? __switch_to+0x1aa/0x460
> > > [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> > > [<ffffffff814e7de5>] ? page_fault+0x25/0x30
> > > { 3} (t=62998 jiffies)
> > >
> >
> > If you run a -tip kernel without the numa patches, does that work?
>
>
> n/m, I found a total brain-fart in there.. does the below sort it?
>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -917,7 +917,7 @@ void task_numa_work(struct callback_head
> t = p;
> do {
> sched_setnode(t, node);
> - } while ((t = next_thread(p)) != p);
> + } while ((t = next_thread(t)) != p);
> rcu_read_unlock();
> }
>
>

I tried this fix, but it doesn't seem to help.

Will try on plain -tip and report back.

--
thanks and regards
srikar

2012-08-17 05:24:59

by Srikar Dronamraju

Subject: Re: rcu stalls seen with numasched_v2 patches applied.

* Peter Zijlstra <[email protected]> [2012-08-13 09:51:13]:

> On Fri, 2012-08-10 at 21:54 +0530, Srikar Dronamraju wrote:
>
> > This change worked well on the 2-node machine, but on the 8-node
> > machine it hangs with repeated messages:
> >
> > Pid: 60935, comm: numa01 Tainted: G W 3.5.0-numasched_v2_020812+ #4
> > Call Trace:
> > <IRQ> [<ffffffff810d32e2>] ? rcu_check_callbacks+0x632/0x650
> > [<ffffffff81061bb8>] ? update_process_times+0x48/0x90
> > [<ffffffff810a2a4e>] ? tick_sched_timer+0x6e/0xe0
> > [<ffffffff81079c85>] ? __run_hrtimer+0x75/0x1a0
> > [<ffffffff810a29e0>] ? tick_setup_sched_timer+0x100/0x100
> > [<ffffffff8107a036>] ? hrtimer_interrupt+0xf6/0x250
> > [<ffffffff814f1379>] ? smp_apic_timer_interrupt+0x69/0x99
> > [<ffffffff814f034a>] ? apic_timer_interrupt+0x6a/0x70
> > <EOI> [<ffffffff811082e3>] ? wait_on_page_bit+0x73/0x80
> > [<ffffffff814e7992>] ? _raw_spin_lock+0x22/0x30
> > [<ffffffff81131bf3>] ? handle_pte_fault+0x1b3/0xca0
> > [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> > [<ffffffff8107a9a8>] ? up_read+0x18/0x30
> > [<ffffffff814eb2be>] ? do_page_fault+0x13e/0x460
> > [<ffffffff810137ba>] ? __switch_to+0x1aa/0x460
> > [<ffffffff814e64f7>] ? __schedule+0x2e7/0x710
> > [<ffffffff814e7de5>] ? page_fault+0x25/0x30
> > { 3} (t=62998 jiffies)
> >
>
> If you run a -tip kernel without the numa patches, does that work?
>

Running on the -tip kernel seems okay. Will see if I can bisect the patch
that causes this issue and let you know.

--
Thanks and Regards
Srikar